google / brotli

Brotli compression format

Not compilable with Intel, a few tweaks and options missing #174

Closed Sanmayce closed 8 years ago

Sanmayce commented 8 years ago

Hi Brotli team, despite being one of the Google "haters", let me share my 2 cents on Brotli's current status.

In the next several months I intend to juxtapose several high-performance textual compressors with one goal in mind: showing the most balanced ones for the high-ratio/high-decompression-speed scenario.

Yesterday I downloaded your 'master' zip and compiled it (with several syntactic changes) with the Intel v15.0 compiler.

In my upcoming showdown I want to include Brotli, wanting to see how it performs in its best environment, by which I mean textual (mostly English) files.

[Question #1:] Since my goal is to show the tightness & decompression-speed top performers, are the following enforced defaults the best?

struct BrotliParams {
  BrotliParams()
//    : mode(MODE_GENERIC),
//      quality(11),
//      lgwin(22),
//      lgblock(0),
//      enable_dictionary(true),
//      enable_transforms(false),
//      greedy_block_split(false),
//      enable_context_modeling(true) {}

  : mode(MODE_TEXT),
    quality(11),
    lgwin(24),
    lgblock(24),
    enable_dictionary(true),
    enable_transforms(false),
    greedy_block_split(false),
    enable_context_modeling(true) {}

It would be very good to make these toggleable from the command line, no? (A rough sketch of what I mean follows.)
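For illustration only, a minimal sketch of how such toggles could be wired into tools/bro.cc; the flag names (--lgwin, --lgblock) and the ParseTuningOptions helper are hypothetical, not existing bro options, and only the BrotliParams fields shown above are assumed:

#include <cstdlib>
#include <cstring>
#include "../enc/encode.h"  // brotli::BrotliParams (adjust include path as needed)

// Hypothetical helper (not an existing bro option set): lets the user
// override the window and block exponents from the command line, e.g.
//   bro --lgwin 24 --lgblock 24 -i dickens -o dickens.brotli
static void ParseTuningOptions(int argc, char** argv,
                               brotli::BrotliParams* params) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (!strcmp(argv[i], "--lgwin")) {
      params->lgwin = atoi(argv[++i]);    /* sliding window = 2^lgwin bytes */
    } else if (!strcmp(argv[i], "--lgblock")) {
      params->lgblock = atoi(argv[++i]);  /* input block size = 2^lgblock bytes */
    }
    /* mode / enable_* switches could be exposed the same way */
  }
}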

[Question #2:] Your little announcement gives the impression that Brotli is something special on text; what am I missing? My quick test shows goodness but not greatness. The stats below are for yesterday's commit compiled with Intel v15.0 (/O3 used). Brotli outperforms Shifune on ratio, but in the decompression-speed department 3x is no joke; don't tell me that with a browser or some English full-text browser/searcher Brotli will load 'dickens' faster than Zstd or even Shifune.

D:>bro_Intel15.exe -i dickens -o dickens.brotli -v
Brotli compression speed: 0.200944 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 142.945 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 138.861 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 145.079 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 144.647 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 145.513 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 20
Brotli decompression speed: 145.841 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 40
Brotli decompression speed: 144.701 MB/s

D:>Nakamichi_Shifune_branchfull.exe dickens.Nakamichi /report
Nakamichi 'Shifune-Totenschiff', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Note: This compile can handle files up to 1711MB.
Decompressing 3740418 bytes ...
RAM-to-RAM performance: 512 MB/s.
Compression Ratio (bigger-the-better): 2.72:1

D:>dir dic*

09/25/2015  03:32 AM        10,192,446 dickens
09/25/2015  03:29 AM         2,962,118 dickens.brotli
09/08/2015  02:33 AM         3,740,418 dickens.Nakamichi

D:>
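For reference, from the sizes above: Brotli compresses dickens 10,192,446 / 2,962,118 ≈ 3.44:1 versus Shifune's 10,192,446 / 3,740,418 ≈ 2.72:1, i.e. Brotli's ratio is about 26% better here, while Shifune's reported 512 MB/s decompression is roughly 3.5x Brotli's ~145 MB/s on this machine.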

The above quick run was done on my Core 2 laptop; on Haswell the 3x may jump up to 5x hands down. I hate that I don't have a Haswell or the like to share actual stats.

[Question #3:] Don't you think your defaults (encode.h) are too low? I do; my big test shows a worse ratio than gzip:

D:>zpaq64 add _Deathship_textual_corpus.tar.method58.zpaq _Deathship_textual_corpus.tar -method 58 -threads 1
D:>bsc e _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.ST6Block256.bsc -b256 -m6 -cp -Tt
D:>xz -z -k -f -9 -e -v -v --threads=1 _Deathship_textual_corpus.tar
D:>lzturbo.exe -39 -b256 -p0 _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
D:>zpaq64 add _Deathship_textual_corpus.tar.method28.zpaq _Deathship_textual_corpus.tar -method 28 -threads 1
D:>7za a -tgzip -mx9 _Deathship_textual_corpus.tar.zip _Deathship_textual_corpus.tar
D:>bro_Intel15.exe -i _Deathship_textual_corpus.tar -o _Deathship_textual_corpus.tar.brotli -v
D:>zstd.exe _Deathship_textual_corpus.tar
D:>LZ4.exe -9 _Deathship_textual_corpus.tar

09/12/2015  12:59 PM     1,125,281,882 _Deathship_textual_corpus.tar.method58.zpaq
09/12/2015  02:34 AM     1,342,098,184 _Deathship_textual_corpus.tar.ST6Block256.bsc
09/11/2015  11:56 AM     1,471,795,768 _Deathship_textual_corpus.tar.xz
09/13/2015  07:31 PM     1,484,820,599 _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
09/14/2015  09:18 AM     1,800,083,824 _Deathship_textual_corpus.tar.method28.zpaq
Here comes Nakamichi 'Shifune' ...
09/13/2015  06:29 AM     2,181,159,237 _Deathship_textual_corpus.tar.zip
09/24/2015  11:36 PM     2,382,646,308 _Deathship_textual_corpus.tar.brotli
09/13/2015  03:04 AM     2,491,454,533 _Deathship_textual_corpus.tar.zst
09/13/2015  07:50 AM     2,626,828,543 _Deathship_textual_corpus.tar.lz4
09/11/2015  06:41 AM     8,090,119,168 _Deathship_textual_corpus.tar
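For reference, from the listing above: the 8,090,119,168-byte tar shrinks to 2,382,646,308 bytes with Brotli (≈3.40:1) versus 2,181,159,237 bytes with 7-Zip's gzip -mx9 (≈3.71:1), which is exactly the "worse ratio than gzip" point, while zpaq -method 58 leads at ≈7.19:1.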

A glimpse at my unfinished latest benchmark: www.sanmayce.com/Hayabusa/Deathship_showdown.pdf www.sanmayce.com/Hayabusa/Nakamichi_Shifune.pdf

[Suggestion #1:] Your time reports seem problematic; I get 0 MB/s for big files. Please give Brotli a '-b' benchmark or '-t' test (decompression without dump) ability; Zstd & LZ4 have very good reports. Your current speed report includes 'fwrite()' time, whereas I want Brotli's pure RAM-to-RAM performance.
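To illustrate what I mean by pure RAM-to-RAM timing, here is a minimal sketch: load the compressed file into memory, decode it into a preallocated buffer a number of times, and time only the decode calls. The decode function pointer stands for whatever in-memory decode entry point the library exposes; I am not assuming any particular Brotli API name here.

#include <cstddef>
#include <ctime>
#include <vector>

// Caller supplies the in-memory decode routine: it must decode `in` into
// `out` and return the number of decoded bytes.
typedef size_t (*DecodeFn)(const unsigned char* in, size_t in_size,
                           unsigned char* out, size_t out_capacity);

// Time `iters` decode passes; all fopen/fread/fwrite cost is excluded.
static double MeasureRamToRamMBps(DecodeFn decode,
                                  const std::vector<unsigned char>& compressed,
                                  std::vector<unsigned char>& decoded,
                                  int iters) {
  size_t decoded_size = 0;
  clock_t start = clock();
  for (int i = 0; i < iters; ++i) {
    decoded_size = decode(compressed.data(), compressed.size(),
                          decoded.data(), decoded.size());
  }
  double seconds = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
  if (seconds <= 0.0) return 0.0;  // guard for very small inputs
  return (static_cast<double>(decoded_size) * iters) / (1024.0 * 1024.0) / seconds;
}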

[Suggestion #2:] Make it compilable with the Intel C/C++ compiler; this will be appreciated, by me for one. The changes I made in bro.cc to get it running:

1:

    //#include <unistd.h>
    #include <time.h>
    #include <fcntl.h>
    #include <io.h>

2:

static FILE* OpenInputFile(const char* input_path) {
//  if (input_path == 0) {
//    return fdopen(STDIN_FILENO, "rb");
//  }
/*
tools\bro.cc(136): error: identifier "STDIN_FILENO" is undefined
      return fdopen(STDIN_FILENO, "rb");
                    ^
*/
  if (input_path == 0) {
    /* Windows: switch stdin to binary mode so CR/LF translation does not corrupt the stream. */
    setmode(_fileno( stdin ), O_BINARY);
    return stdin;
  }

// https://msdn.microsoft.com/en-us/library/aa298581%28v=vs.60%29.aspx
/*
   int result;
   // Set "stdin" to have binary mode:
   result = _setmode( _fileno( stdin ), _O_BINARY );
   if( result == -1 )
      perror( "Cannot set mode" );
   else
      printf( "'stdin' successfully changed to binary mode\n" );
*/

  FILE* f = fopen(input_path, "rb");
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}

static FILE *OpenOutputFile(const char *output_path, const int force) {
//  if (output_path == 0) {
//    return fdopen(STDOUT_FILENO, "wb");
//  }
/*
tools\bro.cc(148): error: identifier "STDOUT_FILENO" is undefined
      return fdopen(STDOUT_FILENO, "wb");
                    ^
*/
  if (output_path == 0) {
    /* Windows: switch stdout to binary mode for the same reason. */
    setmode(_fileno( stdout ), O_BINARY);
    return stdout;
  }
  if (!force) {
    struct stat statbuf;
    if (stat(output_path, &statbuf) == 0) {
      fprintf(stderr, "output file exists\n");
      exit(1);
    }
  }
//  int fd = open(output_path, O_CREAT | O_WRONLY | O_TRUNC,
//                S_IRUSR | S_IWUSR);
/*
tools\bro.cc(158): error: identifier "S_IRUSR" is undefined
                  S_IRUSR | S_IWUSR);
                  ^

tools\bro.cc(158): error: identifier "S_IWUSR" is undefined
                  S_IRUSR | S_IWUSR);
                            ^
*/
  FILE* f = fopen(output_path, "wb");
/*
  if (fd < 0) {
    perror("open");
    exit(1);
  }
  return fdopen(fd, "wb");
*/
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}
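A possibly cleaner variant of the same fix (just a suggestion, not what I actually compiled): keep the POSIX code path intact and only switch the std handles to binary mode when building for Windows, using the underscore-prefixed MSVC names:

#if defined(_WIN32)
#include <fcntl.h>
#include <io.h>
/* _setmode/_fileno/_O_BINARY are the non-deprecated MSVC spellings of
   the setmode/fileno/O_BINARY used above. */
#define SET_BINARY_MODE(f) _setmode(_fileno(f), _O_BINARY)
#else
#define SET_BINARY_MODE(f) ((void)0)  /* no-op on POSIX */
#endif

/* then, in OpenInputFile / OpenOutputFile: */
/*   SET_BINARY_MODE(stdin);  */
/*   SET_BINARY_MODE(stdout); */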

And the actual console dump of how the compilation went:

// The next log/source is modified (for Windows compatibility) Brotli:

/*
D:\brotli-master>type makeEXE.bat
cd dec
icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
cd..
cd enc
icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
cd..
cd tools
icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj

D:\brotli-master>makeEXE.bat

D:\brotli-master>cd dec

D:\brotli-master\dec>icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

bit_reader.c
decode.c
huffman.c
state.c
streams.c

D:\brotli-master\dec>cd..

D:\brotli-master>cd enc

D:\brotli-master\enc>icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

backward_references.cc
block_splitter.cc
brotli_bit_stream.cc
encode.cc
encode_parallel.cc
entropy_encode.cc
histogram.cc
literal_cost.cc
metablock.cc
static_dict.cc
streams.cc

D:\brotli-master\enc>cd..

D:\brotli-master>cd tools

D:\brotli-master\tools>icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

bro.cc
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:bro.exe
bro.obj
..\dec\bit_reader.obj
..\dec\decode.obj
..\dec\huffman.obj
..\dec\state.obj
..\dec\streams.obj
..\enc\backward_references.obj
..\enc\block_splitter.obj
..\enc\brotli_bit_stream.obj
..\enc\encode.obj
..\enc\encode_parallel.obj
..\enc\entropy_encode.obj
..\enc\histogram.obj
..\enc\literal_cost.obj
..\enc\metablock.obj
..\enc\static_dict.obj
..\enc\streams.obj

D:\brotli-master\tools>dir br*.exe
 Volume in drive D is S640_Vol5
 Volume Serial Number is 5861-9E6C

 Directory of D:\brotli-master\tools

09/24/2015  06:56 AM         1,250,304 bro.exe
               1 File(s)      1,250,304 bytes
               0 Dir(s)   5,917,040,640 bytes free

D:\brotli-master\tools>bro
;
D:\brotli-master\tools>bro /?
Usage: bro [--force] [--quality n] [--decompress] [--input filename] [--output filename] [--repeat iters] [--verbose]

D:\brotli-master\tools>
*/

And a final note, a byte angry: in your promotional paper you say "Decompresses much faster than current LZMA implementations". Usually amateurs like me use 2x, 3x or 15x; your "much" is not good, one could read anything from 2x to 20x into it. Also, why don't you mention the current best (IMO) decompressor on the INTERNET?! Not mentioning it (LzTurbo) is like disrespecting not only the man behind it but the BEST as a general notion, yes?

I hope you will refine Brotli and make it a usable high-performance console tool.

Regards, Kaze

Sanmayce commented 8 years ago

The above quick run was done on my Core 2 laptop; on Haswell the 3x may jump up to 5x hands down. I hate that I don't have a Haswell or the like to share actual stats.

I am not alone; one overclocker helped me a lot to benchmark your Brotli against the superb Zstd and my Shifune:

D:\Showdown_Brotli_vs_Zstd_vs_GZIP_vs_Shifune>dir

09/26/2015 10:08 PM  13,713,275 Complete_Works_of_Fyodor_Dostoyevsky.txt
09/26/2015 10:35 PM   3,717,191 Complete_Works_of_Fyodor_Dostoyevsky.txt.4MB.lzturbo12-39.lzt
09/26/2015 10:16 PM   3,717,583 Complete_Works_of_Fyodor_Dostoyevsky.txt.brotli                         ! 153.228 MB/s; 364.29 MB/s; 397.508 MB/s !
09/08/2015 02:33 AM   4,582,363 Complete_Works_of_Fyodor_Dostoyevsky.txt.Nakamichi                      ! 448 MB/s; 2112 MB/s; 1728 MB/s!
09/26/2015 10:19 PM   4,617,360 Complete_Works_of_Fyodor_Dostoyevsky.txt.zip
09/26/2015 10:11 PM   5,209,670 Complete_Works_of_Fyodor_Dostoyevsky.txt.zst                            ! 302.5 MB/s; 619.8 MB/s; 628.3 MB/s !

09/26/2015 10:08 PM  10,192,446 dickens
09/26/2015 10:35 PM   2,976,910 dickens.4MB.lzturbo12-39.lzt
09/26/2015 10:17 PM   2,962,118 dickens.brotli                                                          ! 146.61 MB/s; 336.342 MB/s; 374.577 MB/s !
09/08/2015 02:33 AM   3,740,418 dickens.Nakamichi                                                       ! 448 MB/s; 1984 MB/s; 1664 MB/s !
09/26/2015 10:19 PM   3,681,828 dickens.zip
09/26/2015 10:11 PM   4,134,924 dickens.zst                                                             ! 298.9 MB/s; 619.7 MB/s; 623.8 MB/s !

09/26/2015 10:09 PM 100,000,000 enwik8
09/26/2015 10:37 PM  29,148,393 enwik8.4MB.lzturbo12-39.lzt
09/26/2015 10:24 PM  27,722,164 enwik8.brotli                                                           ! 134.094 MB/s; 285.19 MB/s; 305.225 MB/s !
09/08/2015 02:33 AM  34,218,460 enwik8.Nakamichi                                                        ! 256 MB/s; 1024 MB/s; 1152 MB/s !
09/26/2015 10:21 PM  35,102,891 enwik8.zip
09/26/2015 10:11 PM  40,024,854 enwik8.zst                                                              ! 325.0 MB/s; 651.6 MB/s; 653.8 MB/s !

09/26/2015 10:09 PM  14,613,183 The_Book_of_The_Thousand_Nights_and_a_Night.txt
09/26/2015 10:37 PM   4,241,855 The_Book_of_The_Thousand_Nights_and_a_Night.txt.4MB.lzturbo12-39.lzt
09/26/2015 10:19 PM   4,163,630 The_Book_of_The_Thousand_Nights_and_a_Night.txt.brotli                  ! 144.867 MB/s; 336.624 MB/s; 372.129 MB/s !
09/08/2015 02:33 AM   5,293,102 The_Book_of_The_Thousand_Nights_and_a_Night.txt.Nakamichi               ! 384 MB/s; 1984 MB/s; 1600 MB/s !
09/26/2015 10:22 PM   5,198,949 The_Book_of_The_Thousand_Nights_and_a_Night.txt.zip
09/26/2015 10:11 PM   5,932,453 The_Book_of_The_Thousand_Nights_and_a_Night.txt.zst                     ! 305.8 MB/s; 625.4 MB/s; 631.3 MB/s !

09/26/2015 10:09 PM   4,445,260 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt
09/26/2015 10:37 PM   1,089,279 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt.4MB.lzturbo12-39.lzt
09/26/2015 10:19 PM   1,087,439 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt.brotli    ! 160.277 MB/s; 339.146 MB/s; 370.247 MB/s !
09/08/2015 02:33 AM   1,441,679 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt.Nakamichi ! 704 MB/s; 2432 MB/s; 2368 MB/s !
09/26/2015 10:22 PM   1,320,100 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt.zip
09/26/2015 10:11 PM   1,537,047 The_Project_Gutenberg_EBook_of_The_King_James_Bible_kjv10.txt.zst       ! 320.7 MB/s; 656.1 MB/s; 668.6 MB/s !

09/26/2015 10:09 PM   3,265,536 University_of_Canterbury_The_Calgary_Corpus.tar
09/26/2015 10:37 PM     921,958 University_of_Canterbury_The_Calgary_Corpus.tar.4MB.lzturbo12-39.lzt
09/26/2015 10:19 PM     867,503 University_of_Canterbury_The_Calgary_Corpus.tar.brotli                  ! 144.849 MB/s; 266.176 MB/s; 283.114 MB/s !
09/08/2015 02:33 AM   1,319,701 University_of_Canterbury_The_Calgary_Corpus.tar.Nakamichi               ! 576 MB/s; 1792 MB/s; 1792 MB/s !
09/26/2015 10:22 PM   1,017,658 University_of_Canterbury_The_Calgary_Corpus.tar.zip
09/26/2015 10:11 PM   1,174,349 University_of_Canterbury_The_Calgary_Corpus.tar.zst                     ! 367.3 MB/s; 740.6 MB/s; 747.2 MB/s !

D:\Showdown_Brotli_vs_Zstd_vs_GZIP_vs_Shifune>

Note: Skylake is good; however, being newer than Broadwell, I expected more from it.

Sanmayce commented 8 years ago


I was wrong about 5x hands down, in fact it is 4x-6x.

Sanmayce commented 8 years ago

Just a note on the unfairness of the LzTurbo comparison: I deliberately hurt LzTurbo's compression ratio by choosing a 4MB block; my intention was to silence the empty-talkers who always complain. A full-size sliding window gives significantly better compression, since with separate 4MB chunks/blocks the inheritance between them is lost, so this in its turn is unfair to LzTurbo!

Nowadays, with constantly growing core counts, cache sizes and RAM, bigger is better, so all these resources have to be utilized, not left UNDERUTILIZED as they are now. However, running with a small resource footprint is so cool - BUT ONLY IN HEAVY MULTI-THREADING, yes?!

eustas commented 8 years ago

I've added support for icc recently. Though it looks like icc and clang produce binaries about 10% slower than gcc 5.2.0.