linkedin / migz

Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities.
BSD 2-Clause "Simplified" License

Windows Command-Line Tools #1

Open Sanmayce opened 5 years ago

Sanmayce commented 5 years ago

Hi, could you provide binaries for Windows? I want to include MiGz in my benchmark roster using 512KB and bigger blocks. By the way, how big can the blocks be?

Your explanation of the acronym is not clear to me, please explain: "... also supports multithreaded decompression, which is especially important for large files that are read repeatedly. Hence, MiGz."

What does the little 'i' stand for?

Allow me two more questions:

      C Size  ratio%     C MB/s     D MB/s   Name                             File
     1447249    12.6       0.50     471.90   brotli 11d29                     ftp.gnu.org_grep-3.3.tar
     1455165    12.6       2.10     137.62   lzma 9d29:fb273:mf=bt4           ftp.gnu.org_grep-3.3.tar
     1489213    12.9       0.24    1063.22   oodle 139 ‘Leviathan’            ftp.gnu.org_grep-3.3.tar
     1496718    13.0       0.18    1322.62   oodle 129 ‘Hydra’                ftp.gnu.org_grep-3.3.tar
     1496718    13.0       0.45    1322.01   oodle 89 ‘Kraken’                ftp.gnu.org_grep-3.3.tar
     1513749    13.1       2.24    1070.13   zstd 22d29                       ftp.gnu.org_grep-3.3.tar
     1517944    13.2       0.16     346.06   lzham 4fb258:x4:d29              ftp.gnu.org_grep-3.3.tar
     1521395    13.2       1.49    1552.77   lzturbo 39                       ftp.gnu.org_grep-3.3.tar
     1756302    15.2      39.10    1542.17   lzturbo 32                       ftp.gnu.org_grep-3.3.tar
     1774686    15.4      21.85    1145.93   zstd 12                          ftp.gnu.org_grep-3.3.tar
     1782164    15.5       1.52    1953.87   lzturbo 29                       ftp.gnu.org_grep-3.3.tar
     1875468    16.3       1.47    1581.11   lizard 49                        ftp.gnu.org_grep-3.3.tar
     2114046    18.4      54.64     886.77   oodle 132 ‘Leviathan’            ftp.gnu.org_grep-3.3.tar
     2163309                       1548      Nakamichi 'Ryuugan-ditto-1TB'    ! Outside TurboBench, Intel-v15.0-64bit-archSSE41 compile !
     2172516    18.9       2.42    3501.52   oodle 118 ‘Selkie’               ftp.gnu.org_grep-3.3.tar
     2172516    18.9       2.42    3495.15   oodle 116 ‘Selkie’               ftp.gnu.org_grep-3.3.tar
     2359093    20.5     306.55    1233.67   lzturbo 30                       ftp.gnu.org_grep-3.3.tar
     2404889    20.9      15.35     333.83   zlib 9                           ftp.gnu.org_grep-3.3.tar
     2406525    20.9      46.42    3756.11   oodle 114 ‘Selkie’               ftp.gnu.org_grep-3.3.tar

As you can see from the above (random) test, LzTurbo 30 decompresses 1233.67/333.83=3.70x faster and compresses 306.55/15.35=19.97x faster than zlib 9!

      C Size  ratio%     C MB/s     D MB/s   Name                             File
     6998037    17.4       0.90     849.88   lzturbo 39                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7022279    17.4       0.37     321.86   brotli 11d29                     Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7049563    17.5       1.38     684.89   zstd 22d29                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7071491    17.5       0.29     644.84   oodle 139 ‘Leviathan’            Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7103502    17.6       0.40     724.20   oodle 89 ‘Kraken’                Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7105986    17.6       0.23     723.98   oodle 129 ‘Hydra’                Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7125187    17.7      10.61      25.82   bzip2                            Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7960854    19.8       0.91    1302.16   lzturbo 29                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     8061825    20.0       1.38     924.79   lizard 49                        Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9041702                       1077      Nakamichi 'Ryuugan-ditto-1TB'    ! Outside TurboBench, Intel-v15.0-64bit-archSSE41 compile !
     9314676    23.1      37.27    1085.78   lzturbo 32                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9759547    24.2       0.57    1812.03   oodle 116 ‘Selkie’               Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9759547    24.2       0.57    1810.97   oodle 118 ‘Selkie’               Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
    10771358    26.7       4.30     320.47   zlib 9                           Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar

In the second random example, LzTurbo 29 decompresses about 4x faster than zlib 9 (1302.16/320.47), with no need for 4 cores.
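
The ratios quoted above can be re-derived with a few lines of Python. This is only a sketch: the speed pairs are copied from the two tables (third and fourth columns, read as compression and decompression speed), and the helper name `speedup` is made up for illustration.

```python
# Speeds copied from the two tables above: (compression, decompression).
grep_tar = {"lzturbo 30": (306.55, 1233.67), "zlib 9": (15.35, 333.83)}
dostoyevsky_tar = {"lzturbo 29": (0.91, 1302.16), "zlib 9": (4.30, 320.47)}


def speedup(candidate, baseline):
    """Return (compression, decompression) speed ratios of candidate over baseline."""
    return candidate[0] / baseline[0], candidate[1] / baseline[1]


for label, table, name in [
    ("grep-3.3.tar", grep_tar, "lzturbo 30"),
    ("Dostoyevsky tar", dostoyevsky_tar, "lzturbo 29"),
]:
    c, d = speedup(table[name], table["zlib 9"])
    print(f"{label}: {name} vs zlib 9: {c:.2f}x compression, {d:.2f}x decompression")
```

Note that in the second table LzTurbo 29 compresses slower than zlib 9, so the 4x figure applies to decompression only.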

Sanmayce commented 5 years ago

For more benchmarks, 11 in total so far:

TEXTORAMIC_Decompression_Showdown_2019-Feb-21.pdf: https://drive.google.com/file/d/162cikKQ3QDiXhUaz_uBJG9unXhoJWwTQ/view?usp=sharing

Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf: https://drive.google.com/file/d/1dFtfvpcE-TUo_D_Ol9C8FSDXlbzyIll9/view?usp=sharing

jeffpasternack commented 5 years ago

Hi Sanmayce--sorry for the delay in responding; I need to check my GitHub notification settings, it seems!

Thanks for your benchmark results, by the way--very interesting!

Sanmayce commented 5 years ago

Thank you, Jeff, for the detailed explanation. You see, my being a C amateur "clouds" my extremely limited knowledge of other languages and what their purposes are. My main idea in asking for a binary was to include the executable in my future rosters (all under Windows, alas). Now I realize that MiGz targets different environments; I didn't even know that Java doesn't produce stand-alone "executables". Excuse me for the profane question.

jeffpasternack commented 5 years ago

Hi Sanmayce, FWIW I'm sure there are tools that can build self-contained executables for Java programs (and probably for compilation to native code rather than bytecode, too), so while we don't provide such executables ourselves you should be able to build them if you're sufficiently motivated :) (IMO it's easy enough to just invoke it using the java command-line tool as normal, though.)

Sanmayce commented 5 years ago

Hi Jeff, thanks again. Please consider something that would gladden the eyes of many decompression benchmarkers: running MiGz on a many-core CPU with enwik9: http://mattmahoney.net/dc/text.html

As far as I can see, COMPUTEX 2019 (May 27) will set a new trend in CPUs: 8 cores becoming affordable to everyone in the long run. I myself intend to get a Matisse with 16 threads. Simply put, MiGz's forte ought to be shown on a modern machine, and enwik9 is a superb dataset for that, de facto THE BENCHMARK.

jeffpasternack commented 5 years ago

Thanks for the suggestion, Sanmayce. I didn't know about this standardized dataset, but will definitely keep it in mind if/when I have time to run more benchmarks in the future. In practice we're already using MiGz on very high core count servers (e.g. 32+ logical cores) but the cores tend to be individually slower than those you'd find in higher-end desktops.

Sanmayce commented 5 years ago

Your German XML dump benchmark is superbly on point, but without reference points (other compressors) it is not very telling. In my view, MiGz will set a Pareto frontier (decompression rate vs. compressed size) on enwik9; it would be interesting to see how threaded decompression fares against well-optimized single-threaded competition.

In a month or so, my toy Nakamichi will finish enwik9; my expectation is that it will set a Pareto frontier: http://www.sanmayce.com/Nakamichi/index.html#2019Apr08 Once that is done, I will ask Dr. Mahoney to add it to the 'Large Text Compression Benchmark'.
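
For readers who have not met the term: a result is on the Pareto frontier (compressed size vs. decompression rate) when no other result is both at least as small and at least as fast. Below is a minimal sketch using a handful of rows from the grep-3.3.tar table earlier in this thread; the function name `pareto_frontier` is hypothetical, not taken from any benchmark tool.

```python
# (name, compressed size in bytes, decompression speed) from the grep-3.3.tar table.
results = [
    ("brotli 11d29", 1447249, 471.90),
    ("lzma 9d29", 1455165, 137.62),
    ("oodle 139 Leviathan", 1489213, 1063.22),
    ("zstd 22d29", 1513749, 1070.13),
    ("lzturbo 39", 1521395, 1552.77),
    ("lzturbo 29", 1782164, 1953.87),
    ("oodle 118 Selkie", 2172516, 3501.52),
    ("zlib 9", 2404889, 333.83),
]


def pareto_frontier(rows):
    """Return the rows that no other row beats on both size and speed."""
    frontier = []
    best_speed = float("-inf")
    # Walk from smallest to largest output (fastest first on ties); a row
    # survives only if it is faster than every smaller-or-equal row before it.
    for name, size, speed in sorted(rows, key=lambda r: (r[1], -r[2])):
        if speed > best_speed:
            frontier.append((name, size, speed))
            best_speed = speed
    return frontier


for name, size, speed in pareto_frontier(results):
    print(f"{size:>9}  {speed:>8.2f}  {name}")
```

On this subset, lzma and zlib 9 drop out because a smaller entry decompresses faster than they do.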

cielavenir commented 4 years ago

I have written a suite of DEFLATE compressors whose backends include zlib / 7-zip / zopfli / miniz / libslz / libdeflate / zlib-ng / igzip. I have also added a MiGz-format frontend with parallel compression / decompression. Help yourself if interested. Currently, however, parallel compression only gives a speedup when the compression level is high.

https://github.com/cielavenir/7bgzf/tree/dev
https://www.dropbox.com/s/cv0wbbhgbzkfavl/7ciso190925.7z?dl=0 # Win32 / Win64 binary
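
As a rough illustration of how block-parallel compression can stay gzip-compatible, here is a minimal Python sketch. It is not the MiGz on-disk format and not 7bgzf's code: it simply compresses fixed-size blocks on a thread pool as independent gzip members and concatenates them, which ordinary gzip readers accept because the gzip format allows multi-member files. The 512KB block size echoes the question at the top of this issue; `compress_parallel` and its parameters are hypothetical names.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 512 * 1024  # assumed block size, for illustration only


def compress_parallel(data: bytes, threads: int = 4, level: int = 6) -> bytes:
    """Compress each block as its own gzip member and concatenate the members."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so the members line up with the blocks.
        members = pool.map(lambda b: gzip.compress(b, compresslevel=level), blocks)
        return b"".join(members)


payload = b"enwik-like text " * 200_000
packed = compress_parallel(payload)
# A standard reader walks the members one after another.
assert gzip.decompress(packed) == payload
print(len(payload), "->", len(packed))
```

Decompression in this sketch is still sequential. Decompressing in parallel additionally requires a way to locate block boundaries without decoding the stream, and that part is not shown here.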

jeffpasternack commented 4 years ago

Thanks, @cielavenir! Looks very nice. I'll be particularly interested to compare the performance/overhead of C multithreading vs. Java's (assuming the underlying zlib is the same).

At low compression levels, it might be that your machine is unable to stream data from disk quickly enough to benefit from multithreading--I haven't observed this myself, but my test machine has a rather fast SSD.

Sanmayce commented 4 years ago

@cielavenir It smells like a yummy benchmark is cooking. If you want to start a thread where enwik9 is benchmarked with your Win64 binary alongside the latest Zstd, levels 1 to 22, I am in...

F:\ENWIK9_benchmark_Zstd>timer64 zstd-v1.4.3-win64.exe -b1 -e22 --threads=64 -i33 enwik9
 1#enwik9            :1000000000 -> 357434859 (2.798),1011.9 MB/s , 680.9 MB/s
 2#enwik9            :1000000000 -> 329130073 (3.038), 680.0 MB/s , 552.9 MB/s
 3#enwik9            :1000000000 -> 313570458 (3.189), 280.2 MB/s , 504.7 MB/s
 4#enwik9            :1000000000 -> 307725039 (3.250), 204.0 MB/s , 495.8 MB/s
 5#enwik9            :1000000000 -> 301808803 (3.313), 161.9 MB/s , 470.9 MB/s
 6#enwik9            :1000000000 -> 295292703 (3.386), 119.4 MB/s , 485.4 MB/s
 7#enwik9            :1000000000 -> 285005952 (3.509),  86.7 MB/s , 520.3 MB/s
 8#enwik9            :1000000000 -> 280885195 (3.560),  70.2 MB/s , 541.4 MB/s
 9#enwik9            :1000000000 -> 278440978 (3.591),  52.6 MB/s , 549.3 MB/s
10#enwik9            :1000000000 -> 273739917 (3.653),  42.8 MB/s , 545.9 MB/s
11#enwik9            :1000000000 -> 271346644 (3.685),  36.5 MB/s , 550.1 MB/s
12#enwik9            :1000000000 -> 269278253 (3.714),  23.2 MB/s , 557.0 MB/s
13#enwik9            :1000000000 -> 265978647 (3.760),  24.4 MB/s , 567.5 MB/s
14#enwik9            :1000000000 -> 261516483 (3.824),  19.7 MB/s , 573.8 MB/s
15#enwik9            :1000000000 -> 258702580 (3.865),  15.7 MB/s , 574.6 MB/s
16#enwik9            :1000000000 -> 250158490 (3.997),  14.0 MB/s , 573.9 MB/s
17#enwik9            :1000000000 -> 242890314 (4.117),  9.82 MB/s , 540.3 MB/s
18#enwik9            :1000000000 -> 239733542 (4.171),  8.13 MB/s , 499.4 MB/s
19#enwik9            :1000000000 -> 235599635 (4.244),  6.22 MB/s , 448.9 MB/s
20#enwik9            :1000000000 -> 226011360 (4.425),  5.18 MB/s , 548.0 MB/s
21#enwik9            :1000000000 -> 220256419 (4.540),  3.28 MB/s , 547.9 MB/s
22#enwik9            :1000000000 -> 215061264 (4.650),  1.74 MB/s , 544.7 MB/s

The above results are for an i7-3630QM; the initial package is downloadable at: https://drive.google.com/file/d/1N8MmC34alEZGeMB6gZw-Vg2BqTZRxkbT/view?usp=sharing

Sanmayce commented 4 years ago

If you have written it as a benchmark suite (similar to lzbench and turbobench), that would be great. Having all the fast DEFLATE implementations in C, multi-threaded, under one roof is exciting! To test their speed, what better way than pitting them against the awesome Zstd?

cielavenir commented 4 years ago

Perhaps I should print the processing time, but for now please run:

for meth in cz1 cz2 cz3 cz4 cz5 cz6 cz7 cz8 cz9 cS1 cS2 cS3 cS4 cS5 cS6 cS7 cS8 cS9 cZ1 cZ2 cs1 cl1 cl2 cl3 cl4 cl5 cl6 cl7 cl8 cl9 cl10 cl11 cl12 cn1 cn2 cn3 cn4 cn5 cn6 cn7 cn8 cn9 ci1 ci2 ci3 ci4; do
    echo "$meth"
    time 7migz -${meth} -@64 < enwik9 > enwik9.enc; ls -l enwik9.enc
done

Sanmayce commented 4 years ago

I'm a Windows user. A suggestion: putting the output of the above script on your homepage (as other authors do) would be informative; right now no one knows how fast your code is. This script only compresses, yes? Or do you decompress after that as well?
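
Since the loop above relies on a Unix shell and `time`, here is a hedged, cross-platform sketch of the same measurement in Python. It assumes a `7migz` executable on the PATH that accepts the `-<method> -@64` options exactly as used in the loop above; `time_command` and `run_all` are hypothetical helper names, and the elapsed time includes process startup and disk I/O.

```python
import os
import subprocess
import time


def time_command(cmd, stdin_path, stdout_path):
    """Run cmd with stdin/stdout redirected to files; return (seconds, output size)."""
    with open(stdin_path, "rb") as src, open(stdout_path, "wb") as dst:
        start = time.perf_counter()
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
        elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(stdout_path)


def run_all(methods, exe="7migz", source="enwik9", target="enwik9.enc"):
    """Time one compression run per method, overwriting target each time."""
    for meth in methods:
        secs, size = time_command([exe, f"-{meth}", "-@64"], source, target)
        print(f"{meth:>5}  {secs:10.3f} sec  {size:>13,} bytes")
```

A call such as `run_all(["cz1", "cz9", "cl1", "cl12", "ci1"])` would then print one line per method. Like the shell loop, it measures compression only.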

cielavenir commented 4 years ago

Sanmayce commented 4 years ago

@cielavenir Did a quick run with the updated enwik9_Zstd package, downloadable at: https://drive.google.com/file/d/1B83Ktm0GI7ACRvcjvMiQXdTLznD9dxCG/view?usp=sharing

On Windows 10 64-bit, on a laptop with an i7-3630QM (4 cores / 8 threads), 16GB DDR3, and a Samsung 860 PRO 256GB SSD:

Compressor                                               |    Elapsed Time |                  Output
---------------------------------------------------------------------------------------------------- 
1 (zlib) 7migz -cz1 -@64  0<enwik9 1>enwik9.cz1          |    4.140813 sec | 379,731,414 enwik9.cz1
2 (zlib) 7migz -cz2 -@64  0<enwik9 1>enwik9.cz2          |    4.484577 sec | 365,654,349 enwik9.cz2
3 (zlib) 7migz -cz3 -@64  0<enwik9 1>enwik9.cz3          |    5.187740 sec | 355,443,967 enwik9.cz3
4 (zlib) 7migz -cz4 -@64  0<enwik9 1>enwik9.cz4          |    5.719017 sec | 339,905,974 enwik9.cz4
5 (zlib) 7migz -cz5 -@64  0<enwik9 1>enwik9.cz5          |    7.547227 sec | 329,417,750 enwik9.cz5
6 (zlib) 7migz -cz6 -@64  0<enwik9 1>enwik9.cz6          |    9.437935 sec | 326,115,303 enwik9.cz6
7 (zlib) 7migz -cz7 -@64  0<enwik9 1>enwik9.cz7          |   10.297354 sec | 325,489,808 enwik9.cz7
8 (zlib) 7migz -cz8 -@64  0<enwik9 1>enwik9.cz8          |   11.125513 sec | 325,006,523 enwik9.cz8
9 (zlib) 7migz -cz9 -@64  0<enwik9 1>enwik9.cz9          |   11.125514 sec | 325,000,606 enwik9.cz9
---------------------------------------------------------------------------------------------------- 
1 (7zip) 7migz -cS1 -@64  0<enwik9 1>enwik9.cS1          |    7.562848 sec | 339,119,944 enwik9.cS1
2 (7zip) 7migz -cS2 -@64  0<enwik9 1>enwik9.cS2          |    7.515976 sec | 339,119,944 enwik9.cS2
3 (7zip) 7migz -cS3 -@64  0<enwik9 1>enwik9.cS3          |    7.547228 sec | 339,119,944 enwik9.cS3
4 (7zip) 7migz -cS4 -@64  0<enwik9 1>enwik9.cS4          |    7.531599 sec | 339,119,944 enwik9.cS4
5 (7zip) 7migz -cS5 -@64  0<enwik9 1>enwik9.cS5          |   26.157469 sec | 315,436,344 enwik9.cS5
6 (7zip) 7migz -cS6 -@64  0<enwik9 1>enwik9.cS6          |   26.219970 sec | 315,436,344 enwik9.cS6
7 (7zip) 7migz -cS7 -@64  0<enwik9 1>enwik9.cS7          |   72.769007 sec | 313,290,259 enwik9.cS7
8 (7zip) 7migz -cS8 -@64  0<enwik9 1>enwik9.cS8          |   72.784638 sec | 313,290,259 enwik9.cS8
9 (7zip) 7migz -cS9 -@64  0<enwik9 1>enwik9.cS9          |  175.492552 sec | 313,031,626 enwik9.cS9
---------------------------------------------------------------------------------------------------- 
1 (libdeflate) 7migz -cl1 -@64  0<enwik9 1>enwik9.cl1    |    3.359530 sec | 357,276,709 enwik9.cl1
2 (libdeflate) 7migz -cl2 -@64  0<enwik9 1>enwik9.cl2    |    3.547040 sec | 345,865,789 enwik9.cl2
3 (libdeflate) 7migz -cl3 -@64  0<enwik9 1>enwik9.cl3    |    3.781422 sec | 340,919,046 enwik9.cl3
4 (libdeflate) 7migz -cl4 -@64  0<enwik9 1>enwik9.cl4    |    4.015808 sec | 337,870,102 enwik9.cl4
5 (libdeflate) 7migz -cl5 -@64  0<enwik9 1>enwik9.cl5    |    4.453332 sec | 328,713,486 enwik9.cl5
6 (libdeflate) 7migz -cl6 -@64  0<enwik9 1>enwik9.cl6    |    4.843972 sec | 326,477,090 enwik9.cl6
7 (libdeflate) 7migz -cl7 -@64  0<enwik9 1>enwik9.cl7    |    5.250245 sec | 325,606,455 enwik9.cl7
8 (libdeflate) 7migz -cl8 -@64  0<enwik9 1>enwik9.cl8    |   15.031952 sec | 318,781,742 enwik9.cl8
9 (libdeflate) 7migz -cl9 -@64  0<enwik9 1>enwik9.cl9    |   19.297770 sec | 315,242,759 enwik9.cl9
10 (libdeflate) 7migz -cl10 -@64  0<enwik9 1>enwik9.cl10 |   21.360367 sec | 313,801,807 enwik9.cl10
11 (libdeflate) 7migz -cl11 -@64  0<enwik9 1>enwik9.cl11 |   28.079433 sec | 313,156,477 enwik9.cl11
12 (libdeflate) 7migz -cl12 -@64  0<enwik9 1>enwik9.cl12 |   35.236018 sec | 313,033,510 enwik9.cl12
---------------------------------------------------------------------------------------------------- 
1 (zlibng) 7migz -cn1 -@64  0<enwik9 1>enwik9.cn1        |    2.578248 sec | 506,934,955 enwik9.cn1
2 (zlibng) 7migz -cn2 -@64  0<enwik9 1>enwik9.cn2        |    5.172112 sec | 362,046,146 enwik9.cn2
3 (zlibng) 7migz -cn3 -@64  0<enwik9 1>enwik9.cn3        |    5.765888 sec | 347,989,009 enwik9.cn3
4 (zlibng) 7migz -cn4 -@64  0<enwik9 1>enwik9.cn4        |    6.500298 sec | 331,634,506 enwik9.cn4
5 (zlibng) 7migz -cn5 -@64  0<enwik9 1>enwik9.cn5        |    8.062875 sec | 331,400,800 enwik9.cn5
6 (zlibng) 7migz -cn6 -@64  0<enwik9 1>enwik9.cn6        |    9.531695 sec | 329,505,194 enwik9.cn6
7 (zlibng) 7migz -cn7 -@64  0<enwik9 1>enwik9.cn7        |   13.625633 sec | 325,457,108 enwik9.cn7
8 (zlibng) 7migz -cn8 -@64  0<enwik9 1>enwik9.cn8        |   14.578805 sec | 325,006,403 enwik9.cn8
9 (zlibng) 7migz -cn9 -@64  0<enwik9 1>enwik9.cn9        |   14.766304 sec | 325,000,559 enwik9.cn9
---------------------------------------------------------------------------------------------------- 
1 (igzip) 7migz -ci1 -@64  0<enwik9 1>enwik9.ci1         |    2.000093 sec | 391,100,674 enwik9.ci1
2 (igzip) 7migz -ci2 -@64  0<enwik9 1>enwik9.ci2         |    2.203224 sec | 374,181,750 enwik9.ci2
3 (igzip) 7migz -ci3 -@64  0<enwik9 1>enwik9.ci3         |    2.218850 sec | 365,630,779 enwik9.ci3
4 (igzip) 7migz -ci4 -@64  0<enwik9 1>enwik9.ci4         |    5.125235 sec | 359,555,315 enwik9.ci4
---------------------------------------------------------------------------------------------------- 
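
To put these elapsed times next to the MB/s figures in the Zstd output above, each row can be converted to a compression ratio and a throughput. This is a sketch under two assumptions: the input is the 1,000,000,000-byte enwik9 shown in the Zstd listing, and 1 MB is taken as 10^6 bytes. The helper name `summarize` is made up.

```python
ENWIK9_BYTES = 1_000_000_000  # input size as shown in the Zstd listing above

# (elapsed seconds, output bytes) copied from a few rows of the table above.
rows = {
    "zlib -cz1": (4.140813, 379_731_414),
    "libdeflate -cl1": (3.359530, 357_276_709),
    "libdeflate -cl12": (35.236018, 313_033_510),
    "igzip -ci1": (2.000093, 391_100_674),
}


def summarize(elapsed_sec, output_bytes, input_bytes=ENWIK9_BYTES):
    """Return (compression ratio, throughput in MB/s with 1 MB = 10**6 bytes)."""
    return input_bytes / output_bytes, input_bytes / elapsed_sec / 1e6


for name, (secs, size) in rows.items():
    ratio, mbps = summarize(secs, size)
    print(f"{name:<18} ratio {ratio:.3f}  {mbps:7.1f} MB/s")
```

These are wall-clock times for a whole process run with file redirection, so the resulting MB/s values are only loosely comparable with the figures from Zstd's built-in benchmark mode.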

Note1: Under Windows filenames are case insensitive, thus your 's' and 'S', and your 'z' and 'Z', collide in the output filenames.
Note2: Could you add printing of the block sizes for each run? E.g. 7zip mode 9 seems like 7zip mode 5.
Note3: The decompression is awesome, around 0.9 seconds, wow!

Could you explain why so many executables? I wish I had just one to include in future benchmarks...

cielavenir commented 4 years ago

Note1: Under Windows filenames are case insensitive

Yes, but options are case sensitive. ...Perhaps you need to change the output filename, though.

7zip mode 9 seems like 7zip mode 5

What do you mean? I can see that -cS1 through -cS4 give identical output, as do -cS5/-cS6 and -cS7/-cS8; that matches what I read in the 7-Zip documentation (cf. https://sevenzip.osdn.jp/chm/cmdline/switches/method.htm#ZipX). However, I see that -cS5 is different from -cS9.

Could you explain why so many executables

As I said before, this is a suite of DEFLATE compressors, which has many frontends and backends. 7migz is the MiGz frontend relevant to this topic. (I have also added a GZinga frontend. It is unrelated to this topic, but you can take a look if curious.)