r-lyeh-archived / bundle

:package: Bundle, an embeddable compression library: DEFLATE, LZMA, LZIP, BZIP2, ZPAQ, LZ4, ZSTD, BROTLI, BSC, CSC, BCM, MCM, ZMOLLY, ZLING, TANGELO, SHRINKER, CRUSH, LZJB and SHOCO streams in a ZIP file (C++03)(C++11)
zlib License

consider snappy #11

Closed mavam closed 9 years ago

mavam commented 9 years ago

It would be great to have snappy as part of the benchmark.

r-lyeh-archived commented 9 years ago

Snappy was already evaluated at some point (see https://github.com/r-lyeh/bundle#evaluated-alternatives). However, LZ4/LZ4HC are always faster (during compression and/or decompression) and also have higher compression ratios, so I just removed Snappy.

What's more, LZ4 is still widely popular and still in development (it recently got a shiny new streamable API too), so it is a double win. You can find a few benchmarks covering both at https://code.google.com/p/data-shrinker/ :)

mavam commented 9 years ago

All right, I'm convinced. :+1:

mavam commented 9 years ago

Out of curiosity, is there a similar argument for GZIP and BZIP2?

r-lyeh-archived commented 9 years ago

Well, I am not an expert in the field, but here are a few highlights (I could be completely wrong though):

Any GZIP file is basically a tiny wrapper (about 18 bytes in the minimal case, if I remember right: a 10-byte header plus an 8-byte CRC/size trailer) around a deflate-compressed stream. Then there is the ZIP format, which is somewhat more complex than GZIP since it supports multiple options and multiple entries (deflate streams + properties) per zip file. So both GZIP and ZIP will perform very close to the deflate algorithm in bundle. In plain English, it is "only" the wrapper that changes, not the compressed stream/algorithm itself.
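To make the "same stream, different wrapper" point concrete, here is a minimal Python sketch (standard library only, not bundle code). It builds a raw deflate stream, wraps it by hand in the smallest gzip framing allowed by RFC 1952, and checks that a regular gzip decoder accepts it. The sample data and the header field choices (no optional fields, unset timestamp, "unknown" OS) are assumptions made for illustration.

```python
import gzip
import struct
import zlib

data = b"bundle bundle bundle " * 200  # hypothetical sample input

# Raw deflate stream: negative wbits asks zlib for no header and no trailer.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw_deflate = compressor.compress(data) + compressor.flush()

# Minimal gzip member: 10-byte header + deflate body + 8-byte trailer.
header = bytes([
    0x1F, 0x8B,  # magic number
    0x08,        # compression method: deflate
    0x00,        # flags: no optional fields
    0, 0, 0, 0,  # modification time: unset
    0x00,        # extra flags
    0xFF,        # operating system: unknown
])
trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
gz_member = header + raw_deflate + trailer

# A stock gzip decoder accepts the hand-wrapped stream (it checks CRC and size).
assert gzip.decompress(gz_member) == data

overhead = len(gz_member) - len(raw_deflate)
print(overhead)  # 18
```

The compressed payload is byte-for-byte the same deflate stream either way, so the compression ratio differs only by the fixed framing overhead.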

On the other hand, BZIP2 is a more powerful algorithm than deflate (so it also compresses better than any ZIP or GZIP). Even though BZIP2 has been a good candidate over the years, it has been superseded by a few other compressors in recent years (~10 yrs). For example, compared to BZIP2, LZMA compresses quite a lot more, is much faster, and is less memory hungry for both compression and decompression.
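If you want to check these claims on your own data, a rough sketch like the following compares compressed sizes using Python's standard `zlib`, `bz2` and `lzma` modules (not bundle itself). The `compare` helper and the sample input are hypothetical, and it only measures size, not speed or memory, so treat the numbers as a starting point rather than a benchmark.

```python
import bz2
import lzma
import zlib


def compare(data: bytes) -> dict:
    """Return the compressed size in bytes per codec, verifying each round trip."""
    codecs = {
        "deflate (zlib)": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bzip2": (lambda d: bz2.compress(d, 9), bz2.decompress),
        "lzma (xz)": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
    }
    sizes = {}
    for name, (pack, unpack) in codecs.items():
        packed = pack(data)
        assert unpack(packed) == data  # lossless round trip
        sizes[name] = len(packed)
    return sizes


# Hypothetical input; swap in a file that resembles your real workload.
sample = b"".join(b"entry %d: the quick brown fox jumps over the lazy dog\n" % i
                  for i in range(5000))
sizes = compare(sample)
for name, size in sizes.items():
    print(f"{name:15s} {size:8d} bytes ({size / len(sample):.1%} of original)")
```

Results depend heavily on the input, so it is worth running this on files close to what you actually plan to compress.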

Note: Regarding deflate, there can be a few differences depending on how the encoder builds the (compatible) deflate stream. We are using miniz, which is a good tiny alternative. Classic ZLIB will surely compress a bit more. And then you can check Zopfli if you really need to push deflate to its limits. All of these encoders produce a ZIP/GZIP-valid deflate stream (but then again, the results will be much larger than those of stronger encoders such as lzma, bsc, zpaq, csc, brotli, etc.)
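The "different encoders, same format" idea can be sketched without miniz or Zopfli at hand. The snippet below uses two zlib effort levels as a stand-in for two different encoders (that substitution is an assumption for illustration, and the helper names are hypothetical): both produce raw deflate streams of possibly different sizes, and the same decoder reads either one.

```python
import zlib


def raw_deflate(data: bytes, level: int) -> bytes:
    """Encode data as a raw deflate stream (no zlib/gzip wrapper) at a given effort level."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(stream: bytes) -> bytes:
    """Decode a raw deflate stream, whichever encoder or level produced it."""
    return zlib.decompress(stream, -15)


# Hypothetical sample input.
text = b"".join(b"record %d: some moderately repetitive payload\n" % i
                for i in range(2000))

fast = raw_deflate(text, 1)  # low effort
best = raw_deflate(text, 9)  # high effort

# Both streams are valid deflate and decode to the same bytes.
assert raw_inflate(fast) == text
assert raw_inflate(best) == text

print(len(text), len(fast), len(best))
```

A stronger encoder only changes how well the matches and Huffman codes are chosen; the decoder side stays untouched.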

edit: relevant benchmark http://tukaani.org/lzma/benchmarks.html

mavam commented 9 years ago

Thanks for providing this detailed extra information! :+1: