openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Add 'Brotli' to the standard & implement it #3844

Open RubenKelevra opened 9 years ago

RubenKelevra commented 9 years ago

Brotli is a new compression algorithm from Google. Compression is nearly as fast as zlib, but the compression ratio is higher than lzma's.

More info: http://google-opensource.blogspot.de/2015/09/introducing-brotli-new-compression.html?m=0

ryao commented 9 years ago

That is not strictly correct when looking at the Brotli paper's Canterbury corpus results. Brotli-1 does offer a slightly higher compression ratio than gzip-9 while compressing and decompressing slightly faster than gzip-1. Brotli-9 is better in all aspects than both gzip-9 (slightly better in compression/decompression) and lzma-1 (slightly better in compression ratio). However, the compression ratio is not better than lzma-9 unless you use Brotli-11, where decompression is slower than gzip (still far faster than lzma) and compression is 8 times slower than lzma-9.

http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf

That said, the paper omits measurements of memory consumption, performance on incompressible data and results on the Silesia corpus, which is arguably a better general benchmark than the Canterbury corpus:

http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia

These are things that should be evaluated so we have a good view of the algorithm's characteristics. That being said, if the memory consumption is not too great, I could see us adding Brotli.

ryao commented 9 years ago

There is benchmark data showing Brotli and gzip on the Silesia corpus' individual files here:

https://quixdb.github.io/squash-benchmark/#results

Brotli-1 looks fairly good in comparison to zlib. Brotli-9 does not look quite as good in the files that I checked, although I would prefer to see these numbers presented for the entire corpus rather than for individual pieces of it. Memory requirements and performance numbers for incompressible data are also absent.

ryao commented 9 years ago

It probably should also be said that the use cases in which Brotli was designed to excel are better served by putting Brotli in user space so that precompressed files can be served to clients. That does not stop us from using it, but it does mean that evaluation for filesystem use gives Brotli a handicap.

Speaking of evaluation for filesystem use, we should do tests by breaking the Silesia corpus into 128 KB record-sized chunks, compressing and decompressing each one, and calculating times and ratios on those. That would allow for proper evaluation, although memory usage with 16 MB records (from large_blocks) and performance on incompressible data would need to be considered as well.
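A rough sketch of what such a per-record test could look like, in Python. This is only an illustration under stated assumptions: the names `split_records`, `benchmark` and `CODECS` are made up for this example, and gzip and lzma from the Python standard library stand in for the codecs under test, since a Brotli binding is not part of the standard library (one would plug in as another pair of compress/decompress callables). Each record is compressed independently, the way a filesystem would do it, and it does not measure memory usage.

```python
import lzma
import os
import time
import zlib

RECORD_SIZE = 128 * 1024  # 128 KB, the record size mentioned above


def split_records(data, record_size=RECORD_SIZE):
    """Yield consecutive record-sized chunks; the last one may be shorter."""
    for offset in range(0, len(data), record_size):
        yield data[offset:offset + record_size]


def benchmark(data, compress, decompress, record_size=RECORD_SIZE):
    """Compress and decompress each record on its own, summing times and sizes."""
    raw = packed = 0
    c_time = d_time = 0.0
    for record in split_records(data, record_size):
        start = time.perf_counter()
        blob = compress(record)
        c_time += time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(blob)
        d_time += time.perf_counter() - start

        if restored != record:
            raise ValueError("round trip mismatch")
        raw += len(record)
        packed += len(blob)
    return {
        "ratio": raw / packed if packed else 0.0,  # uncompressed / compressed
        "compress_s": c_time,
        "decompress_s": d_time,
    }


# Stand-in codecs (assumption: stdlib only); keys are illustrative labels.
CODECS = {
    "gzip-1": (lambda b: zlib.compress(b, 1), zlib.decompress),
    "gzip-9": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "lzma-1": (lambda b: lzma.compress(b, preset=1), lzma.decompress),
}

if __name__ == "__main__":
    # Synthetic placeholders; a real run would read the Silesia corpus files.
    samples = {
        "compressible": b"OpenZFS record payload " * 50_000,
        "incompressible": os.urandom(4 * RECORD_SIZE),
    }
    for sample_name, data in samples.items():
        for codec_name, (compress, decompress) in CODECS.items():
            print(sample_name, codec_name, benchmark(data, compress, decompress))
```

Passing `record_size=16 * 1024 * 1024` to `benchmark` would cover the large_blocks case, and the random sample gives a first look at behaviour on incompressible data.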

RubenKelevra commented 8 years ago

Great news, thanks for the deep look at it! :) I would really love to use it for our backup storage, where a high compression ratio matters most but xz slows things down too much. So we just send | receive to the backup storage while recompressing from lz4 to gzip-9. That means we could switch over to Brotli in the future and save more snapshots while slowly migrating to fully Brotli-compressed storage :)