zarr-developers / numcodecs

A Python package providing buffer compression and transformation codecs for use in data storage and communication applications.
http://numcodecs.readthedocs.io
MIT License

Supporting Zopfli #120

Open jakirkham opened 5 years ago

jakirkham commented 5 years ago

The Zopfli compression algorithm from Google is a zlib-compatible compressor that achieves a small but notable improvement over the compression ratio zlib would otherwise achieve. The catch is that compression ends up being quite a bit slower. However, once compressed, the data can be decompressed using standard zlib-style algorithms with practically unchanged decompression speed.

Generally this can be useful in cases where the data must be decompressible via zlib or gzip, compression happens only once, and minimizing size is of paramount importance. One example is serving data used in webpages. So this could be a nice option for use cases where people are interacting with tiles of image data in the web browser (via n5-wasm), for instance.

Note: There are a few Python implementations. One we might use is zopfli, which is on PyPI and conda-forge.

jakirkham commented 5 years ago

cc @aschampion (in case it is of interest)

alimanfoo commented 5 years ago

Sounds like a valuable addition. It looks like there are both zlib-like and gzip-like compress functions, so I guess we'd want two codecs, zopfli-zlib and zopfli-gzip. The zopfli Python package looks like it would be straightforward to wrap.
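
As a rough illustration of why two codecs would be needed: the zlib and gzip containers wrap the same kind of DEFLATE stream in different headers, so a reader expecting one rejects the other. The sketch below uses only the standard library `zlib` and `gzip` modules (not Zopfli itself), and the helper name `cross_decode_fails` is made up for this example.

```python
import gzip
import zlib

data = b"tile data " * 100

# Same input, two different container formats.
zlib_bytes = zlib.compress(data)
gzip_bytes = gzip.compress(data)

# A zlib stream with the default 32 KiB window starts with byte 0x78;
# a gzip stream starts with the magic bytes 1f 8b.
zlib_first_byte = zlib_bytes[0]
gzip_magic = gzip_bytes[:2]


def cross_decode_fails():
    """Return True if the zlib decoder rejects gzip-framed data."""
    try:
        zlib.decompress(gzip_bytes)
    except zlib.error:
        return True
    return False
```

Each container round-trips only with its own decoder (`zlib.decompress` or `gzip.decompress`), which is presumably why a Zopfli-backed encoder would need one variant per container.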

aschampion commented 5 years ago

Thanks for bringing this to my attention. Looks like there's a Rust implementation, so when I have some time I should be able to benchmark it for n5 as I did for brotli.

At least from the Rust N5 side, having a non-symmetric compression scheme (i.e., one that should compress with zopfli but can decompress with normal gzip) could be a bit of an implementation pain. I'm not sure if that's easier in the zarr world. Java N5 already supports arbitrary compression schemes, so it may already be able to handle this situation.

jakirkham commented 1 year ago

Maybe it could be a config option on the existing codecs as opposed to new codecs?

Even if it isn't installed, one could just warn and fall back to zlib or gzip. At worst one simply gets less compressed data (it is still readable).
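
A minimal sketch of what that could look like for the zlib case, not a proposal for the actual numcodecs API. The function names `zlib_encode` and `zlib_decode` and the `use_zopfli` flag are hypothetical, and the import assumes the `zopfli` package from PyPI, which exposes `zopfli.zlib.compress`.

```python
import warnings
import zlib

try:
    # Assumption: the `zopfli` package from PyPI is the implementation used.
    from zopfli.zlib import compress as _zopfli_compress
except ImportError:
    _zopfli_compress = None


def zlib_encode(data, level=6, use_zopfli=False):
    """Hypothetical encoder: use Zopfli when asked for and available."""
    if use_zopfli:
        if _zopfli_compress is not None:
            # Slower to compress, but the output is an ordinary zlib stream.
            return _zopfli_compress(data)
        warnings.warn(
            "zopfli is not installed; falling back to zlib",
            RuntimeWarning,
        )
    return zlib.compress(data, level)


def zlib_decode(buf):
    """Decoding is the same no matter which encoder produced the bytes."""
    return zlib.decompress(buf)
```

Because the decode side never changes, data written with or without Zopfli stays readable by any existing zlib reader; only the compressed size differs.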