zarr-developers / numcodecs

A Python package providing buffer compression and transformation codecs for use in data storage and communication applications.
http://numcodecs.readthedocs.io
MIT License

Support external lz4, zstd, blosc, zlib #464

Open haampie opened 1 year ago

haampie commented 1 year ago

Problem description

This package seems to unconditionally build vendored copies of lz4, zstd and blosc. This makes it hard to

  1. integrate this package with other libraries that pull in any of these libraries at a different version -- if it were external, one version could be used for all;
  2. update or patch binary dependencies in case of CVEs;
  3. deal with alternative providers, e.g. zlib-ng instead of zlib.

Also, the "build system" seems to be effectively native-only (cpuinfo-based) and very much x86-centered. That wouldn't be a big deal either if I could point numcodecs to pre-compiled external versions of lz4, zstd and blosc.

Can you please make this package work with an external/system lz4, zstd, and blosc? That'd be very helpful, thanks.

Probably this means using a better build system (maybe meson?) instead of a handwritten setup.py.
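
To make the request a bit more concrete, here is a minimal sketch of how a build script might choose between a system library and a vendored copy. It is only an illustration: the environment variable `NUMCODECS_USE_SYSTEM_LIBS` and the helpers `pick_library_source` and `plan_build` are hypothetical names and not existing numcodecs options. A real build would more likely rely on pkg-config or a build system such as meson.

```python
import os
from ctypes.util import find_library

# Libraries this issue asks to make external.
CANDIDATES = ("lz4", "zstd", "blosc")


def pick_library_source(name, env=None, finder=find_library):
    """Decide where a library should come from.

    Returns ("system", location) when the (hypothetical) opt-in variable
    is set and the library can be found, otherwise ("vendored", None).
    """
    env = os.environ if env is None else env
    # Hypothetical opt-in switch; numcodecs does not define this variable.
    if env.get("NUMCODECS_USE_SYSTEM_LIBS", "0") != "1":
        return ("vendored", None)
    found = finder(name)  # e.g. "libzstd.so.1", or None if not found
    if found is None:
        return ("vendored", None)
    return ("system", found)


def plan_build(env=None, finder=find_library):
    """Map each candidate library to its chosen source."""
    return {name: pick_library_source(name, env, finder) for name in CANDIDATES}


if __name__ == "__main__":
    for name, (source, location) in plan_build().items():
        print(f"{name}: {source} {location or ''}")
```

Falling back to the vendored copy when a library is missing keeps the current behaviour as the default, so only packagers who opt in would see a change.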

martindurant commented 1 year ago

I have mentioned this elsewhere, but the package cramjam includes several useful and important compressors in one install, and they tend to have good performance compared to other versions. It does not support blosc, but that could be requested, and it would make numcodecs a much, much simpler package to build and distribute.

joshmoore commented 1 year ago

Hi @haampie. I don't think anyone will disagree with your thesis here. It's more a matter of making it happen. Helping hands and suggestions like @martindurant's on getting us there are very welcome!

(e.g. see https://github.com/zarr-developers/numcodecs/pull/274)

BwL1289 commented 3 months ago

Linking #254. I know this is quite old. I'm experiencing this as well.

milesgranger commented 2 months ago

I learned about this issue thru one on cramjam (https://github.com/milesgranger/cramjam/issues/110); I'm fixing to release 2.8.4 w/ blosc2 support; it's available now for testing w/ 2.8.4rc3 on PyPI. It already supports lz4, zstd and others in a small and easy install.

It would be interesting to know if this helps at all here.

martindurant commented 2 months ago

Absolutely, thanks @milesgranger. The code here still references blosc1, so some compat work would be needed, I think.

I really think that there is no need for numcodecs to be repeating the build process for these standard compressors and/or relying on system libraries. Some things will still need cython for custom algorithms, but the less the better.
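
As a rough sketch of what "not repeating the build process" could look like, the class below delegates all compression to an already-built Python module instead of compiling C sources. It uses the standard-library `zlib` module purely as a stand-in; the class name `ExternalZlibCodec` and its `codec_id` are made up for illustration and are not part of numcodecs. The `encode`/`decode`/`get_config` shape is assumed to resemble a numcodecs codec.

```python
import zlib


class ExternalZlibCodec:
    """Illustrative codec that wraps an external compressor module.

    No C code is built here; the compressor comes from a module that is
    already installed (the standard-library zlib in this sketch).
    """

    codec_id = "external-zlib-sketch"  # hypothetical identifier

    def __init__(self, level=6):
        self.level = level

    def encode(self, buf):
        # Accept any object exposing the buffer protocol.
        data = memoryview(buf).tobytes()
        return zlib.compress(data, self.level)

    def decode(self, buf):
        data = memoryview(buf).tobytes()
        return zlib.decompress(data)

    def get_config(self):
        return {"id": self.codec_id, "level": self.level}


if __name__ == "__main__":
    codec = ExternalZlibCodec(level=9)
    payload = b"numcodecs " * 100
    encoded = codec.encode(payload)
    assert codec.decode(encoded) == payload
    print(len(payload), "->", len(encoded), "bytes")
```

Swapping the two `zlib` calls for another package's compress and decompress functions would be the only change needed to back the same interface with a different library.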