kiyo-masui / bitshuffle

Filter for improving compression of typed binary data.

Wheel on PyPI doesn't work on systems without AVX-512 #121

Closed. sk1p closed this issue 1 year ago.

sk1p commented 2 years ago

When I install the binary wheel from PyPI on my laptop, the decompress_lz4 function crashes at a VPBROADCASTD instruction somewhere in the LZ4 code, in an encoding that requires AVX-512 (my laptop only supports up to AVX2). Would it be possible to either ship a generic binary, or build several variants and dynamically load the Python extension that is compatible with the current CPU?
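A quick way to confirm which vector extensions a machine actually reports is a small C check. This is a minimal sketch assuming GCC or Clang on x86; `report_simd_support` is a hypothetical helper, while `__builtin_cpu_supports` is the compilers' built-in CPU feature test.

```c
#include <stdio.h>

/* Hypothetical helper: print whether the running CPU reports AVX2 and
 * AVX-512F. __builtin_cpu_supports is a GCC/Clang builtin for x86 targets;
 * on other architectures the check is skipped.
 * Returns 1 if AVX-512F is available, 0 otherwise. */
int report_simd_support(void) {
#if defined(__x86_64__) || defined(__i386__)
    int avx2 = __builtin_cpu_supports("avx2") ? 1 : 0;
    int avx512f = __builtin_cpu_supports("avx512f") ? 1 : 0;
    printf("avx2: %s\n", avx2 ? "yes" : "no");
    printf("avx512f: %s\n", avx512f ? "yes" : "no");
    return avx512f;
#else
    printf("not an x86 CPU; AVX2/AVX-512 checks do not apply\n");
    return 0;
#endif
}
```

On a laptop like the one described, this would be expected to report `avx2: yes` and `avx512f: no`, which is exactly the situation in which a wheel built with AVX-512 code paths enabled everywhere crashes.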

Related to https://github.com/conda-forge/bitshuffle-feedstock/issues/7

Installing from source via pip install --no-binary "bitshuffle" bitshuffle results in a working bitshuffle install.

jrs65 commented 1 year ago

@sk1p Sorry for the delay. I just started having the same issue myself, so I checked back here. I'll try to tweak the CI builds to make sure this doesn't happen again.

jrs65 commented 1 year ago

I'm trying to come up with a good solution here. I think what's going on is that the optimiser is turning ordinary code into AVX-512 instructions. Initially I thought that just using an earlier -march setting would be enough, but I now worry that it would mean the code that explicitly calls AVX intrinsics won't be built in. I think a combination of a late -march with an earlier -mtune might do what we want, i.e. the generic code is optimised only with non-AVX-512 instructions while the code using AVX-512 intrinsics still builds, but I need to test this.
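One point worth checking against the GCC manual: -mtune tunes "everything applicable about the generated code, except for the ABI and the set of available instructions", so a late -march with an earlier -mtune would still allow AVX-512 anywhere. A common alternative is to build the whole file for a baseline -march and let only the intrinsic code opt into AVX-512 through the GCC/Clang `target` function attribute, selected at run time. The sketch below assumes GCC or Clang on x86-64; `sum_u32` and its helpers are hypothetical toy functions, not bitshuffle's actual kernels.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Baseline path: compiled with the file's -march (e.g. -march=x86-64),
 * so it runs on any CPU of the target architecture. */
static uint32_t sum_u32_scalar(const uint32_t *data, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

#ifdef HAVE_X86_DISPATCH
/* The target attribute lets this one function use AVX-512F intrinsics even
 * though the rest of the file is built for a baseline -march. */
__attribute__((target("avx512f")))
static uint32_t sum_u32_avx512(const uint32_t *data, size_t n) {
    __m512i acc = _mm512_setzero_si512();
    size_t i = 0;
    for (; i + 16 <= n; i += 16)   /* 16 x 32-bit lanes per iteration */
        acc = _mm512_add_epi32(acc, _mm512_loadu_si512((const void *)(data + i)));
    uint32_t lanes[16];
    _mm512_storeu_si512((void *)lanes, acc);
    uint32_t total = 0;
    for (int k = 0; k < 16; k++)   /* horizontal sum, unsigned wraparound */
        total += lanes[k];
    for (; i < n; i++)             /* scalar tail */
        total += data[i];
    return total;
}
#endif

/* Public entry point: take the AVX-512 path only if the running CPU has it. */
uint32_t sum_u32(const uint32_t *data, size_t n) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx512f"))
        return sum_u32_avx512(data, n);
#endif
    return sum_u32_scalar(data, n);
}
```

With this layout, the compiler can only emit AVX-512 inside `sum_u32_avx512`, and that function is never called on a CPU that lacks AVX-512F, so auto-vectorised generic code cannot introduce the crash.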

sk1p commented 1 year ago

Hm, I haven't looked into the details, but one option would be to build multiple versions into one generic wheel, for example using function multi-versioning. That could let it still work on hardware without AVX-512 while taking advantage of recent vector instruction sets on hardware that supports them.
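Function multi-versioning could look roughly like this in C. This is a sketch assuming GCC 6 or later (or Clang 14 or later) on x86-64 with glibc, which provides the ifunc mechanism `target_clones` relies on; `MULTIVERSION` and `scale_u32` are hypothetical names, not bitshuffle code.

```c
#include <stddef.h>
#include <stdint.h>

/* target_clones builds one copy of the function per listed target plus a
 * "default" copy; an ifunc resolver picks a copy for the running CPU when the
 * symbol is resolved. On other platforms the macro expands to nothing and a
 * single generic copy is built. (__GLIBC__ is defined once stdint.h pulls in
 * glibc's headers.) */
#if defined(__x86_64__) && defined(__GLIBC__) && (defined(__GNUC__) || defined(__clang__))
#define MULTIVERSION __attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define MULTIVERSION
#endif

MULTIVERSION
void scale_u32(uint32_t *data, size_t n, uint32_t factor) {
    /* Plain C loop: each clone may be auto-vectorised for its own target. */
    for (size_t i = 0; i < n; i++)
        data[i] *= factor;   /* unsigned multiplication wraps, no UB */
}
```

A CPU without AVX-512 would get the avx2 or default clone instead of crashing. This covers plain C that the compiler vectorises itself; hand-written AVX-512 intrinsics would still need their own runtime-dispatched function, since the default clone cannot contain them.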