Closed sk1p closed 1 year ago
@sk1p sorry for the delay. Just started having the same issue myself and so I checked back here. I'll try and tweak the CI builds to make sure this doesn't happen.
I'm trying to come up with a good solution here. I think what's going on is that the optimiser is turning standard code into AVX instructions. Initially I'd thought that just using an earlier `-march` setting would be what was required, but I now worry that would mean the code explicitly calling AVX intrinsics won't be built. I think a combination of a late `-march` but an earlier `-mtune` might do what we want, i.e. the generic code is tuned for non-AVX512 hardware while the code using AVX512 intrinsics still builds, but I need to test this.
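Another way to get a similar split is per-file ISA flags: build the generic sources for a baseline ISA and pass the AVX512 feature flags only to the file containing the intrinsics. A sketch with gcc (the file names are hypothetical, and `x86-64-v2` as a baseline needs GCC 11+ / recent Clang):

```shell
# Generic code: baseline ISA, so the optimiser cannot auto-vectorise
# into AVX512 instructions.
gcc -O3 -march=x86-64-v2 -c bitshuffle_core.c -o bitshuffle_core.o

# Intrinsics-only file: enable the AVX512 features it needs. These
# instructions must only be reached behind a runtime CPU check.
gcc -O3 -march=x86-64-v2 -mavx512f -mavx512bw -c bitshuffle_avx512.c -o bitshuffle_avx512.o
```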
Hm, I didn't look into the details, but one option would be to build multiple versions into one generic wheel - for example, function multi-versioning could be used. That would possibly allow it to still work on hardware without AVX512, yet take advantage of recent vector instruction sets on supporting hardware.
When installing the binary wheel from PyPI on my laptop, the `decompress_lz4` function crashes at the instruction `VPBROADCASTD` somewhere in the lz4 code, which was added with AVX512 (my laptop supports up to AVX2 only). Would it be possible to either have a generic binary, or build multiple and dynamically choose the right Python extension that is compatible with the current CPU?

Related to https://github.com/conda-forge/bitshuffle-feedstock/issues/7
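On the "choose the right extension" idea: a package could inspect the CPU flags at import time and load a matching compiled variant. A minimal sketch, assuming Linux's `/proc/cpuinfo` and hypothetical variant names (`avx512`, `avx2`, `generic`) that are not part of bitshuffle's actual build:

```python
def pick_extension_suffix(cpuinfo_path="/proc/cpuinfo"):
    """Return which compiled-extension variant the current CPU can run.

    Parses the CPU feature flags from /proc/cpuinfo (Linux only) and
    falls back to the safe generic variant if they cannot be read.
    """
    flags = set()
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    break
    except OSError:
        pass  # non-Linux or unreadable: keep flags empty, use fallback
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "generic"

# A loader could then do something like (module name hypothetical):
# ext = importlib.import_module(f"bitshuffle._ext_{pick_extension_suffix()}")
```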
Installing from source via `pip install --no-binary "bitshuffle" bitshuffle` results in a working `bitshuffle` install.