kiyo-masui / bitshuffle

Filter for improving compression of typed binary data.

Build bitshuffle wheels #98

Closed james-s-willis closed 3 years ago

james-s-willis commented 3 years ago

These changes automate building bitshuffle wheels for Linux x86.

How do you want the wheels uploaded to PyPI, @kiyo-masui? There are various options (https://cibuildwheel.readthedocs.io/en/stable/deliver-to-pypi/); you can even automate it and upload only on tagged versions of bitshuffle.

kiyo-masui commented 3 years ago

How does this interact with h5py and the libhdf5 versions/binaries? There is a long-standing issue that h5py and bitshuffle need to be linked against the same version of HDF5, meaning both usually need to be built from source instead of installed from wheels. Is that fixed now?

jrs65 commented 3 years ago

That was just fixed in #81, which is why we can now do binary wheels.

kiyo-masui commented 3 years ago

Nice! (I'm clearly not paying close enough attention).

Automating it so that it uploads upon tagging seems pretty convenient!

jrs65 commented 3 years ago

Yeah, thanks to @t20100 and @james-s-willis for doing all of the work on that one. It's going to be a wonderful new world of speedy installs and fewer confused students!

kiyo-masui commented 3 years ago

Indeed, that was a huge pain in the ass.

jrs65 commented 3 years ago

@james-s-willis it's actually the mismatch I would like to see tested, e.g. headers used for building are 1.10.7, installed version used at run time is 1.8.11.

jrs65 commented 3 years ago

Basically something that checks that the wheels really are independent of the version of HDF5 installed.
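A check along these lines could be sketched as follows. This is only an illustration, not the test that was added to the PR: the helper names and the `HDF5_BUILD_VERSION` environment variable are hypothetical, and the sketch assumes `h5py.version.hdf5_version` reports the HDF5 library in use at run time.

```python
import os
import re


def parse_version(text):
    """Turn a dotted version such as '1.10.7' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a dotted version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_mismatched(build_version, runtime_version):
    """True when the major.minor series differ between build and run time."""
    return parse_version(build_version)[:2] != parse_version(runtime_version)[:2]


def runtime_hdf5_version():
    """Ask h5py which HDF5 version it reports; None if h5py is not installed."""
    try:
        import h5py
    except ImportError:
        return None
    return h5py.version.hdf5_version


if __name__ == "__main__":
    # HDF5_BUILD_VERSION is a hypothetical variable the CI job would set
    # to the version of the headers used when building the wheel.
    build = os.environ.get("HDF5_BUILD_VERSION", "1.10.7")
    runtime = runtime_hdf5_version()
    if runtime is None:
        print("h5py is not installed; nothing to compare")
    elif is_mismatched(build, runtime):
        print(f"mismatch exercised: built with {build}, running with {runtime}")
    else:
        print(f"no mismatch: built with {build}, running with {runtime}")
```

Running this before the unit tests would confirm that the job really exercises a build/run-time mismatch rather than silently testing against the same HDF5 series.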

james-s-willis commented 3 years ago

@jrs65 Oh I see what you mean now. I'll see if I can set that test up.

james-s-willis commented 3 years ago

@jrs65, so I build the wheels with HDF5 1.10.7 and then install HDF5 1.8.15 prior to the unit tests. With CPython 3.6 all tests pass, but with CPython 3.7 test_h5filter.py fails with an error that complains about an incorrect datatype, ValueError: Unable to create dataset (not a datatype):

2021-07-05T20:49:43.2284868Z =================================== FAILURES ===================================
2021-07-05T20:49:43.2285394Z ____________________________ TestFilter.test_filter ____________________________
2021-07-05T20:49:43.2285732Z 
2021-07-05T20:49:43.2286258Z self = <test_h5filter.TestFilter testMethod=test_filter>
2021-07-05T20:49:43.2286696Z 
2021-07-05T20:49:43.2287042Z     def test_filter(self):
2021-07-05T20:49:43.2287446Z         shape = (32 * 1024 + 783,)
2021-07-05T20:49:43.2287797Z         chunks = (4 * 1024 + 23,)
2021-07-05T20:49:43.2288181Z         dtype = np.int64
2021-07-05T20:49:43.2288590Z         data = np.arange(shape[0])
2021-07-05T20:49:43.2289059Z         fname = "tmp_test_filters.h5"
2021-07-05T20:49:43.2289503Z         f = h5py.File(fname, "w")
2021-07-05T20:49:43.2289931Z         h5.create_dataset(
2021-07-05T20:49:43.2290296Z             f,
2021-07-05T20:49:43.2290611Z             b"range",
2021-07-05T20:49:43.2290962Z             shape,
2021-07-05T20:49:43.2291292Z             dtype,
2021-07-05T20:49:43.2291637Z             chunks,
2021-07-05T20:49:43.2292029Z             filter_pipeline=(32008, 32000),
2021-07-05T20:49:43.2292610Z             filter_flags=(h5z.FLAG_MANDATORY, h5z.FLAG_MANDATORY),
2021-07-05T20:49:43.2293143Z >           filter_opts=None,
2021-07-05T20:49:43.2293502Z         )
2021-07-05T20:49:43.2293705Z 
2021-07-05T20:49:43.2294279Z /project/tests/test_h5filter.py:33: 
2021-07-05T20:49:43.2294747Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2295307Z bitshuffle/h5.pyx:186: in bitshuffle.h5.create_dataset
2021-07-05T20:49:43.2295832Z     ???
2021-07-05T20:49:43.2296319Z h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2296818Z     ???
2021-07-05T20:49:43.2297302Z h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2297801Z     ???
2021-07-05T20:49:43.2298150Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2298390Z 
2021-07-05T20:49:43.2298673Z >   ???
2021-07-05T20:49:43.2299140Z E   ValueError: Unable to create dataset (not a datatype)
2021-07-05T20:49:43.2299621Z 
2021-07-05T20:49:43.2299983Z h5py/h5d.pyx:87: ValueError
2021-07-05T20:49:43.2300513Z _______________________ TestFilter.test_with_block_size ________________________
2021-07-05T20:49:43.2300879Z 
2021-07-05T20:49:43.2301426Z self = <test_h5filter.TestFilter testMethod=test_with_block_size>
2021-07-05T20:49:43.2301888Z 
2021-07-05T20:49:43.2302247Z     def test_with_block_size(self):
2021-07-05T20:49:43.2302661Z         shape = (128 * 1024 + 783,)
2021-07-05T20:49:43.2303016Z         chunks = (4 * 1024 + 23,)
2021-07-05T20:49:43.2303404Z         dtype = np.int64
2021-07-05T20:49:43.2303813Z         data = np.arange(shape[0])
2021-07-05T20:49:43.2304272Z         fname = "tmp_test_filters.h5"
2021-07-05T20:49:43.2304715Z         f = h5py.File(fname, "w")
2021-07-05T20:49:43.2305141Z         h5.create_dataset(
2021-07-05T20:49:43.2305496Z             f,
2021-07-05T20:49:43.2305824Z             b"range",
2021-07-05T20:49:43.2306161Z             shape,
2021-07-05T20:49:43.2306510Z             dtype,
2021-07-05T20:49:43.2307140Z             chunks,
2021-07-05T20:49:43.2307558Z             filter_pipeline=(32008, 32000),
2021-07-05T20:49:43.2308142Z             filter_flags=(h5z.FLAG_MANDATORY, h5z.FLAG_MANDATORY),
2021-07-05T20:49:43.2308660Z >           filter_opts=((680,), ()),
2021-07-05T20:49:43.2309017Z         )
2021-07-05T20:49:43.2309206Z 
2021-07-05T20:49:43.2309642Z /project/tests/test_h5filter.py:59: 
2021-07-05T20:49:43.2310114Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2310676Z bitshuffle/h5.pyx:186: in bitshuffle.h5.create_dataset
2021-07-05T20:49:43.2311195Z     ???
2021-07-05T20:49:43.2311684Z h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2312182Z     ???
2021-07-05T20:49:43.2312667Z h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2313163Z     ???
2021-07-05T20:49:43.2313493Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2313752Z 
2021-07-05T20:49:43.2314034Z >   ???
2021-07-05T20:49:43.2314499Z E   ValueError: Unable to create dataset (not a datatype)
2021-07-05T20:49:43.2314890Z 
2021-07-05T20:49:43.2315254Z h5py/h5d.pyx:87: ValueError
2021-07-05T20:49:43.2315809Z _______________________ TestFilter.test_with_compression _______________________
2021-07-05T20:49:43.2316190Z 
2021-07-05T20:49:43.2316756Z self = <test_h5filter.TestFilter testMethod=test_with_compression>
2021-07-05T20:49:43.2317223Z 
2021-07-05T20:49:43.2317623Z     def test_with_compression(self):
2021-07-05T20:49:43.2318046Z         shape = (128 * 1024 + 783,)
2021-07-05T20:49:43.2318415Z         chunks = (4 * 1024 + 23,)
2021-07-05T20:49:43.2318794Z         dtype = np.int64
2021-07-05T20:49:43.2319202Z         data = np.arange(shape[0])
2021-07-05T20:49:43.2319671Z         fname = "tmp_test_filters.h5"
2021-07-05T20:49:43.2320117Z         f = h5py.File(fname, "w")
2021-07-05T20:49:43.2320547Z         h5.create_dataset(
2021-07-05T20:49:43.2320907Z             f,
2021-07-05T20:49:43.2321238Z             b"range",
2021-07-05T20:49:43.2321572Z             shape,
2021-07-05T20:49:43.2321920Z             dtype,
2021-07-05T20:49:43.2322251Z             chunks,
2021-07-05T20:49:43.2322744Z             filter_pipeline=(32008,),
2021-07-05T20:49:43.2323246Z             filter_flags=(h5z.FLAG_MANDATORY,),
2021-07-05T20:49:43.2323770Z >           filter_opts=((0, h5.H5_COMPRESS_LZ4),),
2021-07-05T20:49:43.2324179Z         )
2021-07-05T20:49:43.2324370Z 
2021-07-05T20:49:43.2324811Z /project/tests/test_h5filter.py:86: 
2021-07-05T20:49:43.2325272Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2325845Z bitshuffle/h5.pyx:186: in bitshuffle.h5.create_dataset
2021-07-05T20:49:43.2326365Z     ???
2021-07-05T20:49:43.2326852Z h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2327346Z     ???
2021-07-05T20:49:43.2327828Z h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
2021-07-05T20:49:43.2328407Z     ???
2021-07-05T20:49:43.2328737Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2021-07-05T20:49:43.2328995Z 
2021-07-05T20:49:43.2329277Z >   ???
2021-07-05T20:49:43.2329745Z E   ValueError: Unable to create dataset (not a datatype)
2021-07-05T20:49:43.2330129Z 
2021-07-05T20:49:44.4940352Z ##[error]Command ['sh', '-c', 'CI_BUILD_WHEEL=1 pytest /project/tests'] failed with code 1. 
2021-07-05T20:49:44.4954398Z h5py/h5d.pyx:87: ValueError
2021-07-05T20:49:44.4954832Z 
2021-07-05T20:49:44.4955223Z =========================== short test summary info ============================
2021-07-05T20:49:44.4956730Z FAILED ../project/tests/test_h5filter.py::TestFilter::test_filter - ValueErro...
2021-07-05T20:49:44.4957889Z FAILED ../project/tests/test_h5filter.py::TestFilter::test_with_block_size - ...
2021-07-05T20:49:44.4958717Z FAILED ../project/tests/test_h5filter.py::TestFilter::test_with_compression
2021-07-05T20:49:44.4959444Z =================== 3 failed, 57 passed, 1 skipped in 1.62s ====================

I followed this through to H5Dcreate inside h5py but I can't work out the problem. The only difference I can see is the h5py version: CPython 3.7 uses h5py 3.3.0, whereas CPython 3.6 uses h5py 3.1.0. I pinned h5py to 3.1.0 for CPython 3.7 but I get the same error. Maybe @kiyo-masui has seen this error before?

james-s-willis commented 3 years ago

@jrs65, I've managed to fix the CI workflow file so that HDF5 1.10.7 is always installed before each bitshuffle wheel build and HDF5 1.8.11 is installed before each test suite is run. This config still passes all tests. I have also tried installing bitshuffle from the wheels generated on my lab machine running Ubuntu 18.04, Python 3.7, and HDF5 1.10.0-patch1, and it passes the unit tests apart from test_h5plugin.py. Are you able to download the wheels from here: https://github.com/kiyo-masui/bitshuffle/actions/runs/1008960943 and try them on your machine, Richard?

james-s-willis commented 3 years ago

@jrs65, Shiny has tested the wheels and they work for him. The last thing that needs to be done is to add a secrets.pypi_password to the repo so that the wheels can be uploaded to PyPI. I can't do that myself, so I was hoping @kiyo-masui could? There is a guide here: https://docs.github.com/en/actions/reference/encrypted-secrets

After this PR is merged, bitshuffle would then need to be tagged to trigger the upload of the wheels to PyPI.
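The "upload only on a tag" gate can be illustrated with a small sketch. It assumes the job runs under GitHub Actions, where the `GITHUB_REF` environment variable holds a value of the form `refs/tags/<name>` for tag pushes; the function names and the example tag are hypothetical and not taken from the workflow in this PR.

```python
import os


def is_tag_ref(ref):
    """True when a Git ref names a tag, e.g. 'refs/tags/v1.2.3'."""
    return ref.startswith("refs/tags/")


def should_upload(environ=None):
    """Decide whether the current CI run was triggered by a tag push."""
    environ = os.environ if environ is None else environ
    return is_tag_ref(environ.get("GITHUB_REF", ""))


if __name__ == "__main__":
    if should_upload():
        print("tag build: wheels would be uploaded")
    else:
        print("not a tag build: skipping upload")
```

In a workflow file the same condition is usually written as an `if:` expression on the upload step; the Python version here only makes the logic explicit.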

kiyo-masui commented 3 years ago

The contents of secrets.pypi_password is just my PyPI password? No username or anything else?

kiyo-masui commented 3 years ago

It won't let me add a secret named secrets.pypi_password because the . isn't allowed in the name.

t20100 commented 3 years ago

The secret's name should be pypi_password, and it should contain an API token from the project's settings page on pypi.org (user: __token__).
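As a rough sketch of how such a token is typically consumed, the snippet below prepares a twine upload. Using twine is an assumption (the thread does not say which uploader the workflow uses), as are the function name and the `wheelhouse` directory, which is cibuildwheel's default output location. twine reads its credentials from the `TWINE_USERNAME` and `TWINE_PASSWORD` environment variables.

```python
import glob
import os


def twine_upload_plan(token, wheel_dir="wheelhouse"):
    """Build the command and environment for uploading wheels with twine.

    Nothing is executed here; the caller decides whether to run the command.
    """
    env = dict(os.environ)
    env["TWINE_USERNAME"] = "__token__"  # literal user name for API tokens
    env["TWINE_PASSWORD"] = token        # the API token stored in the secret
    wheels = sorted(glob.glob(os.path.join(wheel_dir, "*.whl")))
    command = ["twine", "upload", *wheels]
    return command, env


if __name__ == "__main__":
    # PYPI_PASSWORD is a hypothetical variable the workflow would map
    # from the pypi_password secret.
    command, env = twine_upload_plan(os.environ.get("PYPI_PASSWORD", ""))
    print("would run:", " ".join(command))
```

The returned pair can be passed to `subprocess.run(command, env=env, check=True)` once at least one wheel is present in the directory.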

kiyo-masui commented 3 years ago

Done.

james-s-willis commented 3 years ago

Thanks @kiyo-masui! I'll merge this in now.