Closed james-s-willis closed 2 years ago
This was remarkably quick.
Everything here looks correct to me.
I noticed that my continuous integration checks haven't popped up in the PR. I wonder why that is. In any case, it seemed to not build on travis (https://travis-ci.org/github/kiyo-masui/bitshuffle/jobs/769740650). I think CI was already broken, but it seems to be more broken now.
The other thing that needs to be added are tests. Should be straight forward to duplicate all the unit tests that are done for LZ4 to ZSTD. For regression tests we generate a small amount of test data that gets added directly to the repository. There is a script in the tests/
directory that should be easy to adapt.
Agreed. That was remarkably quick. Nice work!
One thing that seems like it is missing is the ability to tune the amount of compression zstd applies. That's a feature in zstd itself and should get exposed here.
I'll have a more thorough look through later on today.
Thanks!
I have been struggling to expose the compression level for ZSTD due to the bshuf_blocked_wrap_fun
function. I have had to add const int comp_lvl
as an extra argument to quite a few functions, when it's not going to be used in most of them. I'm not sure if this is the best way to do this.
Also, to be able to pass it from kotekan
I added it as:const int comp_lvl = cd_values[5];
to bshuf_h5_filter
. Is this also right?
Regarding the unit tests how do I select ZSTD as the compression library? Can I just use: filter_opts=((0, h5.H5_COMPRESS_ZSTD),),
?
Yeah, that’s a bummer about bshuf_blocked_wrap_fun
... I think adding an extra argument is the cleanest way. It is okay to change the call signature since that isn’t exposed in the .h
file. I would call it something more generic than comp_lvl
, maybe just option
in case there are other functions with integer options we want to wrap in the future.
Yes, I think the new option should go in the cd_values
array.
Yes, those filter options should work.
On May 7, 2021, at 4:24 PM, james-s-willis @.***> wrote:
Thanks!
I have been struggling to expose the compression level for ZSTD due to the bshuf_blocked_wrap_fun function. I have had to add const int comp_lvl as an extra argument to quite a few functions, when it's not going to be used in most of them. I'm not sure if this is the best way to do this.
Also, to be able to pass it from kotekan I added it as:const int comp_lvl = cd_values[5]; to bshuf_h5_filter. Is this also right?
Regarding the unit tests how do I select ZSTD as the compression library? Can I just use: filter_opts=((0, h5.H5_COMPRESS_ZSTD),),?
— You are receiving this because your review was requested. Reply to this email directly, view it on GitHubhttps://github.com/kiyo-masui/bitshuffle/pull/95#issuecomment-834751173, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AADBPH2KEDU4225XVGUD4VDTMRD6ZANCNFSM44HWB2JA.
@kiyo-masui @jrs65 I've been struggling to run the test_h5filter.py
after I've installed bitshuffle with the new ZSTD extension:
[willis@magenta bitshuffle]$ python bitshuffle/tests/test_h5filter.py
Traceback (most recent call last):
File "bitshuffle/tests/test_h5filter.py", line 12, in <module>
from bitshuffle import h5
File "/usr/local/lib/python2.7/dist-packages/bitshuffle/__init__.py", line 22, in <module>
from bitshuffle.ext import (__version__, bitshuffle, bitunshuffle, using_NEON, using_SSE2,
ImportError: /usr/local/lib/python2.7/dist-packages/bitshuffle/ext.so: undefined symbol: ZSTD_compressBlock_fast
from this it looks like the ZSTD_compressBlock_fast
hasn't been compiled into the source. So I added the source file to setup.py
and reinstalled bitshuffle:
+zstd_sources = ["zstd/compress/zstd_compress.c",
+ "zstd/compress/zstd_fast.c",
+ "zstd/compress/zstd_double_fast.c",
+ "zstd/decompress/zstd_decompress.c"
+ ]
+
+zstd_headers = ["zstd/zstd.h",
+ "zstd/compress/zstd_fast.h"
+ ]
ext_bshuf = Extension(
"bitshuffle.ext",
sources=["bitshuffle/ext.pyx", "src/bitshuffle.c",
"src/bitshuffle_core.c", "src/iochain.c",
- "lz4/lz4.c", "zstd/compress/zstd_compress.c",
- "zstd/decompress/zstd_decompress.c"],
+ "lz4/lz4.c"] + zstd_sources,
include_dirs=["src/", "lz4/", "zstd/"],
depends=["src/bitshuffle.h", "src/bitshuffle_core.h",
- "src/iochain.h", "lz4/lz4.h", "zstd/zstd.h"],
+ "src/iochain.h", "lz4/lz4.h"] + zstd_headers,
libraries=[],
define_macros=MACROS,
)
@@ -124,12 +134,10 @@ h5filter = Extension(
"bitshuffle.h5",
sources=["bitshuffle/h5.pyx", "src/bshuf_h5filter.c",
"src/bitshuffle.c", "src/bitshuffle_core.c",
- "src/iochain.c", "lz4/lz4.c",
- "zstd/compress/zstd_compress.c",
- "zstd/decompress/zstd_decompress.c"],
+ "src/iochain.c", "lz4/lz4.c"] + zstd_sources,
depends=["src/bitshuffle.h", "src/bitshuffle_core.h",
"src/iochain.h", "src/bshuf_h5filter.h",
- "lz4/lz4.h", "zstd/zstd.h"],
+ "lz4/lz4.h"] + zstd_headers,
define_macros=MACROS,
**pkgconfig("hdf5", config=dict(
include_dirs=["src/", "lz4/", "zstd/"]))
@@ -139,12 +147,10 @@ filter_plugin = Extension(
"bitshuffle.plugin.libh5bshuf",
sources=["src/bshuf_h5plugin.c", "src/bshuf_h5filter.c",
"src/bitshuffle.c", "src/bitshuffle_core.c",
- "src/iochain.c", "lz4/lz4.c",
- "zstd/compress/zstd_compress.c",
- "zstd/decompress/zstd_decompress.c"],
+ "src/iochain.c", "lz4/lz4.c"] + zstd_sources,
depends=["src/bitshuffle.h", "src/bitshuffle_core.h",
"src/iochain.h", 'src/bshuf_h5filter.h',
- "lz4/lz4.h", "zstd/zstd.h"],
+ "lz4/lz4.h"] + zstd_headers,
but I run into the same problem again. Could it actually be that bitshuffle is not installed if there are no changes to source files?
I use this command: sudo python setup.py --verbose install --h5plugin --h5plugin-dir=/usr/local/hdf5/lib/plugin
Oh I pushed those changes to the setup.py
including the extra source files, now travis gets further and fails on:
nosetests -v bitshuffle
Failure: ImportError (/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/bitshuffle/ext.cpython-36m-x86_64-linux-gnu.so: undefined symbol: ZSTD_compressBlock_greedy) ... ERROR
so it must be a problem with my local python setup.
@jrs65 @kiyo-masui, I think this is ready to be looked at now. I have added ZSTD to the unit tests and fixed an issue with the __version__
string. I apologise for the 80 new files... but they are mostly from the ZSTD library. I have trimmed that down to common/
, compress/
and decompress/
folders which is all we need from it.
The tests pass, but I will need to work with Arash to make sure everything is okay. I will look into adding ZSTD into kotekan also.
Please undelete the old regression test data and make sure the current tests still test against that data. The whole idea of the regression tests is to make sure newer versions of bit shuffle can still decompress data created with previous versions.
Also, instead of having separate files for regression testing lz4 and zst, you should just add another dataset to the 0.3.6 regression test data that is 1st compressed.
Could you also augment the README to include the new functionality?
@james-s-willis would it be worth just putting the zstd stuff into a git submodule? You could also consider making it optional if it really kills the build times.
does it kill the build times?
I've updated the tests from your comments, @kiyo-masui and updated the README.
The build now takes 1m41.844s
with ZSTD.
Also, how do I update the CI config? It fails with a missing python package (packaging
), which I added for the regression test.
@james-s-willis as for the CI build I think the best thing would be to try and get #93 merged first.
@jrs65 I've merged the github actions changes into this branch now and got it to pass the CI. Is this good to merge now?
@james-s-willis can you investigate whether using a git submodule for zstd makes sense? I think when this started I didn't appreciate that zstd was going to add 70 files. Otherwise, I'll have a look through the changes now.
I've created another branch: https://github.com/kiyo-masui/bitshuffle/tree/jsw/zstd-sub that adds ZSTD
as a submodule. It was simple enough the only extra step you need is to run: git submodule update --init
if you're installing with ZSTD
support. It reduces the total no. of extra files, but you get the whole of the ZSTD
repo when you initialise the submodule. In the other branch I had only picked ZSTD
files that we were using.
@james-s-willis which do you think is the right thing to do? Also @kiyo-masui maybe you care here.
I'll defer to the judgement of you two on this one.
I guess when uploading to Pypi, you would include the ZSTD files?
@jrs65 I think we should just stick with this branch. As we've trimmed zstd to what we need and it is fixed at v1.4.9, so we know it works. Unless we want the latest version of zstd?
@james-s-willis what's the difference in file numbers? If it's not that many, on reflection I think the submodule is the better option. It makes it far easier to update to revised versions of zstd in the future.
I've added support for ZSTD. I think I have included changes in the correct place. Could you take a look and let me know?
Also, for testing it I used
bitshuffle/tests/test_h5filter.py
andfilter_opts=((0, 3),),
where3
should be the ZSTD compression. Is that the best way to test this kind of change?