Dalbasar opened 3 weeks ago
Hi, I see a similar performance difference on Linux:
hdf5plugin 4.0.1: lz4 compression time 0.484s, lz4 decompression_time: 0.792s
hdf5plugin 5.0.0: lz4 compression time 0.478s, lz4 decompression_time: 2.025s
However, when using random data (test_data = np.random.randint(0, 255, size=(1024, 1024, 1024), dtype=np.uint8)), the difference disappears:
hdf5plugin 4.0.1: lz4 compression time 0.843s, lz4 decompression_time: 0.726s
hdf5plugin 5.0.0: lz4 compression time 0.808s, lz4 decompression_time: 0.683s
This sounds like an LZ4 library-related problem; it would be interesting to try to reproduce it directly with the LZ4 library.
The LZ4 HDF5 compression filter does not use LZ4 multithreading: it calls the deprecated LZ4 API (e.g., LZ4_decompress_fast), which does not seem to allow enabling multithreading.
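One way to probe the library outside of HDF5, as suggested above, is through the python-lz4 bindings. This is only a rough stand-in: the bindings link their own liblz4 build (which may differ from the copy hdf5plugin embeds) and may not call the same deprecated entry points as the HDF5 filter, and the data and block size below are assumptions:

```python
import time

import lz4
import lz4.block
import numpy as np

# Version of the liblz4 that python-lz4 links (may differ from the copy hdf5plugin embeds).
print("liblz4:", lz4.library_version_string())

# Compressible stand-in data split into fixed-size blocks, loosely mimicking HDF5 chunking.
data = np.tile(np.arange(256, dtype=np.uint8), 1024 * 1024).tobytes()  # 256 MiB
block_size = 1 << 20  # 1 MiB per block (assumption)
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

t0 = time.perf_counter()
compressed = [lz4.block.compress(b) for b in blocks]
print(f"raw lz4 compression:   {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
decompressed = [lz4.block.decompress(c) for c in compressed]
print(f"raw lz4 decompression: {time.perf_counter() - t0:.3f}s")

assert b"".join(decompressed) == data
```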
I noticed that LZ4 decompression via hdf5plugin 4.1.0 and later is 5-6x slower than with hdf5plugin 4.0.1, while the compression speed is very similar:
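The original benchmark script is not reproduced here; a minimal sketch of this kind of timing comparison (the stand-in data, chunk shape, and timing method are assumptions), run once per installed hdf5plugin version, could look like this:

```python
import os
import time

import h5py
import hdf5plugin
import numpy as np

# Compressible stand-in data (1 GiB); the actual test data from the report is not shown.
data = np.tile(np.arange(256, dtype=np.uint8), 4 * 1024 * 1024).reshape(1024, 1024, 1024)

filename = "lz4_test.h5"

# Compression: write an LZ4-compressed dataset (chunk shape is an assumption).
t0 = time.perf_counter()
with h5py.File(filename, "w") as f:
    f.create_dataset("data", data=data, chunks=(32, 1024, 1024), **hdf5plugin.LZ4())
print(f"lz4 compression time: {time.perf_counter() - t0:.3f}s")

# Decompression: read the full dataset back.
t0 = time.perf_counter()
with h5py.File(filename, "r") as f:
    restored = f["data"][()]
print(f"lz4 decompression time: {time.perf_counter() - t0:.3f}s")

assert np.array_equal(data, restored)
os.remove(filename)
```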
The original benchmark gives the following results for different hdf5plugin versions with h5py 3.12.1 on Python 3.11.9 on Windows 10 (AMD Ryzen 7 5900X):
I have seen similar results on Python 3.8 and 3.11 on Debian 12 with different h5py versions.
I would have expected a substantial speedup when updating from 4.1.x to version 5.0, which bundles lz4 1.10 with its new multithreaded decompression, but the decompression speed for 4.1.x and 5.0 seems to be the same, apparently without using multithreaded lz4 decompression.