silx-kit / hdf5plugin

Set of compression filters for h5py
http://www.silx.org/doc/hdf5plugin/latest/

Performance regression for lz4 decompression after version 4.0.1 #326

Open Dalbasar opened 3 weeks ago

Dalbasar commented 3 weeks ago

I noticed that LZ4 decompression via hdf5plugin 4.1.0 and later is 5-6x slower than with hdf5plugin 4.0.1, while the compression speed is very similar:

import time
import hdf5plugin
import h5py
import numpy as np
from io import BytesIO

test_data = np.ones((1024, 1024, 1024), np.uint8)

raw_buffer = BytesIO()

with h5py.File(raw_buffer, 'w') as f:
    compression_start_time = time.perf_counter()
    f.create_dataset('data', data=test_data, compression=hdf5plugin.LZ4())
    compression_time = time.perf_counter() - compression_start_time

with h5py.File(raw_buffer, 'r') as f:
    decompression_start_time = time.perf_counter()
    data = f['data'][:]
    decompression_time = time.perf_counter() - decompression_start_time

print(f"hdf5plugin {hdf5plugin.version}: "
      f"lz4 compression time {compression_time:.3f}s, "
      f"lz4 decompression_time: {decompression_time:.3f}s")

This gives the following results for different hdf5plugin versions with h5py 3.12.1 on Python 3.11.9 on Windows 10 (AMD Ryzen 7 5900X):

hdf5plugin 4.0.1: lz4 compression time 0.219s, lz4 decompression_time: 0.283s
hdf5plugin 4.1.0: lz4 compression time 0.226s, lz4 decompression_time: 1.630s
hdf5plugin 5.0.0: lz4 compression time 0.221s, lz4 decompression_time: 1.610s

I have seen similar results on Python 3.8 and 3.11 on Debian 12 with different h5py versions.

I would have expected a substantial speedup in version 5.0 compared to 4.1.x, since 5.0 updates to lz4 1.10 with its new multithreaded decompression. Instead, the decompression speed of 4.1.x and 5.0 seems to be the same, so multithreaded lz4 decompression does not appear to be used.

t20100 commented 2 weeks ago

Hi, I see a similar performance difference on Linux:

hdf5plugin 4.0.1: lz4 compression time 0.484s, lz4 decompression_time: 0.792s
hdf5plugin 5.0.0: lz4 compression time 0.478s, lz4 decompression_time: 2.025s

However, when using random data (test_data = np.random.randint(0, 255, size=(1024, 1024, 1024), dtype=np.uint8)), the difference disappears:

hdf5plugin 4.0.1: lz4 compression time 0.843s, lz4 decompression_time: 0.726s
hdf5plugin 5.0.0: lz4 compression time 0.808s, lz4 decompression_time: 0.683s

This looks like an LZ4 library-related problem, and it would be interesting to try to reproduce it directly with the LZ4 library.

The LZ4 HDF5 compression filter does not use LZ4 multithreading: it calls a deprecated LZ4 API (e.g., LZ4_decompress_fast), which does not seem to allow enabling multithreading.
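
A possible starting point for the direct reproduction suggested above, taking h5py and HDF5 out of the picture, is sketched below. It is only a rough harness under several assumptions: the helper names (best_time, benchmark, load_codec) are made up for this example, the 1 MiB chunk size is arbitrary and not the chunk shape h5py would pick, and the python-lz4 package (lz4.block) is used only if it happens to be installed, with zlib as a stand-in otherwise. python-lz4 bundles its own LZ4 build and is not guaranteed to go through the same decompression entry point as the HDF5 filter, so a result from this script would be a hint at best.

```python
import os
import time
import zlib


def best_time(func, *args, repeats=3):
    """Return (best wall-clock seconds, last result) over `repeats` calls."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def benchmark(compress, decompress, payload, chunk_size=1 << 20):
    """Time chunk-wise compression and decompression of `payload`.

    Returns (compression seconds, decompression seconds, compression ratio).
    """
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    c_time, packed = best_time(lambda: [compress(c) for c in chunks])
    d_time, unpacked = best_time(lambda: [decompress(p) for p in packed])
    if b"".join(unpacked) != payload:
        raise ValueError("round trip did not reproduce the input")
    ratio = len(payload) / sum(len(p) for p in packed)
    return c_time, d_time, ratio


def load_codec():
    """Prefer python-lz4 if it is installed; otherwise fall back to zlib."""
    try:
        import lz4.block  # assumption: optional third-party package
        return "lz4.block", lz4.block.compress, lz4.block.decompress
    except ImportError:
        return "zlib (stand-in, not LZ4)", zlib.compress, zlib.decompress


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # much smaller than the 1 GiB array in the report
    payloads = {
        "constant": b"\x01" * size,   # mirrors np.ones(..., np.uint8)
        "random": os.urandom(size),   # mirrors the random-data variant
    }
    name, compress, decompress = load_codec()
    for label, payload in payloads.items():
        c_time, d_time, ratio = benchmark(compress, decompress, payload)
        print(f"{name} / {label}: compression {c_time:.3f}s, "
              f"decompression {d_time:.3f}s, ratio {ratio:.1f}")
```

Running the same script against two LZ4 releases (for example in two virtual environments with different python-lz4 versions) and comparing the "constant" rows would show whether the slowdown on highly compressible data follows the library version. If it does not, the next candidate would be the filter code itself rather than the library.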