silx-kit / hdf5plugin

Set of compression filters for h5py
http://www.silx.org/doc/hdf5plugin/latest/

SZ compressor with absolute mode. #267

Closed. orioltinto closed this issue 1 year ago.

orioltinto commented 1 year ago

I'm having a problem with the SZ compressor in absolute mode. I opened an issue on the SZ GitHub page, but after digging a lot my best guess is that the problem is related to how the plugin is built in hdf5plugin.

A script to reproduce the error:

import tempfile

import h5py
import hdf5plugin
import numpy as np

# Some parameters
SHAPE = (1000, 25, 25)
TOLERANCE = 0.01

# Generate random data
np.random.seed(0)
data = np.random.random(size=SHAPE).astype(np.float32)

encoding = hdf5plugin.SZ(absolute=TOLERANCE)
# Add chunking info
encoding = {**encoding, "chunks": data.shape}

with tempfile.NamedTemporaryFile() as tmp_file:
    # Create compressed file
    with h5py.File(tmp_file.name, 'w') as f:
        f.create_dataset('var', data=data, **encoding)

    # Open compressed file
    with h5py.File(tmp_file.name, 'r') as f:
        recovered_data = f["var"][:]

# Check that the data fulfills the constraint
if not np.allclose(data, recovered_data, atol=TOLERANCE):
    max_diff = np.max(np.abs(recovered_data - data))
    raise AssertionError(f"Condition not fulfilled for {TOLERANCE=} -> {max_diff=}")

The error can be reproduced in a Docker container with python:3.11. I also encountered the problem with older versions.

FROM python:3.11
RUN pip install hdf5plugin
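
To actually run the reproducer in that image, one option (the image tag is arbitrary and the script is assumed to be saved as code.py in the build directory) is:

# build the image and run the script with the current directory mounted
docker build -t hdf5plugin-sz-repro .
docker run --rm -v "$PWD":/work -w /work hdf5plugin-sz-repro python3 code.py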
vasole commented 1 year ago

The best thing to do would be to use the SZ command-line tool and check whether the generated file satisfies the requested conditions.

Otherwise we might end up looking for issues on the hdf5plugin side when the problem could be somewhere else...

It seems that has already been done.
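
For reference, such a standalone round trip could look roughly like the following. The flag names are recalled from the SZ2 README and not verified here (check sz -h), and the raw input file and dimension order are assumptions:

# dump the test array to raw float32 first, e.g. data.tofile("data.f32") in the script above
# compress with an absolute error bound of 0.01, then decompress and print error statistics
sz -z -f -M ABS -A 0.01 -i data.f32 -3 25 25 1000
sz -x -f -s data.f32.sz -3 25 25 1000 -i data.f32 -a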

vasole commented 1 year ago

@orioltinto

Can you drop the filter library you compiled directly (https://github.com/szcompressor/SZ/issues/108#issuecomment-1593314912) in place of the one shipped with hdf5plugin (or set HDF5_PLUGIN_PATH to the location of your filter) and then run your Python script?

That might narrow the search.
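
For example (the path below is only an indication of where the self-built filter might end up):

# make libhdf5 load the self-compiled SZ filter instead of the bundled one
export HDF5_PLUGIN_PATH=/path/to/SZ/build/hdf5-filter/H5Z-SZ
python3 code.py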

orioltinto commented 1 year ago

I tried both things (replacing the filter library and setting HDF5_PLUGIN_PATH).

In both cases the code works. I can only reproduce the error with the version built with hdf5plugin.

A Dockerfile to check the three cases:

FROM ubuntu:rolling
RUN export DEBIAN_FRONTEND=noninteractive \
    && apt update \
    && apt install -yq vim make cmake wget git python3 python3-pip python3-venv \
    && apt install -yq swig gcc gfortran pkg-config libzstd-dev \
    && apt install -yq libhdf5-dev hdf5-tools
# Build the SZ HDF5 filter from source
RUN git clone https://github.com/szcompressor/SZ.git --depth 1
RUN cd SZ; mkdir build ; cd build; cmake .. -DBUILD_HDF5_FILTER:BOOL=ON ; make ; make install
RUN pip install h5py  --no-binary h5py --break-system-packages
RUN pip install hdf5plugin --break-system-packages
COPY code.py .
# Run the reproducer three times: with the self-built filter via HDF5_PLUGIN_PATH,
# with the filter bundled in hdf5plugin, and with the bundled filter replaced by the self-built one
CMD echo "_____Setting HDF5_PLUGIN_PATH__________________" ; \
    HDF5_PLUGIN_PATH=/SZ/build/hdf5-filter/H5Z-SZ/ python3 code.py ; \
    echo "_____Using hdf5plugin _________________________" ; \
    python3 code.py ;\
    echo "_____Replacing filter in hdf5plugin____________" ; \
    cp SZ/build/hdf5-filter/H5Z-SZ/libhdf5sz.so /usr/local/lib/python3.11/dist-packages/hdf5plugin/plugins/libh5sz.so ; \
    python3 code.py
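
With the script from the first comment saved as code.py next to this Dockerfile, the three cases can be run with (the image tag is arbitrary):

docker build -t sz-three-cases .
docker run --rm sz-three-cases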
t20100 commented 1 year ago

Hi,

Thanks for reporting the issue and providing a way to reproduce it! Really appreciated. PR #268 should fix this by changing the compilation flags used for the SZ compression filter.
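
Once a release containing that change is available, re-running the script from the first comment against the upgraded package should confirm the fix, for example:

pip install --upgrade hdf5plugin
python3 code.py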

orioltinto commented 1 year ago

Great, I could verify that the results look good now. Thanks!