silx-kit / vscode-h5web

VSCode extension to explore and visualize HDF5 files
https://marketplace.visualstudio.com/items?itemName=h5web.vscode-h5web
MIT License

Bitshuffle-compressed datasets cannot be read when accessed through a virtual dataset #43

Closed loichuder closed 5 months ago

loichuder commented 6 months ago

Describe the bug

Ok, this one is a stretch. Thanks to https://github.com/silx-kit/h5web/pull/1524, it is now possible to read datasets compressed with bitshuffle. But when I create a virtual dataset pointing to such a dataset, I get the following error:

Required filter 'bitshuffle; see https://github.com/kiyo-masui/bitshuffle' is not registered
Full traceback
HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 0:
  #000: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1061 in H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1008 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2092 in H5VL_dataset_read_direct(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2048 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLnative_dataset.c line 363 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dio.c line 383 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #006: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dvirtual.c line 2768 in H5D__virtual_read(): unable to read source dataset
    major: Dataset
    minor: Read failed
  #007: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dvirtual.c line 2689 in H5D__virtual_read_one(): can't read source dataset
    major: Dataset
    minor: Read failed
  #008: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dio.c line 383 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #009: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 2856 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #010: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 4468 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #011: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Z.c line 1356 in H5Z_pipeline(): required filter 'bitshuffle; see https://github.com/kiyo-masui/bitshuffle' is not registered
    major: Data filters
    minor: Read failed
  #012: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5PLint.c line 267 in H5PL_load(): can't find plugin. Check either HDF5_VOL_CONNECTOR, HDF5_PLUGIN_PATH, default location, or path set by H5PLxxx functions
    major: Plugin for dynamically loaded library
    minor: Object not found
   

To Reproduce

  1. Select 'vds_bug.h5' in VS Code explorer
  2. Click on data_compressed (a 1D dataset compressed with bitshuffle): it displays fine
  3. Click on data_via_vds (a VDS pointing to non-compressed dataset data): it displays fine
  4. Click on data_compressed_via_vds (you get it)
  5. See error

vds_bug.zip

Expected behaviour

It should be able to display compressed datasets, even through a VDS. Interestingly, the h5wasm demo seems to display it fine?

axelboc commented 5 months ago

Ha, so it works if you first select data_compressed and then data_compressed_via_vds but not if you select data_compressed_via_vds right away.

It's because the virtual compressed dataset's filters metadata doesn't "mirror" the source dataset's filters metadata as it should. So vscode-h5web (or myHDF5) doesn't know that it needs to load the bitshuffle plugin.

I'll report on the h5wasm repo.
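
The order dependence described above can be illustrated with a small, self-contained model. This is only a sketch under assumptions: the `Dataset` and `Viewer` classes and their fields are hypothetical and are not the API of vscode-h5web, h5wasm or HDF5. It assumes the viewer loads a filter plugin only when the selected dataset's own metadata lists it, and that a loaded plugin stays registered for the rest of the session.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Dataset:
    """Hypothetical stand-in for the metadata a viewer sees for one dataset."""

    filters: Tuple[str, ...] = ()  # filters listed in the dataset's own metadata
    sources: Tuple[str, ...] = ()  # source dataset names, for a virtual dataset


class Viewer:
    """Toy viewer that registers filter plugins lazily, once per session."""

    def __init__(self, datasets: Dict[str, Dataset]) -> None:
        self.datasets = datasets
        self.registered: Set[str] = set()

    def read(self, name: str) -> str:
        # Assumption: only the selected dataset's own metadata is inspected.
        self.registered.update(self.datasets[name].filters)
        # Decoding, however, needs the filters of every source dataset too.
        self._decode(name)
        return f"{name}: ok"

    def _decode(self, name: str) -> None:
        dset = self.datasets[name]
        for filter_name in dset.filters:
            if filter_name not in self.registered:
                raise RuntimeError(
                    f"Required filter '{filter_name}' is not registered"
                )
        for source in dset.sources:
            self._decode(source)


# Illustrative file layout, loosely modelled on the datasets named in this issue.
FILE = {
    "data_compressed": Dataset(filters=("bitshuffle",)),
    "data_compressed_via_vds": Dataset(sources=("data_compressed",)),
}
```

In this model, a fresh `Viewer(FILE)` raises on `read("data_compressed_via_vds")`, because the virtual dataset lists no filters of its own. Calling `read("data_compressed")` first registers bitshuffle, after which the virtual dataset reads without error.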

loichuder commented 5 months ago

> Ha, so it works if you first select data_compressed and then data_compressed_via_vds but not if you select data_compressed_via_vds right away.

Ha! That's why I thought it worked in the h5wasm demo: it did work because I selected data_compressed first. Just retried: if I select data_compressed_via_vds first, I get the same error I report here.

t20100 commented 5 months ago

> It's because the virtual compressed dataset's filters metadata doesn't "mirror" the source dataset's filters metadata as it should.

I'm not sure, you can make a virtual dataset which gives access to multiple datasets stored with different compression filters... (Never seen this though)

loichuder commented 5 months ago

> I'm not sure, you can make a virtual dataset which gives access to multiple datasets stored with different compression filters

You can. The following snippet works without trouble:

import numpy
import h5py
import hdf5plugin

with h5py.File("double_filter_vds.h5", "w") as h5file:
    data = numpy.linspace(0, 10, 100)

    # Two source datasets holding the same data, each with a different filter
    c_dset = h5file.create_dataset(
        "bitshuffle", data=data, **hdf5plugin.Bitshuffle(cname="lz4")
    )
    c_dset_2 = h5file.create_dataset("blosc", data=data, **hdf5plugin.Blosc2())

    # Virtual dataset that concatenates both sources
    vlayout = h5py.VirtualLayout(shape=(200,), dtype=c_dset.dtype)
    vsource = h5py.VirtualSource(c_dset)
    vlayout[:100] = vsource[:]
    vsource2 = h5py.VirtualSource(c_dset_2)
    vlayout[100:] = vsource2[:]

    h5file.create_virtual_dataset("data_via_vds", vlayout)

axelboc commented 5 months ago

> I'm not sure, you can make a virtual dataset which gives access to multiple datasets stored with different compression filters... (Never seen this though)

Yep, Brian mentioned this as well: https://github.com/usnistgov/h5wasm/issues/75#issuecomment-2144936429 — he already released a new version of h5wasm that exposes virtual sources in the metadata.
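
Once virtual sources are available in the metadata, a viewer could collect the filters of every source before reading. The sketch below is a guess at that logic, not the actual fix: the metadata layout (the `filters` and `virtual_sources` keys) and the `required_filters` helper are hypothetical and do not reflect h5wasm's real API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical metadata layout: dataset name -> its filters and virtual sources.
Metadata = Dict[str, Dict[str, List[str]]]


def required_filters(
    name: str, metadata: Metadata, seen: Optional[Set[str]] = None
) -> Set[str]:
    """Collect the filters of a dataset and, recursively, of its virtual sources."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against visiting the same dataset twice
        return set()
    seen.add(name)

    entry = metadata[name]
    filters = set(entry.get("filters", []))
    for source in entry.get("virtual_sources", []):
        filters |= required_filters(source, metadata, seen)
    return filters


# Mirrors the two-filter example from this thread (names are illustrative).
EXAMPLE: Metadata = {
    "bitshuffle": {"filters": ["bitshuffle"]},
    "blosc": {"filters": ["blosc2"]},
    "data_via_vds": {"virtual_sources": ["bitshuffle", "blosc"]},
}

plugins_to_load = required_filters("data_via_vds", EXAMPLE)
```

Walking the sources rather than mirroring a single filter list onto the virtual dataset also covers the case raised above, where one virtual dataset maps to sources stored with different filters.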

axelboc commented 5 months ago

Should now be fixed in v0.1.6 of the extension.