hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

[Feature]: Add official optional support for additional HDF5 filters #729

Open rly opened 2 years ago

rly commented 2 years ago

What would you like to see added to HDMF?

As discussed over Zoom, now that MatNWB can use HDF5 filters from https://pypi.org/project/hdf5plugin/ beyond gzip, we want to add official optional support for these filters in HDMF and PyNWB.

Is your feature request related to a problem?

No response

What solution would you like?

Do you have any interest in helping implement the feature?

Yes.


oruebel commented 2 years ago

Another nice-to-have feature for this would be a way for users to list the available filters. Currently we have H5DataIO.filter_available to check whether a given filter is available. It would be nice to have something like H5DataIO.available_filters to retrieve information about all available filters.

bendichter commented 2 years ago

Yes, or at least a tutorial that demonstrates how to get a list of dynamically loaded filters using hdf5plugin.

oruebel commented 2 years ago

I have not yet found a way to get a list of all filters available to HDF5. There must be (or at least should be) some way to list the installed filters, but I have not been able to find it.

I know that IDs 0-255 are reserved for filters shipped with the HDF5 library and that custom filters have IDs 512-65535 (described here; the official registry of HDF5 filters is, I believe, here).
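The ID ranges above can be captured in a small lookup. This is a minimal sketch, assuming the ranges stated in this comment plus HDF5's convention that IDs 256-511 are set aside for testing new filters; the boundaries correspond to the H5Z_FILTER_RESERVED (256) and H5Z_FILTER_MAX (65535) constants in HDF5's H5Zpublic.h. The helper name `classify_filter_id` is hypothetical and not part of HDMF or h5py.

```python
# Boundaries follow HDF5: H5Z_FILTER_RESERVED = 256, H5Z_FILTER_MAX = 65535
FILTER_RESERVED = 256
FILTER_TESTING_END = 512
FILTER_MAX = 65535


def classify_filter_id(filter_id: int) -> str:
    """Return which HDF5 filter-ID range ``filter_id`` falls into (hypothetical helper)."""
    if not 0 <= filter_id <= FILTER_MAX:
        raise ValueError(f"HDF5 filter ids must be in [0, {FILTER_MAX}], got {filter_id}")
    if filter_id < FILTER_RESERVED:
        # 0-255: filters defined by the HDF5 library itself (e.g., 1 = deflate/gzip)
        return "library"
    if filter_id < FILTER_TESTING_END:
        # 256-511: available for testing new filters
        return "testing"
    # 512-65535: custom filters, registered with The HDF Group
    return "third-party"
```

For example, `classify_filter_id(32015)` returns "third-party"; 32015 is the registered ID of the Zstandard filter that hdf5plugin ships.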

hdf5plugin.FILTERS only lists the filters that ship with hdf5plugin; it does not include other filters (e.g., those that ship with h5py or that a user has installed by other means).
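Because hdf5plugin.FILTERS only covers hdf5plugin's own filters, a listing built on it has to leave other IDs unnamed. A minimal sketch, assuming the available IDs were collected some other way (e.g., by probing HDF5) and that the name mapping has the `{name: filter_id}` shape of hdf5plugin.FILTERS; `label_filter_ids` is a hypothetical helper.

```python
from typing import Dict, Iterable, Mapping, Optional


def label_filter_ids(
    available_ids: Iterable[int], known: Mapping[str, int]
) -> Dict[int, Optional[str]]:
    """Map each available filter id to a known name, or None if ``known`` does not list it."""
    # Invert {name: id} into {id: name} so lookups go by filter id
    names_by_id = {filter_id: name for name, filter_id in known.items()}
    # Deduplicate and sort the ids so the result is stable
    return {fid: names_by_id.get(fid) for fid in sorted(set(available_ids))}
```

Called as `label_filter_ids([1, 32000, 32015], hdf5plugin.FILTERS)`, the built-in deflate filter (1) and h5py's LZF filter (32000) come back as None, which is exactly the gap described above.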

t-b commented 2 years ago

From what I understand, you have to check every possible ID to see whether a filter is available for it. At least https://docs.hdfgroup.org/hdf5/develop/group___h5_z.html#ga3594e10d70739ccda55ebb55b17b50ee suggests that.
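Probing every ID could look roughly like the sketch below. It assumes h5py exposes HDF5's H5Zfilter_avail and H5Zget_filter_info as `h5py.h5z.filter_avail` and `h5py.h5z.get_filter_info`, along with the `h5py.h5z.FILTER_CONFIG_ENCODE_ENABLED` flag. The function names are hypothetical, and the availability check is injectable so the probing logic can be exercised without h5py.

```python
from typing import Callable, Iterable, List, Optional

FILTER_MAX = 65535  # H5Z_FILTER_MAX: the largest valid HDF5 filter id


def h5py_filter_avail(filter_id: int) -> bool:
    """True if HDF5 reports a filter for ``filter_id`` (H5Zfilter_avail via h5py)."""
    from h5py import h5z  # imported lazily so the probing logic works without h5py

    return bool(h5z.filter_avail(filter_id))


def h5py_filter_can_encode(filter_id: int) -> bool:
    """True if the filter is available and can also compress (some may be decode-only)."""
    from h5py import h5z

    if not h5z.filter_avail(filter_id):
        return False
    return bool(h5z.get_filter_info(filter_id) & h5z.FILTER_CONFIG_ENCODE_ENABLED)


def probe_filters(
    ids: Optional[Iterable[int]] = None,
    is_available: Callable[[int], bool] = h5py_filter_avail,
) -> List[int]:
    """Return the ids (default: every id 0..65535) for which ``is_available`` is True."""
    if ids is None:
        ids = range(FILTER_MAX + 1)
    return [fid for fid in ids if is_available(fid)]
```

Importing hdf5plugin before calling `probe_filters()` registers its filters, so they show up in the result. `probe_filters(is_available=h5py_filter_can_encode)` restricts the list to filters usable for writing. Probing all 65,536 IDs may be slow when HDF5 plugin directories are configured, because HDF5 may try to load a plugin for each ID it does not already know.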

mavaylon1 commented 7 months ago

@rly do you want to take this? If not, fill me in and I can take it.

rly commented 7 months ago

Currently, if the filters from pypi.org/project/hdf5plugin are installed, they can be used as compression options. However, we do not have tests or documentation for this feature, and it would be good to have both. It would be great if you could take this one.
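For reference, passing an hdf5plugin filter through HDMF might look like the sketch below. It assumes hdmf and hdf5plugin are installed. It relies on H5DataIO's `compression`, `compression_opts`, and `allow_plugin_filters` arguments and on the `filter_id`/`filter_options` attributes of hdf5plugin's filter classes, in the spirit of PyNWB's dynamically-loaded-filters tutorial. `wrap_with_zstd` is a hypothetical helper, and the imports sit inside it so hdf5plugin stays an optional dependency.

```python
def wrap_with_zstd(data, clevel: int = 3):
    """Wrap ``data`` so HDMF writes it with hdf5plugin's Zstandard filter (hypothetical helper).

    hdf5plugin and hdmf are imported here so they are only required when Zstd is requested.
    """
    import hdf5plugin  # importing registers hdf5plugin's filters with HDF5
    from hdmf.backends.hdf5.h5_utils import H5DataIO

    zstd = hdf5plugin.Zstd(clevel=clevel)
    return H5DataIO(
        data=data,
        compression=zstd.filter_id,            # registered filter id (32015 for Zstd)
        compression_opts=zstd.filter_options,  # filter-specific options, here (clevel,)
        allow_plugin_filters=True,             # needed for filters that do not ship with h5py
    )
```

The returned object can be passed wherever HDMF (or PyNWB) accepts dataset data, and the filter is applied when the file is written with HDF5IO.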

oruebel commented 7 months ago

However, we do not have tests or documentation of this feature.

In PyNWB, this is briefly discussed in the docs: https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/h5dataio.html#dynamically-loaded-filters

I don't think there is anything in the HDMF docs.

oruebel commented 7 months ago

However, we do not have tests

I don't think we have tests for hdf5plugin. The main things being tested, I believe, are: 1) is it possible to pass a custom filter specified via its int ID

https://github.com/hdmf-dev/hdmf/blob/6d0be176d1a16ace594410404753aa9d69d4a2ed/tests/unit/test_io_hdf5_h5tools.py#L216-L227

and 2) do we detect unsupported filters and can we explicitly set a filter:

https://github.com/hdmf-dev/hdmf/blob/6d0be176d1a16ace594410404753aa9d69d4a2ed/tests/unit/test_io_hdf5_h5tools.py#L606-L641

However, these tests use only filters that ship with h5py (i.e., those available via from h5py import filters).
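Both behaviours can be exercised with h5py alone, since h5py ships the LZF filter (registered filter ID 32000). This is a minimal sketch that calls h5py directly instead of going through HDMF's H5DataIO as the linked tests do. It assumes h5py and NumPy are installed and that h5py treats an integer `compression` of 10 or more as a filter ID, rejecting unavailable IDs with a ValueError. `roundtrip_with_filter_id` is hypothetical, and ID 300 is used on the assumption that nothing is registered in the 256-511 testing range.

```python
import os
import tempfile

import h5py
import numpy as np

LZF_ID = 32000  # registered id of the LZF filter that ships with h5py


def roundtrip_with_filter_id(filter_id: int) -> np.ndarray:
    """Write a small dataset compressed with ``filter_id`` and read it back."""
    data = np.arange(1000, dtype="float64")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        with h5py.File(path, "w") as f:
            # An int outside range(10) is taken by h5py as a filter id, not a gzip level
            f.create_dataset("x", data=data, chunks=True, compression=filter_id)
        with h5py.File(path, "r") as f:
            # Confirm the filter was actually recorded in the dataset creation properties
            assert f["x"].id.get_create_plist().get_filter(0)[0] == filter_id
            return f["x"][()]


# 1) a custom filter passed by its integer id round-trips the data
assert np.array_equal(roundtrip_with_filter_id(LZF_ID), np.arange(1000, dtype="float64"))

# 2) an unavailable filter id is rejected when the dataset is created
try:
    roundtrip_with_filter_id(300)
except ValueError as err:
    print("rejected:", err)
```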

As part of adding tests, I think we should try to keep hdf5plugin as an optional dependency, i.e., skip the tests if hdf5plugin is not installed.
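One way to keep hdf5plugin optional in the test suite is sketched below with plain unittest (HDMF's own tests use hdmf.testing.TestCase, and with pytest, `pytest.importorskip("hdf5plugin")` serves the same purpose). It assumes the documented hdf5plugin usage of unpacking a filter object such as `hdf5plugin.Zstd()` into `create_dataset`. The class and test names are hypothetical.

```python
import importlib.util
import os
import tempfile
import unittest

# hdf5plugin stays optional: the whole test class is skipped when it is not installed
HAVE_HDF5PLUGIN = importlib.util.find_spec("hdf5plugin") is not None


@unittest.skipIf(not HAVE_HDF5PLUGIN, "hdf5plugin is not installed")
class TestHDF5PluginZstd(unittest.TestCase):
    def test_zstd_roundtrip(self):
        # Imported inside the test so the module can be collected without hdf5plugin
        import h5py
        import hdf5plugin
        import numpy as np

        data = np.arange(1000, dtype="int64")
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "zstd.h5")
            with h5py.File(path, "w") as f:
                # Zstd() unpacks to the compression / compression_opts keyword arguments
                f.create_dataset("x", data=data, **hdf5plugin.Zstd())
            with h5py.File(path, "r") as f:
                filter_id = f["x"].id.get_create_plist().get_filter(0)[0]
                self.assertEqual(filter_id, hdf5plugin.Zstd.filter_id)
                np.testing.assert_array_equal(f["x"][()], data)
```

When hdf5plugin is missing, the runner reports the test as skipped rather than failed, so CI without the optional dependency stays green.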