openPMD / openPMD-api

C++ & Python API for Scientific I/O
https://openpmd-api.readthedocs.io
GNU Lesser General Public License v3.0

HDF5: Handle unknown datatypes in datasets #1469

Closed franzpoeschel closed 11 months ago

franzpoeschel commented 1 year ago

First commit: Throw error::ReadError in HDF5IOHandlerImpl::openDataset(). The error tells the middle-end that something has gone wrong and that it can recover by skipping the dataset.
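The throw-and-skip behavior described for the first commit can be pictured with a minimal, self-contained sketch. It does not use openPMD-api or HDF5: ReadError, openDataset(), DatasetInfo and openReadable() below are hypothetical stand-ins, and the rule that only "double" and "int" are known datatypes is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for openPMD-api's error::ReadError.
struct ReadError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical description of a dataset found in a file.
struct DatasetInfo
{
    std::string name;
    std::string datatype;
};

// Hypothetical backend call: throws ReadError when the datatype is unknown.
// Assumption for this sketch: only "double" and "int" are recognized.
void openDataset(DatasetInfo const &ds)
{
    if (ds.datatype != "double" && ds.datatype != "int")
    {
        throw ReadError(
            "Unknown datatype '" + ds.datatype + "' in dataset '" + ds.name +
            "'");
    }
}

// Middle-end sketch: try every dataset, skip the ones the backend rejects,
// and return the names of those that opened successfully.
std::vector<std::string>
openReadable(std::vector<DatasetInfo> const &datasets)
{
    std::vector<std::string> opened;
    for (auto const &ds : datasets)
    {
        try
        {
            openDataset(ds);
            opened.push_back(ds.name);
        }
        catch (ReadError const &)
        {
            // Recoverable: leave this dataset out and continue with the rest.
        }
    }
    return opened;
}
```

The point of a dedicated error type in this sketch is that the caller catches only ReadError, so any other exception still propagates instead of being silently swallowed.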

Second commit: Sometimes, datasets use custom datatypes that are based on a native type. These can be supported by checking the parent datatypes if the actual datatype is not recognized. See here for an example that uses enums to emulate booleans.
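The parent-datatype fallback can be sketched without HDF5 as a walk up a chain of base types. In the HDF5 C API, the base type of a derived datatype (such as an enum) can be queried with H5Tget_super(); whether the patch uses exactly that call is an assumption here. TypeDesc, NativeKind, lookupNative() and resolve() are hypothetical names, and the type names in the table are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of an on-disk datatype: a name plus an optional,
// non-owning pointer to the base type it was derived from.
struct TypeDesc
{
    std::string name;
    TypeDesc const *parent; // nullptr if there is no base type
};

// Hypothetical set of datatypes the reader understands.
enum class NativeKind
{
    Int8,
    Int32,
    Float64
};

// Returns the matching native kind, or nullptr if the name is not recognized.
NativeKind const *lookupNative(std::string const &name)
{
    static std::map<std::string, NativeKind> const known{
        {"native_int8", NativeKind::Int8},
        {"native_int32", NativeKind::Int32},
        {"native_float64", NativeKind::Float64}};
    auto it = known.find(name);
    return it == known.end() ? nullptr : &it->second;
}

// Check the type itself first, then each parent in turn, until one is
// recognized or the chain ends.
NativeKind const *resolve(TypeDesc const &type)
{
    for (TypeDesc const *current = &type; current != nullptr;
         current = current->parent)
    {
        if (NativeKind const *found = lookupNative(current->name))
        {
            return found;
        }
    }
    return nullptr; // Caller would treat this as an unknown datatype.
}
```

In this model, an enum that emulates a boolean on top of an 8-bit integer resolves to Int8, while a type with no recognized ancestor yields nullptr.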

ax3l commented 1 year ago

Thanks for the patch!

This seems to fail on 32-bit Windows right now @franzpoeschel

ax3l commented 1 year ago

Does this need test coverage? :)

franzpoeschel commented 1 year ago

The Windows test failure looks random. EDIT: Yep, after restarting the CI, it runs fine.

Testing this is a bit difficult, since the datasets fixed by this PR cannot be created by the openPMD-api: we would first need to create one manually and then add it to the sample datasets. It might still be worth it, as we could use this to ensure read compatibility with old PIConGPU HDF5 files, which we apparently have not done so far.

franzpoeschel commented 1 year ago

For testing, we'll need https://github.com/openPMD/openPMD-example-datasets/pull/20