HDF5 supports two ways of storing an array of strings: fixed-length and variable-length. openPMD uses arrays of strings for some attributes, for example `axisLabels`. When a fixed-length array is used, openPMD-validator considers the attribute valid. However, when a variable-length array is used, openPMD-validator fails with the following error message:

Error: Attribute axisLabels in `/data/0/meshes/inv` is not of type ndarray of '<map object at 0x7fe5256acbb0>' (is ndarray of 'object_')!

Variable-length string arrays are a legitimate feature of the HDF5 data format, and the openPMD standard does not explicitly ban them (it only states that `axisLabels` should be a "1-dimensional array containing N (string) elements", which is satisfied in both cases). I therefore believe that using variable-length strings should not violate the openPMD standard, and thus openPMD-validator should not fail in this case.

This probably happens because h5py internally represents variable-length string arrays as `np.ndarray` with `dtype=object` instead of a numpy string type (see https://docs.h5py.org/en/stable/special.html). Because of that, instead of using `arr.dtype.type` (which gives `np.object_` for variable-length arrays), the validator should use the `h5py.check_string_dtype(arr.dtype)` function, which works correctly with both fixed- and variable-length string arrays.

Attached are two example output files with fixed- and variable-length strings used for `axisLabels`: examples.zip
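
For illustration, here is a minimal sketch of the kind of check proposed above, assuming a recent h5py (3.x) and numpy. The helper name `is_string_dtype` is hypothetical and not part of openPMD-validator; the arrays are built in memory to mimic the two dtypes involved rather than read from the attached files.

```python
import h5py
import numpy as np


def is_string_dtype(dtype):
    """Return True for fixed- and variable-length HDF5 string dtypes.

    Hypothetical helper for illustration, not part of openPMD-validator.
    """
    # check_string_dtype returns a string_info tuple for string dtypes
    # and None for anything else (including a plain object dtype).
    return h5py.check_string_dtype(np.dtype(dtype)) is not None


# Fixed-length strings correspond to a numpy bytes dtype such as 'S1'.
fixed = np.array([b"x", b"y"], dtype="S1")

# Variable-length strings use an object dtype tagged with h5py metadata.
variable = np.array(["x", "y"], dtype=h5py.string_dtype())

for name, arr in [("fixed", fixed), ("variable", variable)]:
    # dtype.type differs between the two cases, the string check does not.
    print(name, arr.dtype.type.__name__, is_string_dtype(arr.dtype))
```

Comparing `arr.dtype.type` distinguishes the two arrays (`np.bytes_` versus `np.object_`), whereas the `check_string_dtype`-based test treats both as strings while still rejecting an untagged object array.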