Closed g5t closed 4 months ago
I failed to attach this file previously: monitor.h5.zip
:warning: Other unresolved issues prevent
scippnexus
from loading the group at/entry/instrument/007_frame_0
Where this becomes ambiguous, is in that HDF5 allows for array-valued attributes (which h5py reads as numpy arrays), and
scippnexus
has interpreted both 'colon-delimited array' and 'actual array of strings' to specify a single string with axes names separated by literal ':' characters.
That is not correct. Scipp implements both, see the comments in the relevant code: https://github.com/scipp/scippnexus/blob/58ffbbe67e90bf3341a7589e89e00e36859ba712/src/scippnexus/nxdata.py#L153-L164
NXdata@axes
is an "actual array of strings" (as the standard says), and the older NXdata/DATA@axes is a "colon-delimited array".
Issue
If someone interprets 'actual array of strings' to mean 'array-valued attribute with string elements', then creates a NeXus HDF5 file, like the one attached; that file can not be opened by
scippnexus
due to the use of thesplit
method here:
The self._signal_name
corresponds to the NXdata/DATA@axes
attr, not the NXdata@axes
attr. The former is a legacy way of defining axes, and I am not sure it is worth adding workarounds for broken legacy files?
Add a check that
self._signal_axes
is astr
before attempting to use.split
, or alternately a check if it is a numpy array, e.g., replacing the quoted line 160 byself._signal_axes = tuple(self._signal_axes if isinstance(self._signal_axes, np.ndarray) else self._signal_axes.split(':'))
If we fail to load (instead of using a fallback) then I agree we should improve that. However I am hesitant to interpret such a non-standard attribute. Where do we draw the line?
As a workaround for loading such files, note that it is possible to pass customized application definitions to snx.File
. One could, e.g., remove the NXdata
from the defs, so it will be loaded as a plain group:
import scippnexus as snx
my_definitions = snx.base_definitions()
del my_definitions['NXdata']
with snx.File(filename, definitions=my_definitions) as f:
dg = f[()]
I see now that only NXdata@axes
is recommended -- but why this information is not also listed on the NXdata
page is a mystery.
My issue seems to stem from having a NXdata
group nested inside of a NXmonitor
, and the internal group getting picked-up as the monitor's self._signal_axes
.
Is it not valid to replace NXmonitor/data
(an optional NX_NUMBER
by a more-complete
NXdata` group?
How
@axes
should be encoded/interpreted has been discussed previously in #145@axes
applied to thedataset
named 'DATA'Comparing what the NeXus standard says regarding how the
@axes
attribute should be encoded when applied to the 'DATA'dataset
and its containingNXdata
group, there is some ambiguity.Namely, notice that both use the term 'array'
Where this becomes ambiguous, is in that HDF5 allows for array-valued attributes (which h5py reads as numpy arrays), and
scippnexus
has interpreted both 'colon-delimited array' and 'actual array of strings' to specify a single string with axes names separated by literal ':' characters.Issue
If someone interprets 'actual array of strings' to mean 'array-valued attribute with string elements', then creates a NeXus HDF5 file, like the one attached; that file can not be opened by
scippnexus
due to the use of thesplit
method here: https://github.com/scipp/scippnexus/blob/58ffbbe67e90bf3341a7589e89e00e36859ba712/src/scippnexus/nxdata.py#L160Possible solution
Add a check that
self._signal_axes
is astr
before attempting to use.split
, or alternately a check if it is a numpy array, e.g., replacing the quoted line 160 byAlternate solution
If the interpretation of 'actual array of strings' should be limited to 'colon-delimited array' like the 'DATA' dataset attribute, petition the NeXus committee to clarify their boxed-note for the group attribute.