prjemian / punx

Python Utilities for NeXus HDF5 files
https://prjemian.github.io/punx

validate: mismatch between NXdata@axes and available fields #219

Closed: prjemian closed this issue 1 year ago

prjemian commented 1 year ago

BTW, I'm surprised that punx validate did not identify the very real problem with this NeXus file:

    data:NXdata
      @NX_class = "NXdata"
      @axes = ["und"]
      @signal = "XPS_sample"
      @target = "/entry/data"
      EPOCH --> /entry/instrument/bluesky/streams/primary/und_readback/time
      M6_collector --> /entry/instrument/bluesky/streams/primary/M6_collector/value
      XPS_pd --> /entry/instrument/bluesky/streams/primary/XPS_pd/value
      XPS_sample --> /entry/instrument/bluesky/streams/primary/XPS_sample/value
      und_readback --> /entry/instrument/bluesky/streams/primary/und_readback/value

In NXdata, each name in the axes attribute must be a field that exists in the group (either as an HDF5 dataset or an HDF5 link). That is clearly not true here: the group has no field named und (the closest match is und_readback).
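The rule above can be sketched with plain h5py. This is only an illustration of the check, not punx's implementation; the function name `missing_axes` is hypothetical.

```python
import h5py


def missing_axes(nxdata):
    """Return the names in @axes that are not members of the NXdata group.

    Hypothetical helper for illustration; not part of punx.
    """
    axes = nxdata.attrs.get("axes", [])
    # A single axis may be stored as a scalar string instead of an array.
    if isinstance(axes, (str, bytes)):
        axes = [axes]
    names = [a.decode() if isinstance(a, bytes) else str(a) for a in axes]
    # "." is the NeXus placeholder for a dimension with no axis field.
    # Membership in an h5py group covers both datasets and links.
    return [n for n in names if n != "." and n not in nxdata]
```

For the file shown above, this would report `und` as missing, since the group only contains `und_readback`.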

Originally posted by @prjemian in https://github.com/BCDA-APS/apstools/issues/806#issuecomment-1433619315

prjemian commented 1 year ago

This test method:

import h5py

from punx import validate  # assumes punx is installed and importable


def test_i219(tempdir):
    # tempdir: pytest fixture expected to provide a pathlib.Path directory
    h5file = tempdir / "test_file.h5"
    assert not h5file.exists()

    # Build a minimal NeXus file: root -> NXentry -> NXdata
    with h5py.File(h5file, "w") as root:
        root.attrs["default"] = "entry"

        nxentry = root.create_group(root.attrs["default"])
        nxentry.attrs["NX_class"] = "NXentry"
        nxentry.attrs["default"] = "data"

        nxdata = nxentry.create_group(nxentry.attrs["default"])
        nxdata.attrs["NX_class"] = "NXdata"

        # these match: @signal names a dataset that exists in the group
        nxdata.attrs["signal"] = "XPS_sample"
        nxdata.create_dataset("XPS_sample", data=[1, 2, 3])
        assert nxdata.attrs["signal"] in nxdata

        # these do not match: @axes names "und" but only "und_readback" exists
        nxdata.attrs["axes"] = ["und"]
        nxdata.create_dataset("und_readback", data=[3, 4, 1])
        for k in nxdata.attrs["axes"]:
            assert k not in nxdata

    assert h5file.exists()

    validator = validate.Data_File_Validator()
    assert isinstance(validator, validate.Data_File_Validator)

    validator.validate(h5file)

    # The last item of finding_score() is the average score.
    # The mismatch should be reported as an error, which drives it far below zero.
    average = validator.finding_score()[-1]
    assert average < -10_000

confirms that validate fails to catch this mismatch:

        average = validator.finding_score()[-1]
>       assert average < -10_000
E       assert 98.84415584415585 < -10000

punx/tests/test_i219_nxdata_mismatch.py:60: AssertionError
=========================== short test summary info ============================
FAILED punx/tests/test_i219_nxdata_mismatch.py::test_i219 - assert 98.8441558...
============================== 1 failed in 1.11s ===============================

If validation reports even a single failure, average should be a negative number.
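A minimal sketch of why a single failure should dominate the average, assuming the score is a plain mean of per-finding weights and that an error carries a very large negative weight. The weights and the `average_score` function are hypothetical, chosen only for illustration; they are not taken from punx.

```python
# Hypothetical weights, for illustration only (not punx's actual values).
WEIGHTS = {"OK": 100, "TODO": 0, "ERROR": -10_000_000}


def average_score(statuses):
    """Mean weight over all findings (an assumed scoring rule)."""
    return sum(WEIGHTS[s] for s in statuses) / len(statuses)


clean = ["OK"] * 76 + ["TODO"]
with_error = clean + ["ERROR"]

print(average_score(clean) > 0)             # no errors: average stays positive
print(average_score(with_error) < -10_000)  # one error: average goes far below zero
```

Under these assumptions, a positive average such as 98.84 means no finding was scored as an error.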

prjemian commented 1 year ago

Principal reason is described by this finding from validate:

    /entry/data@axes    TODO    attribute value    implement

prjemian commented 1 year ago

The unimplemented check is here: https://github.com/prjemian/punx/blob/73161c8c90fbc668b98eedb575279e6308eaf2d9/punx/validations/attribute.py#L69-L80