nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Validity of NXlink target with multiple entries/subentries #938

Open pascaldreher opened 3 years ago

pascaldreher commented 3 years ago

According to section 1.2.3.1 of the NeXus manual, a single NeXus file may contain multiple NXentry groups, and each entry may correspond to a specific application definition. Application definitions (and base classes) may include link tags with specified target attributes.

According to the NXDL definition of linkType, the target attribute must provide an absolute path. Allowing multiple NXentry groups conflicts with this requirement, simply because an NXentry cannot know its name at definition time: if the NXentry@name attribute were fixed at definition time, the same application definition could not be used more than once in a file. In other words, an absolute target path requires knowing the name of the NXentry at definition time, which is impossible when a file contains multiple NXentry groups.

Some application definitions also use target attributes in which some of the path fragments correspond to the respective group@type instead of group@name. However, I could not find a place in the NeXus manual where this is (a) defined as valid usage of the target attribute or (b) where it is defined how such paths are to be resolved. For example, does the NXentry in a path like /NXentry/some_group need to be resolved as the parent NXentry of some_group? Is there a general grammar for resolving target paths?

Behavior as described in the previous paragraph would circumvent the contradiction above by introducing a form of relative paths. However, it would be nice if this were documented somewhere as intended behavior.
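To make the question concrete, here is a minimal Python sketch of one possible resolution rule, in which a leading NXentry fragment is read as "the nearest enclosing NXentry group". This is an assumption for illustration only, not behavior documented in the NeXus manual; `Node` and `resolve_target` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a NeXus group or field."""
    name: str
    nx_class: str = ""  # e.g. "NXentry"; empty for fields
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: dict = field(default_factory=dict, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children[child.name] = child
        return child

    @property
    def path(self) -> str:
        # The root node contributes an empty prefix.
        if self.parent is None:
            return ""
        return f"{self.parent.path}/{self.name}"


def resolve_target(link_parent: Node, target: str) -> str:
    """Resolve a target such as '/NXentry/values' for a link stored in link_parent.

    Assumption: a leading 'NXentry' fragment means the nearest enclosing
    group of class NXentry, not a group literally named 'NXentry'.
    """
    fragments = target.strip("/").split("/")
    if fragments[0] != "NXentry":
        return "/" + "/".join(fragments)  # treat as a plain absolute path
    node = link_parent
    while node is not None and node.nx_class != "NXentry":
        node = node.parent
    if node is None:
        raise ValueError("link is not inside an NXentry group")
    return "/".join([node.path] + fragments[1:])


root = Node("root")
scan1 = root.add(Node("scan1", "NXentry"))
scan2 = root.add(Node("scan2", "NXentry"))
print(resolve_target(scan1, "/NXentry/some_group"))  # /scan1/some_group
print(resolve_target(scan2, "/NXentry/some_group"))  # /scan2/some_group
```

Under this reading the same target string yields a different absolute path in each entry, which is exactly the relative behavior that would need to be documented.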

The situation gets more complicated with NXsubentry groups. Imagine one has application definitions NXapp1 and NXapp2. At some point it turns out to be easier to measure NXapp1 and NXapp2 simultaneously. To encourage code reusability (for the code saving the files, the analysis code, etc.) and also the reusability of NeXus application definitions, it would seem natural to use our application definitions for different NXsubentry groups in the same NXentry group, as this would only require minor modifications of existing code. Would such a usage be valid according to the NeXus specifications? Moreover, if our application definitions include link tags, how would one resolve the target paths? If the path relativity from above is used, would NXentry in a path be resolved as the NXentry one traverses when ascending the tree from the link tag to the definition tag?

mkoennecke commented 3 years ago

Please see: https://github.com/nexusformat/NIAC/issues/77#issuecomment-716643766 The NIAC relaxed the rules regarding links and link targets at the 2020 NIAC meeting. Does this solve your issue? Maybe this is only a matter of updating the manual?

cnxvalidate, our validation tool, has been updated to only check that a link is pointing to something valid.

rayosborn commented 3 years ago

One issue with relative links is that HDF5 does not recognize paths containing .. so we would not be able to map links onto any HDF5 link mechanism. I'm also not entirely sure why the file path cannot be determined at run-time. I presume your scenario is to have a function that creates an NXentry tree and then adds it to the root. However, couldn't the path to the NXentry group be a function parameter so that absolute file paths could still be constructed at run-time?
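A minimal sketch of that suggestion in plain Python, assuming the writer receives the run-time path of the entry as a parameter. The names `absolute_target` and `write_entry` are hypothetical, and the sketch only computes path strings; it does not write an HDF5 file.

```python
import posixpath


def absolute_target(entry_path: str, relative_target: str) -> str:
    """Join the run-time path of an NXentry with a target relative to it."""
    if not entry_path.startswith("/"):
        raise ValueError("entry_path must be absolute")
    joined = posixpath.join(entry_path, relative_target.lstrip("/"))
    return posixpath.normpath(joined)


def write_entry(entry_path: str) -> dict:
    """Hypothetical writer: returns the links it would create as {link path: target path}."""
    return {
        absolute_target(entry_path, "values_link"): absolute_target(entry_path, "values"),
    }


links = {}
for entry_path in ("/entry1", "/entry2"):
    links.update(write_entry(entry_path))
print(links)
```

Each call produces absolute paths, so the same writer can be reused for any number of entries in one file.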

pascaldreher commented 3 years ago

Thank you for your quick response.

> Please see: nexusformat/NIAC#77 (comment) The NIAC relaxed the rules regarding links and link targets at the 2020 NIAC meeting. Does this solve your issue? Maybe this is only a matter of updating the manual?

This only partially solves my issue. But maybe I am interpreting NeXus wrongly. I thought the purpose of NeXus is to standardize data exchange by specifying application definitions using NXDL. Thus, correct me if I'm wrong, the definition of links should already happen at definition time, i.e. in the respective XML file. To define a link target in an application definition one would need to know the correct absolute path. But shouldn't this be separated from the file representation of the data, i.e. HDF5? While in HDF5 the path of course needs to be absolute, for the application definition it would be totally fine to have relative paths. This would also make it possible to reuse application definitions as subentries, or to have multiple entries in one file, without invalidating the absolute target paths in the application definition.

> One issue with relative links is that HDF5 does not recognize paths containing .. so we would not be able to map links onto any HDF5 link mechanism. I'm also not entirely sure why the file path cannot be determined at run-time. I presume your scenario is to have a function that creates an NXentry tree and then adds it to the root. However, couldn't the path to the NXentry group be a function parameter so that absolute file paths could still be constructed at run-time?

We are trying to (a) separate the definition of the data structure, i.e. the XML definitions, from writing the data as NeXus (HDF5) files, and (b) keep everything flexible and reconfigurable. In our experiment (ultrafast photoemission microscopy and spectroscopy), we have multiple routine measurement types, which however involve different optics components, parameters, etc. Thus, we try to build our NeXus XML definitions from several "snippets" for all the components involved in our experiments. We do not, however, build the XML files at runtime, but assemble them using XML includes, as this is independent of the software that in the end writes the measurements to file. In this case we need to make sure that link targets remain valid even if there are multiple (sub-)entries in a single file and if components end up in a different place. Again, specifying the paths as relative paths in the NeXus NXDL XML files would help here. This would not require relative paths in the HDF5 file.

rayosborn commented 3 years ago

So are you only asking how to define a link target in the NXDL file for an application definition? I wasn't involved in defining the NXDL, so those who understand schema better than me should comment, but I would have thought that specifying the target as, e.g., NXentry/group/field should be allowed. In the nexusformat Python API, each field or group has an nxentry property so that field.nxentry would resolve to the parent NXentry (or NXsubentry). Therefore, if you were to write a Python script to produce a NeXus file based on a particular NXDL, it would have no problem converting a target like NXentry/group/field to an absolute path when writing the file. This would work even if the NXentry placeholder is actually a NXsubentry group.

Have I understood your issue correctly? Others would have to say if this is currently allowed, but if it's not, I would support changing the rules.

pascaldreher commented 3 years ago

You are right, I am wondering how one can properly define links in NXDL XML files, and I think you understand my issue correctly, @rayosborn. The problem is not that links such as NXentry/group/field aren't allowed. The problem is that such links are ambiguous when there is more than one NXentry in a single NeXus file. In this case, should the NXentry in NXentry/group/field be interpreted as the parent NXentry that the link is a descendant of?

More involved is the problem of reusing NXDL application definitions as subentries. Imagine I have an NXDL file, let's call it definition_a.xml, that looks like this

<?xml version='1.0' encoding='UTF-8'?>
<definition>
    <group type="NXentry">
        <field name="values" type="NX_CHAR">
            test
        </field>
        <link name="values_link" target="/NXentry/values"/>
    </group>
</definition>

The first problem arises if one has multiple entries of this form in a single NeXus file, as mentioned above.

The second problem arises if one would like to reuse the NXDL file from above as a subentry. Imagine at some point it makes sense to measure data according to definition_a.xml in parallel with data according to an NXDL file definition_b.xml (let's assume definition_b.xml has a structure similar to definition_a.xml). It would be inconvenient if from now on one needed to keep three definitions up to date (a, b, and the a+b combination). Instead we would like to encourage code reusability and especially the reusability of the different NXDL files. One option for doing so in plain XML is XInclude, which is a W3C Recommendation. A possible combined NXDL file could look like this

<definition xmlns:xi="http://www.w3.org/2001/XInclude">
    <group type="NXentry">
        <group name="method_a" type="NXsubentry">
            <xi:include href="definition_a.xml" xpointer="xpointer(/definition/group[@type='NXentry']/*)"/>
        </group>
        <group name="method_b" type="NXsubentry">
            <xi:include href="definition_b.xml" xpointer="xpointer(/definition/group[@type='NXentry']/*)"/>
        </group>
    </group>
</definition>

However, in our combined definition all links defined in definition_a.xml and definition_b.xml would be invalid, as the absolute paths no longer exist. Instead, the paths would now need to be relative to the respective NXsubentry groups, i.e. the target /NXentry/values would now need to be /NXentry/method_a/values.

There are two solutions to this problem: (a) interpret the NXentry in /NXentry/values as "the NeXus path to the NXentry or NXsubentry the link is a descendant of". This is the behavior the nexusformat Python API has implemented, right? It also implies that NXentry and NXsubentry have essentially the same meaning in a link target path. (b) Introduce some sort of relative paths, which NXDL does not permit (yet).

For now I came up with my own solution, namely having an additional XML attribute "relative" for every link and have a preprocessor expand every "relative" link target up to the correct NXentry like in solution (a). Still, this seems like a needlessly complicated solution and it would be nice to have some sort of relative path mechanics at least on the NXDL level.
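A rough sketch of what such a preprocessor could look like, using only the Python standard library. This is not the actual preprocessor, and the details are assumptions: the extra attribute is taken to be `relative="true"`, the target is taken to be relative to the nearest enclosing NXentry or NXsubentry, and `expand_relative_links` and `group_label` are hypothetical names. The "relative" attribute itself is not part of NXDL.

```python
import xml.etree.ElementTree as ET

ENTRY_TYPES = {"NXentry", "NXsubentry"}

# Assumed input: the combined definition after XInclude processing.
NXDL = """
<definition>
    <group type="NXentry">
        <group name="method_a" type="NXsubentry">
            <field name="values" type="NX_CHAR"/>
            <link name="values_link" target="values" relative="true"/>
        </group>
    </group>
</definition>
"""


def group_label(group: ET.Element) -> str:
    """Path fragment for a group: its name if given, otherwise its type."""
    return group.get("name") or group.get("type")


def expand_relative_links(element: ET.Element, ancestors=()) -> None:
    """Rewrite every link marked relative="true" to a path from the root.

    `ancestors` holds the chain of enclosing <group> elements.
    """
    for child in element:
        if child.tag == "group":
            expand_relative_links(child, ancestors + (child,))
        elif child.tag == "link" and child.get("relative") == "true":
            # Index of the nearest enclosing NXentry or NXsubentry.
            depth = max(
                (i for i, g in enumerate(ancestors) if g.get("type") in ENTRY_TYPES),
                default=None,
            )
            if depth is None:
                raise ValueError(f"link {child.get('name')!r} is outside any entry")
            prefix = "/".join(group_label(g) for g in ancestors[: depth + 1])
            child.set("target", f"/{prefix}/{child.get('target')}")
            del child.attrib["relative"]


root = ET.fromstring(NXDL)
expand_relative_links(root)
link = root.find(".//link")
print(link.get("target"))  # /NXentry/method_a/values
```

The expanded target matches the path that the combined definition requires, while definition_a.xml itself stays unchanged.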

tl;dr: Right now link definitions in NXDL files can become ambiguous or even invalid when multiple entries are envisioned in a single NeXus file. The situation is worse when NXDL application definitions are reused for NXsubentry groups, as the link target paths in the NXDL file then inevitably become invalid. NXDL could improve reusability by introducing relative paths for link definitions. This of course does not imply relative links in HDF5, which are not supported and also not necessary to solve the problem.

prjemian commented 3 years ago

These are all relative to the parent NXentry group. The spec is purposely vague about exact names to make naming more flexible.

mkoennecke commented 3 years ago

I agree that we are a bit loose about the link definitions in NXDL files. But there is a reason for this. It is a perfectly legal use case for NeXus to store your data in a chaotic and wild way in an HDF5 file, and then to create an NXentry or NXsubentry that conforms to a NeXus application definition, with all the data items being links to somewhere in the file. In order to do this you of course need to know where you put the data. Thus the link definitions in NXDL should be read more as hints about what the linked data item should point to.

rayosborn commented 3 years ago

@prjemian, I think one of the things @pascaldreher is pointing out is that the links need to be relative to either a parent NXentry group or a parent NXsubentry group. Restricting relative paths to a parent NXentry group would break any relative links in application definitions that are implemented as NXsubentry groups.

pascaldreher commented 3 years ago

@rayosborn this is exactly the point I am trying to make. Also, an NXsubentry fragment should be allowed to point to a parent NXentry if there is no parent NXsubentry. This would make it possible to use the same application definition both as an NXentry and as an NXsubentry.

It would also be beneficial if every NX... fragment at the beginning of a path could point to a respective parent group. However, this might be a case which is specific to our domain of applications and not interesting for the NIAC. Right now I solve our problem by having an additional attribute for every link, which indicates that the target of the link should be resolved up to an appropriate parent group by a preprocessor.

In the end, this would boil down to having some sort of relative path mechanics on the NXDL level, which I don't know if the NIAC is open to.

rayosborn commented 3 years ago

@pascaldreher, when you raised this issue, I checked what the nexusformat Python API behavior was, and it did exactly what you are suggesting. It worked because the NXsubentry class is a sub-class of NXentry, so the function that points to a parent NXentry group resolves to the first NXsubentry group above it in the hierarchy. The Python API makes use of inheritance in a number of ways, but technically, there is no inheritance in the NeXus format itself, so we just have to say it in the text. I have no idea if schema can be written in an object-oriented way.
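The inheritance argument can be illustrated with a few lines of plain Python. This is a hypothetical sketch and not the nexusformat implementation; the classes below only reuse its class names, and the `nxentry` property is written from the description above.

```python
class NXgroup:
    """Hypothetical minimal group; not the nexusformat implementation."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def path(self):
        if self.parent is None:
            return "/" + self.name
        return f"{self.parent.path}/{self.name}"

    @property
    def nxentry(self):
        """Nearest enclosing group that is an NXentry or a subclass of it."""
        node = self.parent
        while node is not None and not isinstance(node, NXentry):
            node = node.parent
        return node


class NXentry(NXgroup):
    pass


class NXsubentry(NXentry):
    # Being a subclass is what makes isinstance(..., NXentry) match subentries too.
    pass


entry = NXentry("entry")
method_a = NXsubentry("method_a", parent=entry)
data = NXgroup("data", parent=method_a)
summary = NXgroup("summary", parent=entry)

print(data.nxentry.path)     # /entry/method_a
print(summary.nxentry.path)  # /entry
```

Because the NeXus format itself has no inheritance, a written rule would have to state the equivalent explicitly, for example "NXentry in a target path matches the nearest enclosing NXentry or NXsubentry".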

This should be discussed by NIAC. Thanks for raising the issue.