nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Role and structure of root-level NXentry? #1217

Open padraic-shafer opened 1 year ago

padraic-shafer commented 1 year ago

All or nearly all NX applications have a NXentry group at root-level. In the corresponding NXDL file, some of these have an XML attribute called name with the value entry or entry_1. Others instead have a child element of type nx:attributeType (i.e., an NXDL attribute, as opposed to an XML attribute) that is named entry but with undefined value (to be defined in the data file).

These strike me as inequivalent approaches. Is there guidance or a preferred approach from the NX community? Or am I misunderstanding this?

Examples:

NXentry with XML attribute

<group type="NXentry" name="entry">

NXentry with NXDL attribute, named entry

<group type="NXentry">
    <attribute name="entry">
    <!-- Often a doc string here -->
    </attribute>

EDIT: "doc string", not "doctoring".

EDIT: Oops, I think I added to the confusion by writing xs:attribute, where I should have written nx:attributeType; now corrected above.

rayosborn commented 1 year ago

This looks like something that needs fixing. The name of the group is definitely not a group attribute. It is the name of the HDF5 group. In fact, it is the type that is stored as an attribute, but that's a technical issue for those who write using the HDF5 API or h5py. The type attribute is handled automatically by the nexusformat API. Also, the name of the entry is not prescribed. It can be anything that conforms to NeXus naming conventions, although "entry" is commonly used, particularly if there is only one in the file.

prjemian commented 1 year ago

"The name of the group is definitely not a group attribute."

Per the nxdl.xsd XML Schema for NeXus NXDL files, the name attribute definitely is an attribute of a NeXus group element in a NXDL file.

prjemian commented 1 year ago

When used, it should be written as:

<group type="NXentry" name="entry">

Other constructs such as the following are incorrect and should be revised:

    <attribute name="entry">
    <!-- Often a doc string here -->
    </attribute>

rayosborn commented 1 year ago

I mean that it is not stored as an attribute of the HDF5 group. If we list it as an attribute, people will be writing h5py code that sets it as an attribute. It looks as if we need to clarify the connection between the NXDL standard and the actual HDF5 files that we write, because this could become a major source of confusion.
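
The two NXDL constructs under discussion can be told apart mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and fragments written without an XML namespace (real NXDL files declare one, so the tag matching would need adjusting). The helper name summarize_group is hypothetical, and the fragments are the ones quoted in this thread, closed so that they parse.

```python
import xml.etree.ElementTree as ET

# Two NXDL fragments from this thread, closed so that they parse.
WITH_XML_ATTRIBUTE = '<group type="NXentry" name="entry"></group>'
WITH_NXDL_ATTRIBUTE = (
    '<group type="NXentry">'
    '<attribute name="entry"><!-- Often a doc string here --></attribute>'
    '</group>'
)


def summarize_group(nxdl_fragment):
    """Return (group name or None, names of declared child attributes)."""
    group = ET.fromstring(nxdl_fragment)
    # XML attribute on the <group> element itself: this names the group.
    group_name = group.get("name")
    # Child <attribute> elements: these declare attributes of the group.
    declared = [child.get("name") for child in group.findall("attribute")]
    return group_name, declared


print(summarize_group(WITH_XML_ATTRIBUTE))   # ('entry', [])
print(summarize_group(WITH_NXDL_ATTRIBUTE))  # (None, ['entry'])
```

In the first fragment the word entry names the group; in the second the group is left unnamed and entry is declared as an attribute of it.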

prjemian commented 1 year ago

@pshafer-als Thanks for pointing this out! Even the NeXus team can become confused when understanding this point.

padraic-shafer commented 1 year ago

Haha. :)

Thank you both for clarifying the intention here, and for showing me the correct usage.

prjemian commented 1 year ago

@rayosborn - You are describing the appearance of an attribute in an HDF5 file while this post is about an attribute in a NXDL file. These are different cases, underscoring my previous comment.

rayosborn commented 1 year ago

@prjemian, as you say, even old-timers can be confused. @pshafer-als pointed out that the name was sometimes given as an XML attribute of the group, and sometimes as a separate NXDL attribute. Do you believe that the latter is correct in the context of NXDL definitions? If so, we may need to have a way of clarifying when the NXDL attribute translates into an HDF5 attribute and when it doesn't.

prjemian commented 1 year ago

No. Above, I described the correct form.

On Fri, Nov 4, 2022, 12:06 PM Ray Osborn wrote:

@prjemian, as you say, even old-timers can be confused. @pshafer-als pointed out that the name was sometimes given as an XML attribute of the group, and sometimes as a separate NXDL attribute. Do you believe that the latter is correct in the context of NXDL definitions? If so, we may need to have a way of clarifying when the NXDL attribute translates into an HDF5 attribute and when it doesn't.

prjemian commented 1 year ago

The discussion here deserves a clarifying figure, which, after review, may make its way into the document. Soon.

prjemian commented 1 year ago

When the name="entry" attribute is used in a NXDL file, it means the name of the group is required to be exactly entry and nothing different.

<group type="NXentry" name="entry">

In the data file, it has this tree structure:

  entry:
    @NX_class = "NXentry"

To allow some flexibility (since people want to pick their own name for the HDF5 group), NXDL says to specify the name with all capital letters in the NXDL file, indicating that the name is flexible, yet providing a name of reference for further documentation. In this case:

<group type="NXentry" name="ENTRY">

which may be shortened to

<group type="NXentry">

The default case for any NeXus group is to allow the name of the group to be flexible. See NXentry, for example, where these groups all use flexible names: https://github.com/nexusformat/definitions/blob/e9f7406b252a4b209bb50da830e2d27c7f280c9c/base_classes/NXentry.nxdl.xml#L222-L229

Taking the first group in this example, NXuser, the default name for the NXuser group in a HDF5 data file is user. In the documentation, it might be referred to as USER (all caps), or USER:NXuser.
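
The naming rule described in this comment can be written down as a small function. This is a minimal sketch under the assumptions stated here only: an all-capitals name is flexible, any other name is required verbatim, and a missing name defaults to the type without its NX prefix. The function names reference_name and default_data_name are hypothetical and not part of any NeXus library.

```python
def reference_name(nx_type, name=None):
    """Return (name used in the NXDL file or docs, whether it is flexible)."""
    if name is None:
        # No name given: the upper-cased type without the "NX" prefix.
        return nx_type[2:].upper(), True
    # An all-capitals name is a placeholder; anything else is required verbatim.
    return name, name.isupper()


def default_data_name(nx_type):
    """Default group name in the data file, e.g. NXuser -> user."""
    return nx_type[2:].lower()


print(reference_name("NXentry", "entry"))  # ('entry', False)
print(reference_name("NXentry", "ENTRY"))  # ('ENTRY', True)
print(reference_name("NXentry"))           # ('ENTRY', True)
print(reference_name("NXuser"))            # ('USER', True)
print(default_data_name("NXuser"))         # user
```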

prjemian commented 1 year ago

The second specification above (in a NXDL file) is different in that it adds an entry attribute to the NXentry group for the HDF5 data file. Note that the name of the group is flexible, as described above:

<group type="NXentry">
    <attribute name="entry">
    <!-- Often a doc string here -->
    </attribute>

This is the HDF5 tree structure:

  entry:  # default name for the group
    @NX_class = "NXentry"
    @entry = "something"  # as described in the documentation

So this case is allowed (*) by NXDL's XML Schema (nxdl.xsd), even though the use of entry here is confusing: it could be read either as the name of the group or as a special case where entry is an attribute of the specified HDF5 NXentry group.


(*) Which explains why this was not caught as an error by the QA process that checks the NXDL files for validity. The QA process uses the same library as the xmllint command line program.
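
Because schema validation accepts this construct, catching it would take a separate check. The following is a minimal sketch of one possible check, assuming Python's standard xml.etree.ElementTree module and NXDL text without an XML namespace. The function find_suspicious_attributes is hypothetical and not part of the NeXus QA tooling. It flags, for human review, any child attribute whose name equals the default name of its own group.

```python
import xml.etree.ElementTree as ET


def find_suspicious_attributes(nxdl_text):
    """List (group type, attribute name) pairs that look like misplaced group names."""
    root = ET.fromstring(nxdl_text)
    findings = []
    for group in root.iter("group"):
        nx_type = group.get("type", "")
        default_name = nx_type[2:].lower()  # NXentry -> entry
        for attribute in group.findall("attribute"):
            if attribute.get("name") == default_name:
                findings.append((nx_type, attribute.get("name")))
    return findings


# Made-up sample for illustration; not copied from any real definition.
sample = """
<definition>
  <group type="NXentry">
    <attribute name="entry"/>
    <group type="NXdata" name="data"/>
  </group>
</definition>
"""
print(find_suspicious_attributes(sample))  # [('NXentry', 'entry')]
```

A hit is only a hint, since the schema does permit an attribute with that name.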


I edited this on re-reading because I, too, made a mistake in my description. Confusing overload of the word name.

prjemian commented 1 year ago

Note regarding flexible name of a group specification in a NXDL file.

A name can be flexible yet non-default when multiple groups of the same NX_class are used. See these uses of NXdata: https://github.com/nexusformat/definitions/blob/e9f7406b252a4b209bb50da830e2d27c7f280c9c/applications/NXcanSAS.nxdl.xml#L178 https://github.com/nexusformat/definitions/blob/e9f7406b252a4b209bb50da830e2d27c7f280c9c/applications/NXcanSAS.nxdl.xml#L1190

Both use flexible names but the second NXdata group says that TRANSMISSION_SPECTRUM is flexible and not required verbatim. The first relies on the default name="DATA".

prjemian commented 1 year ago

@pshafer-als - Your question opened a new level of inquiry regarding the specification of an exact name for a group. Here are my findings for questionable specifications involving NXdata groups:

(base) prjemian@zap:~/.../NeXus/definitions$ git grep 'type="NXdata"' | grep 'name="data"'
applications/NXfluo.nxdl.xml:      <group type="NXdata" name="data">
applications/NXrefscan.nxdl.xml:      <group type="NXdata" name="data">
applications/NXreftof.nxdl.xml:      <group type="NXdata" name="data">
applications/NXsastof.nxdl.xml:    <group type="NXdata" name="data">
applications/NXspe.nxdl.xml:        <group type="NXdata" name="data">
applications/NXtofnpd.nxdl.xml:      <group type="NXdata" name="data">
applications/NXtofraw.nxdl.xml:      <group type="NXdata" name="data">
applications/NXtofsingle.nxdl.xml:      <group type="NXdata" name="data">
applications/NXtomo.nxdl.xml:      <group type="NXdata" name="data">
applications/NXtomophase.nxdl.xml:      <group type="NXdata" name="data">
applications/NXtomoproc.nxdl.xml:      <group type="NXdata" name="data">
contributed_definitions/NXxpcs.nxdl.xml:    <group type="NXdata" name="data">

And were these specifications victims of copy-paste errors?

(base) prjemian@zap:~/.../NeXus/definitions$ git grep 'type="NXdata"' | grep 'name="name"'
applications/NXlauetof.nxdl.xml:    <group type="NXdata" name="name">
applications/NXxeuler.nxdl.xml:      <group type="NXdata" name="name">
applications/NXxkappa.nxdl.xml:    <group type="NXdata" name="name">
applications/NXxnb.nxdl.xml:    <group type="NXdata" name="name">
applications/NXxrot.nxdl.xml:    <group type="NXdata" name="name">

prjemian commented 1 year ago

There are 367 instances to examine (in the repository default branch now):

(base) prjemian@zap:~/.../NeXus/definitions$ git grep "<group " | grep name= | wc -l
367
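
The same audit could also be sketched in Python, which makes it easy to separate fixed names from flexible ones. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The function classify_named_groups is hypothetical, and the namespace URI in the sample is a placeholder rather than the real NXDL namespace. It counts elements, not lines, so its totals need not match the grep count above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify_named_groups(nxdl_text):
    """Count named <group> elements as 'flexible' (all capitals) or 'fixed'."""
    counts = Counter()
    for element in ET.fromstring(nxdl_text).iter():
        # Drop any "{namespace}" prefix so namespaced NXDL files also match.
        tag = element.tag.rsplit("}", 1)[-1]
        name = element.get("name")
        if tag == "group" and name is not None:
            counts["flexible" if name.isupper() else "fixed"] += 1
    return counts


# Made-up sample; the namespace URI is a placeholder for illustration.
sample = """
<definition xmlns="http://example.org/placeholder-nxdl-namespace">
  <group type="NXentry">
    <group type="NXdata" name="data"/>
    <group type="NXdata" name="TRANSMISSION_SPECTRUM"/>
    <group type="NXdata" name="name"/>
  </group>
</definition>
"""
counts = classify_named_groups(sample)
print(counts["fixed"], counts["flexible"])  # 2 1
```

To run it over a checkout, the text of each *.nxdl.xml file could be read with pathlib and passed to the function one file at a time.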

prjemian commented 1 year ago

That is the definition of an issue good for a code camp.