nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Are the new doc strings extending (specialising) or overriding the inherited doc? #1059

Open prjemian opened 2 years ago

prjemian commented 2 years ago

E.g. if NXdetector_module/data_origin is overridden by NXdetector/detector_module/data_origin, and NXmyappdef:/detector(NXdetector)/detector_module(NXdetector_module) is then used, will it inherit from NXdetector_module or from NXdetector/NXdetector_module?

Proposal:

prjemian commented 2 years ago

This could (and probably should) be solved in the specific documentation that cites the replacement. The documentation should reference the replacement, rather than the base class target.

prjemian commented 2 years ago

The documentation might make such links automatically by a two-step process:

  1. collect all possible links
  2. resolve which anchor to use, searching within the NXDL definition first
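
The two steps above can be sketched as follows. This is only an illustration, not the documentation build's actual code: the function names `collect_anchors` and `resolve_anchor` and the dictionary layout are assumptions made for the example, and the paths are taken from the question at the top of this issue.

```python
from collections import defaultdict


def collect_anchors(definitions):
    """Step 1: map each item name to every anchor that documents it.

    `definitions` maps a definition name (e.g. "NXdetector") to the item
    paths documented in it (e.g. "/detector_module/data_origin").
    """
    anchors = defaultdict(list)
    for nxdl_name, paths in definitions.items():
        for path in paths:
            item = path.rsplit("/", 1)[-1]
            anchors[item].append(f"{nxdl_name}{path}")
    return anchors


def resolve_anchor(anchors, item, current_nxdl):
    """Step 2: pick an anchor, searching within the current NXDL definition first."""
    candidates = anchors.get(item, [])
    for anchor in candidates:
        if anchor.split("/", 1)[0] == current_nxdl:
            return anchor
    # Fall back to the first anchor found elsewhere, if any.
    return candidates[0] if candidates else None


definitions = {
    "NXdetector_module": ["/data_origin"],
    "NXdetector": ["/detector_module/data_origin"],
}
anchors = collect_anchors(definitions)
print(resolve_anchor(anchors, "data_origin", "NXdetector"))
print(resolve_anchor(anchors, "data_origin", "NXmyappdef"))
```

With this ordering, a link written inside NXdetector points at the replacement, while a definition that has no local anchor falls back to the base class target.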

sanbrock commented 2 years ago

NeXus supports two main categories of definitions: base classes and application definitions. The basic difference is that the default optionality of the defined elements is “optional” in base class definitions but “required” in application definitions. Note also that while most definitions extend NXobject, a few application definitions extend another application definition. As the documentation says: “In contrast to NeXus base classes, NeXus supports inheritance in application definitions.”

On the other hand, since the keyword ‘extends’ is so rarely used in current application definitions (other than to point to NXobject), it is unclear whether and how such inheritance is implemented by different tools, and how inherited data items are handled inside the new definition.

Is only extension (the addition of new data items such as groups, fields, or attributes) supported, or also overriding, where already introduced elements could even be redefined?

Another question is how the data item definitions of base classes are (re)used in another base class or in an application definition if no inheritance is supported.

Actually, reuse is triggered by referencing a base class as a ‘type=’. The assumption here is that all data items defined in the tree of the referenced base class, and in the trees of the base classes referenced therein, automatically become available for reuse under a (not always specified) ‘name’ in the new definition. Hence, definitions are inherited inside base classes as well.

When such a reference also provides extra definition elements (e.g. the doc in NXbeam/DATA), this is handled as a specific definition, valid only for this item, which further specifies the original base definition (in the case of NXbeam/DATA(doc), the original NXdata(doc) is actually extended). Reuse in an application definition works the same way, except that optionality is switched to “required” by default (e.g. NXarpes/ENTRY/title(optional)=False as opposed to NXentry/title(optional)=True). Note that although definitions are inherited, modifications made at a specific data item result in a new item definition (e.g. extending/specialising the documentation, changing optionality, adding new data items, etc.). Such an extended/modified item definition is then inherited when the item is referenced inside another definition. E.g. NXmy_arpes/ENTRY/arpes_base[NXarpes], just like NXmy_arpes(NXarpes), would inherit NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector]/acquisition_mode(enum:[swept, fixed]) rather than NXdetector/acquisition_mode(enum:[gated, triggered, summed, event, histogrammed, decimated]).
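
A minimal sketch of this lookup rule, assuming the inheritance chain of an item is already available as a list ordered from the most specific definition to the most general. The names `resolve_property` and `default_required` and the dictionary layout are hypothetical; the paths, enumeration values, and doc text are the ones quoted in this thread.

```python
def resolve_property(chain, key):
    """Return (source, value) from the most specific definition that sets `key`."""
    for source, props in chain:
        if key in props:
            return source, props[key]
    return None


def default_required(category):
    """Default optionality: required in application definitions, optional in base classes."""
    return category == "application"


# Chain for acquisition_mode as seen from NXarpes, most specific first.
chain = [
    ("NXarpes/ENTRY/INSTRUMENT/analyser/acquisition_mode",
     {"enumeration": ["swept", "fixed"]}),
    ("NXdetector/acquisition_mode",
     {"enumeration": ["gated", "triggered", "summed", "event",
                      "histogrammed", "decimated"],
      "doc": "The acquisition mode of the detector."}),
]

print(resolve_property(chain, "enumeration"))  # taken from NXarpes
print(resolve_property(chain, "doc"))          # falls through to NXdetector
print(default_required("application"), default_required("base"))
```

A definition that references NXarpes would see the same chain, so it would pick up the specialised enumeration rather than the NXdetector one.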

With its ‘type’-referencing mechanism for reusing definitions, NeXus implements single, multilevel, and hierarchical inheritance (see https://beginnersbook.com/2013/05/java-inheritance-types/). For example, NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector] extends, with new data items including the field ‘energies’, the referenced NXentry/INSTRUMENT/DETECTOR, which references NXinstrument/DETECTOR, which in turn references NXdetector.

Inheritance in NeXus allows the reuse of a complete definition tree with all its inherited sub-definitions. A data item can be overridden by referencing it with the corresponding name/type combination. Note that, for convenience, doc strings are by default not overridden but extended/specialised; an overriding doc string shall state explicitly if inherited doc strings are not to be considered.
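
The doc string rule could look roughly like this. It is a sketch under assumptions: the function name `combine_docs` is made up, and the boolean `replaces_inherited` is a hypothetical stand-in for however a doc string would "explicitly state" that inherited docs are not to be considered, which this proposal does not specify.

```python
def combine_docs(chain):
    """Collect doc strings along an inheritance chain, most specific first.

    Each entry is (source, doc, replaces_inherited). Collection stops after
    an entry whose hypothetical `replaces_inherited` flag is set.
    """
    parts = []
    for source, doc, replaces_inherited in chain:
        if doc:
            parts.append(f"{source}: {doc}")
        if replaces_inherited:
            break
    return "\n".join(parts)


# Default behaviour: inherited doc strings are kept and extended.
extended = combine_docs([
    ("NXarpes/ENTRY/DATA", "", False),
    ("NXentry/DATA", "The data group", False),
    ("NXdata", "NXdata describes the plottable data and related dimension scales.", False),
])
print(extended)

# Explicit override: only the most specific doc string is kept.
overridden = combine_docs([
    ("NXmyappdef/ENTRY/DATA", "Only this text applies.", True),
    ("NXentry/DATA", "The data group", False),
])
print(overridden)
```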

Inheritance relationships:

  1. IS A: implemented in NeXus by ‘extends=‘ or ‘type=‘
  2. HAS or MAY CONTAIN (depending on optionality): implemented in NeXus by explicit or inherited sub definitions

sanbrock commented 2 years ago

Example for retrieving inherited doc strings for NXarpes/ENTRY/DATA while processing a data file:

INFO: ===== GROUP (//entry/data [NXarpes::/NXentry/NXdata]): <HDF5 group "/entry/data" (4 members)>
INFO: classpath: ['NXentry', 'NXdata']
INFO: classes:
NXarpes.nxdl.xml:/ENTRY/DATA
NXentry.nxdl.xml:/DATA
NXdata.nxdl.xml:
INFO: <>
INFO: documentation (NXarpes.nxdl.xml:/ENTRY/DATA):
INFO:
INFO: documentation (NXentry.nxdl.xml:/DATA):
INFO: The data group

                    .. note:: Before the NIAC2016 meeting [#]_, at least one
                            ...

INFO: documentation (NXdata.nxdl.xml:):
INFO: :ref:`NXdata` describes the plottable data and related dimension scales.

            .. index:: plotting

            It is mandatory  that there is at least one :ref:`NXdata` group 
                        ...

sanbrock commented 2 years ago

A similar example, which also retrieves the enumeration lists, for NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector]/acquisition_mode:

INFO: ===== FIELD (//entry/instrument/analyser/acquisition_mode): <HDF5 dataset "acquisition_mode": shape (), type "|O">
INFO: value: b'fixed'
INFO: classpath: ['NXentry', 'NXinstrument', 'NXdetector', 'NX_CHAR']
INFO: classes:
NXarpes.nxdl.xml:/ENTRY/INSTRUMENT/analyser/acquisition_mode
NXdetector.nxdl.xml:/acquisition_mode
INFO: <>
INFO: enumeration (NXarpes.nxdl.xml:/ENTRY/INSTRUMENT/analyser/acquisition_mode):
INFO: -> swept
INFO: -> fixed
INFO: enumeration (NXdetector.nxdl.xml:/acquisition_mode):
INFO: -> gated
INFO: -> triggered
INFO: -> summed
INFO: -> event
INFO: -> histogrammed
INFO: -> decimated
INFO: documentation (NXarpes.nxdl.xml:/ENTRY/INSTRUMENT/analyser/acquisition_mode):
INFO:
INFO: documentation (NXdetector.nxdl.xml:/acquisition_mode):
INFO: The acquisition mode of the detector.

sanbrock commented 2 years ago

An implementation is available at: https://github.com/nomad-coe/nomad-parser-nexus/blob/bb3ef7693643a7b745ee8c9786dd68d83e361663/nexusparser/tools/nexus.py#L639-L700

Short draft:

import xml.etree.ElementTree as ET

# get_nx_class, get_local_name_from_xml and get_node_name are helpers from the
# implementation linked above; they are not defined in this draft.


def get_inherited_nodes(nxdl_path: str):
    """Return a list of ET.Element for the given path, highest priority first."""
    parts = nxdl_path.split('/')
    # let us start with the given definition file
    elist = []
    add_base_classes(elist, parts[0])
    # walk along the path
    for html_name in parts[1:]:
        # from low priority inheritance classes to higher
        for ind in range(len(elist) - 1, -1, -1):
            elist[ind] = get_direct_child(elist[ind], html_name)
            if elist[ind] is None:
                del elist[ind]
                continue
            # override: remove low priority inheritance classes if class_type is overridden
            if len(elist) > ind + 1 and get_nx_class(elist[ind]) != get_nx_class(elist[ind + 1]):
                del elist[ind + 1:]
            # add new base class(es) if new element brings such (and not a primitive type)
            if len(elist) == ind + 1 and get_nx_class(elist[ind])[0:3] != 'NX_':
                add_base_classes(elist)
    return elist


def add_base_classes(elist, nx_name=None):
    """Add the base classes corresponding to the last element in elist to the list.

    Note that if elist is empty, an nxdl file with the name of nx_name is used.
    """
    if elist and nx_name is None:
        nx_name = get_nx_class(elist[-1])
    # avoid loading the same definition twice; this relies on elements carrying
    # an 'nxdlbase' attribute, which this draft itself does not set
    if elist and nx_name and f"{nx_name}.nxdl.xml" in (e.get('nxdlbase') for e in elist):
        return
    elem = ET.parse(f"{nx_name}.nxdl.xml").getroot()
    elist.append(elem)
    # add inherited base classes; a root definition that only extends NXobject
    # ends the chain (recursing on the root itself would look up a class name
    # that a definition root does not provide)
    if 'extends' in elem.attrib and elem.attrib['extends'] != 'NXobject':
        add_base_classes(elist, elem.attrib['extends'])


def get_direct_child(nxdl_elem, html_name):
    """Return the child of nxdl_elem whose name corresponds to the
    html documentation name html_name, or None if there is no such child."""
    for child in nxdl_elem:
        if (get_local_name_from_xml(child) in ('group', 'field', 'attribute')
                and html_name == get_node_name(child)):
            return child
    return None
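
The draft calls three helpers that are not shown here. The following is a guess at minimal versions, for illustration only, and it may differ from the linked implementation. It assumes that ElementTree tags carry a `{namespace}` prefix, that a field without a `type` attribute defaults to NX_CHAR (as the classpath in the log above suggests), and that an unnamed group is addressed by its type without the `NX` prefix in upper case (ENTRY, INSTRUMENT, ...). The XML snippet is a hand-written fragment, not the real NXarpes file.

```python
import xml.etree.ElementTree as ET


def get_local_name_from_xml(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return element.tag.rsplit("}", 1)[-1]


def get_nx_class(nxdl_elem):
    """Return the 'type' attribute, defaulting to NX_CHAR when absent."""
    return nxdl_elem.attrib.get("type", "NX_CHAR")


def get_node_name(node):
    """Return the explicit name, else the type without 'NX' in upper case."""
    if "name" in node.attrib:
        return node.attrib["name"]
    return node.attrib["type"][2:].upper()


def get_direct_child(nxdl_elem, html_name):
    """Same lookup as in the draft, repeated so this sketch runs on its own."""
    for child in nxdl_elem:
        if (get_local_name_from_xml(child) in ("group", "field", "attribute")
                and html_name == get_node_name(child)):
            return child
    return None


# Hand-written fragment for illustration; the namespace URI is an assumption.
SNIPPET = """
<definition xmlns="http://definition.nexusformat.org/nxdl/3.1"
            name="NXarpes" extends="NXobject" type="group" category="application">
  <group type="NXentry">
    <group type="NXinstrument">
      <group type="NXdetector" name="analyser">
        <field name="acquisition_mode"/>
      </group>
    </group>
  </group>
</definition>
"""

node = ET.fromstring(SNIPPET)
for name in "ENTRY/INSTRUMENT/analyser/acquisition_mode".split("/"):
    node = get_direct_child(node, name)
print(get_local_name_from_xml(node), get_node_name(node), get_nx_class(node))
```

Walking the path one name at a time is the same step that `get_inherited_nodes` applies to every element of its list in parallel.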

prjemian commented 2 years ago

@sanbrock Will this issue be resolved in the next few days? Is it necessary to resolve this for the NXDL release now?

sanbrock commented 2 years ago

@prjemian Creating the Vocabulary Table is a good first step. I think that is enough for the next release. We can then review what the next steps should be.

prjemian commented 2 years ago

Thanks!