nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Unclear semantics if there are multiple `NXsample` groups #1361

Open paulmillar opened 4 months ago

paulmillar commented 4 months ago

The current definition of NXsample is:

<doc>
    Any information on the sample. 

    This could include scanned variables that
    are associated with one of the data dimensions, e.g. the magnetic field, or
    logged data, e.g. monitored temperature vs elapsed time.
</doc>

This definition doesn't make clear what the semantic relationship between multiple NXsample groups is.

In more concrete terms...

It's fairly clear that a single NXsample group describes a single sample (the above description says "the sample").

However, if an NXentry contains multiple NXsample groups, does each NXsample group describe the same sample or potentially different samples?

If a NeXus file contains multiple NXentry groups, each with a single NXsample group, do these NXsample groups describe the same sample or potentially different samples?

prjemian commented 4 months ago

> If a NeXus file contains multiple NXentry groups, each with a single NXsample group, are these NXsample groups the same sample or potentially different samples?

Generally, they are different samples, defined only within their parent NXentry group. If you want to make them the same, write the same information or use HDF5 links.

prjemian commented 4 months ago

> ... if an NXentry contains multiple NXsample groups ...

Can you describe a specific use case for this?

prjemian commented 4 months ago

Without a maxOccurs="1" attribute in the declaration of NXsample within NXentry: https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/base_classes/NXentry.nxdl.xml#L223

then the default is "unbounded": https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/nxdl.xsd#L412

This means that multiple NXsample groups are allowed.
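As a rough illustration of that default, here is a minimal Python sketch. It assumes a trimmed-down, hypothetical NXDL fragment (not the real NXentry.nxdl.xml, and without the NXDL XML namespace), and it takes "unbounded" as the fallback for a group declaration with no maxOccurs attribute, as stated above. The class name `NXexample_single` is made up for contrast.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NXDL fragment for illustration only.
# The real NXDL files declare an XML namespace, which is omitted here.
NXDL_SNIPPET = """
<definition name="NXentry">
    <group type="NXsample"/>
    <group type="NXexample_single" maxOccurs="1"/>
</definition>
"""


def effective_max_occurs(group):
    """Return the declared maxOccurs, falling back to 'unbounded'."""
    return group.get("maxOccurs", "unbounded")


root = ET.fromstring(NXDL_SNIPPET)
limits = {g.get("type"): effective_max_occurs(g) for g in root.iter("group")}
print(limits)
```

With no maxOccurs on the NXsample declaration, the lookup falls back to "unbounded", which is why more than one NXsample group passes validation.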

When you provide multiple NXsample groups, it's up to you to define if they are the same or different and to describe how the multiple groups should be handled by users of such data.

paulmillar commented 4 months ago

The specific use case that has come up is describing the sample environment. This comes from the work on embedding SECoP information in NeXus.

The current proposed strategy involves creating two NXsample groups: one with a type field of "sample" and the other with a type field of "sample environment". Both NXsample groups are placed under the NXentry.

(I believe there are some issues with this type field, but I think that is largely independent from this issue.)

You can see example files that demonstrate this in this repo, and can (interactively) view those files' contents using myHDF5: sample_env_00591.nxs and sample_env_00592.nxs.

paulmillar commented 4 months ago

> This means that multiple NXsample groups are allowed.

Indeed. That was my conclusion, too.

> When you provide multiple NXsample groups, it's up to you to define if they are the same or different and to describe how the multiple groups should be handled by users of such data.

Well, OK.

But then, my follow-on question would be ... how?

If NeXus has this flexibility (i.e., there are no restrictions placed on the relationship between multiple NXsample groups) then how should the person/agent writing the NeXus file indicate the intended relationship between multiple NXsample groups?

This could be described in the description field. However, I believe that field is intended for human consumption.

prjemian commented 4 months ago

> The current proposed strategy involves creating two NXSample groups: one with type field of sample and the other with the type field of sample environment. Both NXSample groups are placed under the NXentry.

Thanks for that description. As shown above, multiple NXsample groups are allowed by the NXDL language.

As you noted, a custom attribute named type is problematic because it conflicts with other uses of type in the NXDL language. (In fields, type refers to the storage type, such as NX_CHAR. In groups, type refers to the name of the NeXus class definition, such as NXuser.) Pick a different

prjemian commented 4 months ago

> But then, my follow-on question would be ... how?

AFAIK, NeXus has no formal mechanism. I recall discussions (about maxOccurs) but the assumption to date has not involved multiple NXsample groups. Please make a proposal showing how to distinguish the relationship.

Could the sample environment be an environment (type="NXenvironment") group within NXsample?

paulmillar commented 4 months ago

I agree that type (as an attribute within the NXsample class) is problematic. Please see #1366 for more comments on this theme. (I've referenced your comment there, so it isn't lost.)

However, please note that this type attribute (within NXsample) is not custom, but an already established part of NeXus. In fact, the attribute was part of the initial definition of NXsample and (seemingly) has not been modified since.

prjemian commented 4 months ago

The use of type in NXsample is a field, not an attribute: https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/base_classes/NXsample.nxdl.xml#L204-L217

The name is confusing here, yet historical (and tough to change). A lesson learned. Perhaps.

paulmillar commented 4 months ago

> Could the sample environment be an environment (type="NXenvironment") group within NXsample?

I was thinking along similar lines, too.

However, although not explicitly documented (see #1362), it seems that NXenvironment groups are intended to describe individual specific apparatuses/devices.

It seems the current proposed way of including SECoP information in NeXus takes the same interpretation.

It has multiple NXenvironment groups, each located within the same NXsample / type=sample environment group. Each of these NXenvironment groups describes a different physical device (cryo, magnets, 3He compressor/valve, etc.) that has been identified and monitored through SECoP.
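To make that layout concrete, here is a small Python sketch that models the hierarchy as nested dictionaries. It is only an in-memory stand-in, not the content of the example files: the group names (`sample`, `sample_environment`, `cryo`, `magnet`) are hypothetical, and only the class names and the two type values come from the description above.

```python
# Hypothetical in-memory model of the described layout; group names are made up.
entry = {
    "NX_class": "NXentry",
    "sample": {"NX_class": "NXsample", "type": "sample"},
    "sample_environment": {
        "NX_class": "NXsample",
        "type": "sample environment",
        "cryo": {"NX_class": "NXenvironment"},
        "magnet": {"NX_class": "NXenvironment"},
    },
}


def children_of_class(group, nx_class):
    """Return the names of direct child groups whose NX_class matches."""
    return [
        name
        for name, child in group.items()
        if isinstance(child, dict) and child.get("NX_class") == nx_class
    ]


# Both groups are NXsample, so a reader can only tell them apart via the type field.
samples = children_of_class(entry, "NXsample")
by_type = {entry[name]["type"]: name for name in samples}

# The devices live under the NXsample group whose type is "sample environment".
devices = children_of_class(entry[by_type["sample environment"]], "NXenvironment")
print(samples, devices)
```

In this sketch the only machine-readable hint about how the two NXsample groups relate is the value of the type field, which is the gap this issue is about.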

paulmillar commented 4 months ago

> The use of type in NXsample is a field, not an attribute

Sorry. You're right: type is a field and not an attribute.

yayahjb commented 4 months ago

I apologize in advance for saying this, but it has to be said sometime soon, especially for the US labs that are going to have to deal with implementing the White House data sharing mandate over the next two years:

The real problem here is much deeper and has to do with fundamental database semantics. Our ultimate objective is to have an unambiguous, searchable presentation of the information describing the various components and data values, and the relations among them, in one or more experiments.

Until 1970 there were multiple contenders for the best way to put such information into databases: hierarchical databases, network databases, object-oriented databases and more. In 1970, E. F. Codd gave us relational databases. By the mid-1980s it was clear that Codd's relational databases were the only reliable way to formally handle multi-reader/multi-writer databases. Everything else had, and still has, problems with referential integrity during updates.

NeXus is closest to the hierarchical model. If we want to clean up all the ambiguities, we have no choice: we have to establish and maintain a full bidirectional mapping of NeXus to and from a presentation as a relational database, e.g. as CIF. We do this already for NXmx. It would take us a few years to do this for all of NeXus, but you may wish to consider it as a possible project. It won't break NeXus. It won't break HDF5. It just gives you the structure you need to know what is related to what in a reliable, disciplined way that simple trees don't provide. Indeed, it makes the trees irrelevant decoration that is helpful in writing some applications, but which can bury you in a hard-to-follow nest of pointers. It does bend your head a bit when you get to doing normalization, but with the data volumes we are going to have to deal with in complying with the various data sharing mandates, it is time to get started.

All it takes to get started is to define a relational table name (a category) and a column name for each tag, and (the hard part) to decide on the structure of keys (one or more columns) that uniquely identify each row of each relational table. (This is where you come to grips with the multiple sample question, for example.)
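As a minimal sketch of what such key decisions could look like, the following uses Python's built-in sqlite3 module. The table and column names (`entry`, `sample`, `entry_sample`, `entry_id`, `sample_id`) are hypothetical and are not taken from CIF, NXmx, or any existing mapping; the point is only that a composite key forces an explicit answer to "same sample or different sample?".

```python
import sqlite3

# In-memory database; the schema below is a hypothetical illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    entry_id TEXT PRIMARY KEY
);
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    name      TEXT
);
-- Link table keyed on (entry_id, sample_id): one entry may reference
-- several samples, and one sample may be shared by several entries.
CREATE TABLE entry_sample (
    entry_id  TEXT NOT NULL REFERENCES entry(entry_id),
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    PRIMARY KEY (entry_id, sample_id)
);
""")

conn.executemany("INSERT INTO entry VALUES (?)", [("entry1",), ("entry2",)])
conn.execute("INSERT INTO sample VALUES (?, ?)", ("s1", "crystal A"))
conn.executemany(
    "INSERT INTO entry_sample VALUES (?, ?)",
    [("entry1", "s1"), ("entry2", "s1")],
)

# How many entries reference each sample?
shared = conn.execute(
    "SELECT sample_id, COUNT(*) FROM entry_sample GROUP BY sample_id"
).fetchall()
print(shared)
```

Here two entries point at the same `sample_id`, so the shared identity is stated in the data rather than left to the reader of the tree.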

Again, my apologies. We can do this now. We can wait and do it later, but eventually somebody will have to do it.

Regards, Herbert
