Open paulmillar opened 4 months ago
If a NeXus file contains multiple NXentry groups, each with a single NXsample group, are these NXsample groups the same sample or potentially different samples?
Generally, they are different samples, defined only within their parent NXentry group. If you want to make them the same, write the same information or use HDF5 links.
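As a rough illustration of the HDF5-link option mentioned above, here is a minimal sketch using h5py. The file name, group names, and the sample name "crystal A" are placeholders chosen for this example, not part of any NeXus definition, and the file is kept in memory so nothing is written to disk.

```python
import h5py

# In-memory HDF5 file; "links_demo.nxs" is a placeholder name and no file is created on disk.
with h5py.File("links_demo.nxs", "w", driver="core", backing_store=False) as f:
    entry1 = f.create_group("entry1")
    entry1.attrs["NX_class"] = "NXentry"
    sample = entry1.create_group("sample")
    sample.attrs["NX_class"] = "NXsample"

    entry2 = f.create_group("entry2")
    entry2.attrs["NX_class"] = "NXentry"
    # Hard link: /entry2/sample becomes a second path to the same HDF5 group.
    entry2["sample"] = sample

    # Anything written through one path is visible through the other.
    f["entry2/sample"].create_dataset("name", data="crystal A")
    visible_in_entry1 = "name" in f["entry1/sample"]
    linked_class = f["entry2/sample"].attrs["NX_class"]
```

With a link, both entries point at one group, so a reader sees identical content under either path. Writing the same information twice gives two independent groups that only happen to match.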
... if a NXentry contains multiple NXsample groups ...
Can you describe a specific use case for such?
Without a maxOccurs="1" attribute in the declaration of NXsample within NXentry (https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/base_classes/NXentry.nxdl.xml#L223), the default is "unbounded" (https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/nxdl.xsd#L412).
This means that multiple NXsample groups are allowed.
When you provide multiple NXsample groups, it's up to you to define if they are the same or different and to describe how the multiple groups should be handled by users of such data.
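The maxOccurs reasoning above can be sketched with the standard library. The XML fragment below is a simplified stand-in for NXentry.nxdl.xml (no namespace, almost all content omitted), the maxOccurs="1" on NXdata is a hypothetical value added only for contrast, and the "unbounded" default is taken from this discussion rather than read from nxdl.xsd.

```python
import xml.etree.ElementTree as ET

# Simplified, NXDL-like fragment; NOT the real NXentry definition.
NXDL_FRAGMENT = """
<definition name="NXentry" type="group">
  <group type="NXsample"/>
  <group type="NXdata" maxOccurs="1"/>
</definition>
"""

# Assumed default for groups, as stated in the discussion above.
DEFAULT_MAX_OCCURS = "unbounded"


def max_occurs(root, nx_class):
    """Return the effective maxOccurs for the first group of the given class."""
    for group in root.iter("group"):
        if group.get("type") == nx_class:
            return group.get("maxOccurs", DEFAULT_MAX_OCCURS)
    raise KeyError(nx_class)


root = ET.fromstring(NXDL_FRAGMENT)
nxsample_limit = max_occurs(root, "NXsample")  # no attribute, so the default applies
nxdata_limit = max_occurs(root, "NXdata")      # explicit (hypothetical) limit
```

A declaration with no maxOccurs attribute falls back to the default, which is why multiple NXsample groups are permitted.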
The specific use case that has come up is describing the sample environment. This comes from the work on embedding SECoP information in NeXus.
The current proposed strategy involves creating two NXsample groups: one with a type field of sample and the other with a type field of sample environment. Both NXsample groups are placed under the NXentry.
(I believe there are some issues with this type field, but I think that is largely independent from this issue.)
You can see example files that demonstrate this in this repo, and can (interactively) view those files' contents using myHDF5: sample_env_00591.nxs and sample_env_00592.nxs.
This means that multiple NXsample groups are allowed.
Indeed. That was my conclusion, too.
When you provide multiple NXsample groups, it's up to you to define if they are the same or different and to describe how the multiple groups should be handled by users of such data.
Well, OK.
But then, my follow-on question would be ... how?
If NeXus has this flexibility (i.e., there are no restrictions placed on the relationship between multiple NXsample groups) then how should the person/agent writing the NeXus file indicate the intended relationship between multiple NXsample groups?
This could be described in the description field. However, I believe that field is intended for human consumption.
The current proposed strategy involves creating two NXsample groups: one with a type field of sample and the other with a type field of sample environment. Both NXsample groups are placed under the NXentry.
Thanks for that description. As shown above, multiple NXsample groups are allowed by the NXDL language.
As you noted, a custom attribute of type is problematic due to conflict with its other uses in the NXDL language. (In fields, type refers to the storage type, such as NX_CHAR. In groups, type refers to the name of the NeXus class definition, such as NXuser.) Pick a different name.
But then, my follow-on question would be ... how?
AFAIK, NeXus has no formal mechanism. I recall discussions (about maxOccurs), but the assumption to date has not involved multiple NXsample groups. Make a proposal showing how to distinguish the relationship.
Could the sample environment be an environment (type="NXenvironment") group within NXsample?
I agree that type (as an attribute within the NXsample class) is problematic. Please see #1366 for more comments on this theme. (I've referenced your comment there, so it isn't lost.)
However, please note that this type attribute (within NXsample) is not custom, but an already established part of NeXus. In fact, the attribute was part of the initial definition of NXsample and (seemingly) has not been modified since.
The use of type in NXsample is a field, not an attribute: https://github.com/nexusformat/definitions/blob/4c09c7718c41dc90eb996475efdf1c0d30fb1d5d/base_classes/NXsample.nxdl.xml#L204-L217
The name is confusing here, yet historical (and tough to be changed). A lesson learned. Perhaps.
Could the sample environment be an environment (type="NXenvironment") group within NXsample?
I was thinking along similar lines, too.
However, although not explicitly documented (see #1362), it seems that NXenvironment groups are intended to describe individual, specific apparatuses/devices.
It seems the current proposed way of including SECoP information in NeXus takes the same interpretation.
It has multiple NXenvironment groups, each located within the same NXsample group (the one with type = sample environment). Each of these NXenvironment groups describes a different physical device (cryo, magnets, 3He compressor/value, etc.) that has been identified and monitored through SECoP.
The use of type in NXsample is a field, not an attribute
Sorry. You're right: type is a field and not an attribute.
I apologize in advance for saying this, but it has to be said sometime soon, especially for the US labs that are going to have to deal with implementing the White House data sharing mandate over the next two years:
The real problem here is much deeper and has to do with fundamental database semantics. Our ultimate objective is to have an unambiguous, searchable presentation of the information describing the various components, the data values, and the relations among them in one or more experiments. Until 1970 there were multiple contenders as to the best way to put such information into databases. There were hierarchical databases, network databases, object-oriented databases and more. In 1970, E. F. Codd gave us relational databases. By the mid-1980s it was clear that Codd's relational databases were the only reliable way to formally handle multi-reader/multi-writer databases. Everything else had/has problems with referential integrity during updates. NeXus is closest to the hierarchical model.
If we want to clean up all the ambiguities, we have no choice. We have to establish and maintain a full bidirectional mapping of NeXus to and from a presentation as a relational database, e.g. as CIF. We do this already for NXmx. It would take us a few years to do this for all of NeXus, but you may wish to consider it as a possible project. It won't break NeXus. It won't break HDF5. It just gives you the structure you need to know what is related to what in a reliable, disciplined way that simple trees don't provide. Indeed, it makes the trees irrelevant decoration that is helpful in writing some applications, but which can bury you in a hard-to-follow nest of pointers. It does bend your head a bit when you get to doing normalization, but with the data volumes we are going to have to deal with in complying with the various data sharing mandates, it is time to get started.
All it takes to get started is to define a relational table name (a category) and a column name for each tag, and (the hard part) to decide on the structure of keys (one or more columns) that uniquely identify each row of each relational table. (This is where you come to grips with the multiple sample question, for example.)
Again, my apologies. We can do this now. We can wait and do it later, but eventually somebody will have to do it.
Regards, Herbert
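To make the "table name, column names, keys" step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name sample, the columns entry_id, sample_id and name, and the choice of a composite key are all hypothetical; they are not taken from CIF, NXmx, or any existing NeXus mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One relational table (category) for NXsample; one column per tag.
# Hypothetical key choice: a sample is identified by (entry_id, sample_id),
# so samples in different NXentry groups are distinct rows.
conn.execute("""
    CREATE TABLE sample (
        entry_id  TEXT NOT NULL,
        sample_id TEXT NOT NULL,
        name      TEXT,
        PRIMARY KEY (entry_id, sample_id)
    )
""")
conn.executemany(
    "INSERT INTO sample (entry_id, sample_id, name) VALUES (?, ?, ?)",
    [("entry1", "sample", "crystal A"), ("entry2", "sample", "crystal A")],
)
row_count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]

# A second row with the same key is rejected, so the key definition itself
# states what counts as "the same sample".
try:
    conn.execute("INSERT INTO sample VALUES ('entry1', 'sample', 'crystal B')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Under this key, the two entries yield two distinct sample rows even though the names match. Keying on sample_id alone would instead declare them the same sample, which is exactly the decision the thread is asking NeXus to make explicit.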
The current definition of NXsample is:
This definition doesn't make it clear what the semantic relationship is between multiple NXsample groups.
In more concrete terms...
It's fairly clear that a single NXsample group describes a single sample (the above description says "the sample").
However, if a NXentry contains multiple NXsample groups, does each NXsample group describe the same sample or potentially different samples?
If a NeXus file contains multiple NXentry groups, each with a single NXsample group, are these NXsample groups the same sample or potentially different samples?