nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Clarify experiment_identifier/collection_identifier/entry_identifier #1043

Status: open. woutdenolf opened this issue 2 years ago.

woutdenolf commented 2 years ago

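I'm looking for a destination for the following metadata associated with a data collection:

    synchrotron: "ESRF"
    beamline: "ID31"
    proposal id, defined in the scope of the ESRF: "hg-124"
    data collection id, defined in the scope of the proposal: "S_220323_00006_0001"
    sample ID, defined in the user scope: "S-220323-00006"
    sample UUID, defined in the user scope: "4590be84-3493-4bd2-91fe-4cf39cfcf71f"
    name of the technique(s): ["x-ray powder diffraction"]
    command: "fscan 0 10 100 0.1"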

I'm currently considering the following, but I'm not sure I'm using the fields correctly:

# ./S_220323_00006_0001.h5

10.1: NXentry  # this is 1 scan in the file S_220323_00006_0001.h5

    experiment_identifier = "hg-124"
    entry_identifier = ["x-ray powder diffraction"]
    collection_identifier = "S_220323_00006_0001"

    title = "fscan 0 10 100 0.1"

    instrument: NXinstrument
        name = "ESRF-ID31"
            @short_name = "ID31"

    sample: NXsample
        name = "S-220323-00006"
        uuid = "4590be84-3493-4bd2-91fe-4cf39cfcf71f"
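To make the proposal concrete, here is a minimal sketch that writes the layout above with h5py (an assumption; the thread does not name a writing library). Each group gets the NeXus `NX_class` attribute, the list notation for `entry_identifier` becomes a one-element variable-length string array, and the file is kept in memory so nothing lands on disk. It reproduces the field usage exactly as proposed, which is the very thing being questioned, so treat it as an illustration rather than a recommended mapping. The function name `write_proposed_entry` is hypothetical.

```python
import h5py


def write_proposed_entry(f: h5py.File) -> None:
    """Write the NXentry layout proposed in the question (illustrative only)."""
    # Mapping as proposed in the question; not a settled NeXus recommendation.
    entry = f.create_group("10.1")  # one scan in S_220323_00006_0001.h5
    entry.attrs["NX_class"] = "NXentry"
    entry["experiment_identifier"] = "hg-124"
    # The list notation becomes a one-element variable-length string array.
    entry.create_dataset(
        "entry_identifier",
        data=["x-ray powder diffraction"],
        dtype=h5py.string_dtype(),
    )
    entry["collection_identifier"] = "S_220323_00006_0001"
    entry["title"] = "fscan 0 10 100 0.1"

    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    instrument["name"] = "ESRF-ID31"
    instrument["name"].attrs["short_name"] = "ID31"

    sample = entry.create_group("sample")
    sample.attrs["NX_class"] = "NXsample"
    sample["name"] = "S-220323-00006"
    sample["uuid"] = "4590be84-3493-4bd2-91fe-4cf39cfcf71f"


if __name__ == "__main__":
    # driver="core" with backing_store=False keeps the file purely in memory.
    with h5py.File("S_220323_00006_0001.h5", "w", driver="core", backing_store=False) as f:
        write_proposed_entry(f)
        # h5py 3 reads variable-length strings as bytes; asstr() decodes them.
        print(f["10.1/experiment_identifier"].asstr()[()])
```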

I'm especially confused about experiment_identifier, entry_identifier and collection_identifier. Could someone clarify those? What I have is a proposal name "hg-124", a data collection name "S_220323_00006_0001" and the techniques used ["x-ray powder diffraction"].

woutdenolf commented 2 years ago

So this basically requires a better description of the experiment_identifier, entry_identifier and collection_identifier fields in NXentry. @prjemian @FreddieAkeroyd From the history, it seems both of you have worked on this.

prjemian commented 2 years ago

collection_identifier identifies a group of files (or a group of database records, in the Bluesky framework) of which this data is a part.

Sometimes that's a parent folder; it could also be a single SPEC data file, a set of SPEC files, or a collection in MongoDB (as in Bluesky).

prjemian commented 2 years ago

entry_identifier is a placeholder for the identification of this data provided by the facility.

prjemian commented 2 years ago

experiment_identifier and entry_identifier seem identical, but they (probably) provide a distinction that matters to some facilities.

prjemian commented 2 years ago

Here's my suggestion:

> I'm looking for a destination for the following metadata associated with a data collection:
> 
>     synchrotron: "ESRF"
>     beamline: "ID31"
>     proposal id, defined in the scope of the ESRF: "hg-124"
>     data collection id, defined in the scope of the proposal: "S_220323_00006_0001"
>     sample ID, defined in the user scope: "S-220323-00006"
>     sample UUID, defined in the user scope: "4590be84-3493-4bd2-91fe-4cf39cfcf71f"
>     name of the technique(s): ["x-ray powder diffraction"]
>     command: "fscan 0 10 100 0.1"

# describe the information as provided (NXentry has "experiment_documentation" for this)
/entry/experiment_documentation:NXnote/beamline = "ID31"
/entry/experiment_documentation:NXnote/command = "fscan 0 10 100 0.1"
/entry/experiment_documentation:NXnote/data_collection_id = "S_220323_00006_0001"
/entry/experiment_documentation:NXnote/proposal_id = "hg-124"
/entry/experiment_documentation:NXnote/sample_id = "S-220323-00006"
/entry/experiment_documentation:NXnote/sample_uuid = "4590be84-3493-4bd2-91fe-4cf39cfcf71f"
/entry/experiment_documentation:NXnote/synchrotron = "ESRF"
/entry/experiment_documentation:NXnote/techniques = ["x-ray powder diffraction"]

# fill out the standard base classes
/entry/command -- link to /entry/experiment_documentation/command
/entry/entry_identifier -- link to /entry/experiment_documentation/data_collection_id
/entry/experiment_identifier -- link to /entry/experiment_documentation/proposal_id
/entry/instrument/name -- link to /entry/experiment_documentation/beamline
/entry/instrument/source/name -- link to /entry/experiment_documentation/synchrotron
/entry/instrument/source/probe = "x-ray"
/entry/instrument/source/type = "Synchrotron X-ray Source"
/entry/sample/name -- link to /entry/experiment_documentation/sample_id

# for convenience, but not described in NeXus (so not illegal, either), provide at root level
/@beamline = "ID31"
/@facility = "ESRF"
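The suggested layout can be sketched in h5py as follows (h5py is an assumption, as are the helper names `DOCUMENTATION`, `LINKS` and `write_suggested_layout`). Each "-- link to" line becomes an HDF5 hard link, created by assigning an existing dataset to a new name, so the standard NXentry fields and the experiment_documentation fields are the same objects rather than copies.

```python
import h5py

# Values quoted in the issue, keyed by the names used in experiment_documentation.
DOCUMENTATION = {
    "beamline": "ID31",
    "command": "fscan 0 10 100 0.1",
    "data_collection_id": "S_220323_00006_0001",
    "proposal_id": "hg-124",
    "sample_id": "S-220323-00006",
    "sample_uuid": "4590be84-3493-4bd2-91fe-4cf39cfcf71f",
    "synchrotron": "ESRF",
}

# Standard field (relative to the entry) -> experiment_documentation key it links to.
LINKS = {
    "command": "command",
    "entry_identifier": "data_collection_id",
    "experiment_identifier": "proposal_id",
    "instrument/name": "beamline",
    "instrument/source/name": "synchrotron",
    "sample/name": "sample_id",
}


def write_suggested_layout(f: h5py.File) -> None:
    """Write the suggested /entry layout into an open HDF5 file (sketch)."""
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"

    # Describe the information as provided.
    doc = entry.create_group("experiment_documentation")
    doc.attrs["NX_class"] = "NXnote"
    for key, value in DOCUMENTATION.items():
        doc[key] = value
    doc.create_dataset(
        "techniques", data=["x-ray powder diffraction"], dtype=h5py.string_dtype()
    )

    # Fill out the standard base classes (parents are created before children).
    for path, nx_class in (
        ("instrument", "NXinstrument"),
        ("instrument/source", "NXsource"),
        ("sample", "NXsample"),
    ):
        entry.create_group(path).attrs["NX_class"] = nx_class

    # Assigning an existing dataset to a new name creates an HDF5 hard link.
    for field, key in LINKS.items():
        entry[field] = doc[key]

    entry["instrument/source/probe"] = "x-ray"
    entry["instrument/source/type"] = "Synchrotron X-ray Source"

    # Not described in NeXus (so not illegal either): convenience attributes at the root.
    f.attrs["beamline"] = "ID31"
    f.attrs["facility"] = "ESRF"


if __name__ == "__main__":
    with h5py.File("suggested.h5", "w", driver="core", backing_store=False) as f:
        write_suggested_layout(f)
        print(f["entry/experiment_identifier"].asstr()[()])
```

A soft link (`h5py.SoftLink`) would also express "link to"; hard links are used here because either name stays valid if the other is later deleted.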

benajamin commented 2 years ago

I went digging through old NIAC minutes for context and found:

Git Blame says that the following are 13 years old (i.e. they come from the old SVN repository):

This suggests that we need someone who has been involved in NeXus since the beginning (e.g. @mkoennecke, @rayosborn, @FreddieAkeroyd) to provide further context on the intention of these fields.

benajamin commented 2 years ago

@rayosborn says that the facilities were each doing their own thing with various identifiers, and this set of fields was able to satisfy everyone. Actual usage probably varied a lot, and nobody was very interested in forcing everyone to adopt the same usage.

We might also find some example usage in files from the example data repository.
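To gather such examples, a small survey script could report which of the three identifier fields each NXentry in a file actually fills in. This is a sketch assuming h5py and entries sitting directly under the file root; the function name `survey_identifiers` is hypothetical, and nothing about the example data repository's layout is assumed beyond its files being HDF5.

```python
import sys

import h5py
import numpy as np

# The three NXentry fields discussed in this issue.
FIELDS = ("experiment_identifier", "entry_identifier", "collection_identifier")


def _to_python(value):
    """Turn h5py string values (bytes, arrays of bytes) into plain str/lists."""
    if isinstance(value, np.ndarray):
        value = value.tolist()
    if isinstance(value, list):
        return [_to_python(v) for v in value]
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def survey_identifiers(f: h5py.File) -> dict:
    """Return {entry name: {field: value}} for every root-level NXentry."""
    result = {}
    for name, obj in f.items():
        if not isinstance(obj, h5py.Group):
            continue
        if _to_python(obj.attrs.get("NX_class")) != "NXentry":
            continue
        found = {}
        for field in FIELDS:
            item = obj.get(field)  # None if the field is absent
            if isinstance(item, h5py.Dataset):
                found[field] = _to_python(item[()])
        result[name] = found
    return result


if __name__ == "__main__":
    # Usage: python survey_identifiers.py file1.nxs file2.h5 ...
    for path in sys.argv[1:]:
        with h5py.File(path, "r") as f:
            for entry, fields in survey_identifiers(f).items():
                print(f"{path}::{entry}", fields or "(none of the three fields)")
```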

prjemian commented 2 years ago

Comments were made on this issue at the 2022-06 Code Camp. @woutdenolf: Is it necessary to resolve this for the NXDL release now?

woutdenolf commented 2 years ago

We can leave it for the next release.

prjemian commented 2 years ago

Use of each of these terms seems to be particular to a subset of facilities. We could benefit from facility examples showing how they use each of these fields (or don't).