nexusformat / NIAC

Issue for the NIAC to discuss (no code)

Addition of globally unique identifier to NXentry (and potentially others) #80

Closed: stuartcampbell closed this issue 3 years ago

stuartcampbell commented 3 years ago

The current NXentry has the field entry_identifier which is defined as a "unique identifier for the measurement, defined by the facility."

Facilities that use Bluesky for data acquisition have both a unique identifier (a UUID) and a 'transient identifier', which is along the lines of a traditional scan/run identifier.

As more facilities adopt Bluesky, it would be nice to agree on a recommendation for how to store both the uuid and the scan id before we all develop our own standards. At the moment, I can see an argument for storing either the 'scan id' or the 'uuid' in entry_identifier, but then where do we store the other one?

Could we either add an optional field such as entry_uuid or unique_entry_identifier, or allow a uuid attribute to be added?

stuartcampbell commented 3 years ago

Actually, based on the previous discussion on the beam_size_x type issue, I would propose either entry_uuid or entry_identifier_uuid, as both seem to fit with the recommendations.

prjemian commented 3 years ago

NeXus allows the addition of fields not defined in the NXDL base class. This proposal is to identify and reserve the names of specific fields for two terms that are part of Bluesky. Including them in the NeXus base class will standardize how NeXus files are generated from Bluesky data (and the reverse as well).

Citing from the manual about the NeXus Community:

NeXus began as a group of scientists with the goal of defining a common data storage format to exchange experimental results and to exchange ideas about how to analyze them.

Already, NeXus is proposed as an interchange format to exchange Bluesky data with high-performance computing and machine learning.

prjemian commented 3 years ago

What is the bluesky entity identified by entry_identifier_uuid?

prjemian commented 3 years ago

Examples of current use (not necessarily consistent with this proposal) are here: https://apstools.readthedocs.io/en/latest/source/_filewriters.html?highlight=NeXus#hdf5-nexus-file-structures

prjemian commented 3 years ago

uuid4: https://docs.python.org/3/library/uuid.html#uuid.uuid4
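For illustration, here is a minimal sketch of generating and checking such an identifier with the Python standard library. The helper name is_uuid4 is hypothetical and not part of any NeXus or Bluesky API.

```python
import uuid

# Generate a random (version 4) UUID with the standard library.
run_uid = uuid.uuid4()

# The canonical text form is 36 characters: 32 hex digits plus 4 hyphens.
run_uid_text = str(run_uid)


def is_uuid4(text: str) -> bool:
    """Return True if text parses as an RFC 4122 version-4 UUID.

    Hypothetical helper, for illustration only.
    """
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        # Raised for strings that are not well-formed UUIDs.
        return False
    return parsed.variant == uuid.RFC_4122 and parsed.version == 4
```

A check like this could be run on a string before writing it to a file, so that a scan number is not stored where a uuid is expected.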

stuartcampbell commented 3 years ago

"What is the bluesky entity identified by entry_identifier_uuid?"

The start document uid
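As a rough sketch of that mapping, assuming the scan id goes in entry_identifier and the start document uid goes in entry_identifier_uuid (one of the names discussed above, not necessarily the final one). The function name nxentry_identifiers and the example scan_id value are hypothetical. Only the "uid" and "scan_id" keys of the start document are used.

```python
import uuid


def nxentry_identifiers(start_doc: dict) -> dict:
    """Map a Bluesky start document to two NXentry identifier fields.

    Assumption: scan_id -> entry_identifier, uid -> entry_identifier_uuid.
    The field names follow the discussion in this issue.
    """
    return {
        "entry_identifier": str(start_doc["scan_id"]),
        "entry_identifier_uuid": str(start_doc["uid"]),
    }


# Illustrative start document with made-up values.
start_doc = {"uid": str(uuid.uuid4()), "scan_id": 42}
fields = nxentry_identifiers(start_doc)
```

The resulting dictionary could then be written into the NXentry group by whatever file writer a facility uses.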

stuartcampbell commented 3 years ago

Proposal:

prjemian commented 3 years ago

DOI for UUIDs: https://doi.org/10.17487%2FRFC4122 (or https://www.rfc-editor.org/info/rfc4122)

benajamin commented 3 years ago

The proposal was accepted with 13 votes for (0 against). @stuartcampbell has said that the Bluesky project prefix should be BLUESKY_

prjemian commented 2 years ago

UUID specification: https://www.rfc-editor.org/rfc/rfc4122.html