aidanheerdegen opened 1 month ago
Would the commit hash for the `exptrunID` be the runlog commit right before the model is run? E.g. https://github.com/payu-org/payu/blob/68d8482e5307af62603431fe95f1426a28056948/payu/experiment.py#L653-L654

If so, the model method `add_output_metadata`, which adds the ID to configuration files, might need to be run then, rather than at `setup`? And then, would this feature only be enabled if `runlog` is enabled? It might not make as much sense to use just the `experimentId` unless it was `experimentId.runNumber`, but then there could be clashes between run numbers. A small initial payu PR could be a metadata method that generates an `exptRunId` after the runlog commit? This can then be passed to model driver methods.
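A minimal sketch of the suggested metadata method, assuming the runlog is a git repository in the experiment's control directory, `git` is on the `PATH`, and the repository uses git's default SHA-1 object format (40-character hashes). The function names `is_commit_hash` and `get_exptrun_id` and the `control_path` parameter are hypothetical, not existing payu API.

```python
import re
import subprocess
from pathlib import Path

# A full SHA-1 commit hash, as printed by `git rev-parse HEAD`.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def is_commit_hash(value: str) -> bool:
    """Return True if value is a full, lowercase SHA-1 commit hash."""
    return _SHA1_RE.fullmatch(value) is not None


def get_exptrun_id(control_path: Path) -> str:
    """Hypothetical helper: return HEAD of the runlog repository.

    Meant to be called immediately after the runlog commit, so that
    HEAD is the commit recording the configuration for this run.
    """
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=control_path,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    commit = result.stdout.strip()
    if not is_commit_hash(commit):
        raise ValueError(f"unexpected output from git rev-parse: {commit!r}")
    return commit
```

Because the ID is read back from `HEAD` rather than computed, it only identifies the run if no other commit lands between the runlog commit and this call.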
> Would the commit hash for the `exptrunID` be the runlog commit right before the model is run

Yes.
> If so, the model method `add_output_metadata` adding the ID to configuration files might need to be run then rather than at `setup`?

Good point. And yes. I kinda thought I'd get it wrong and need you to say where we should put it.
> And then, would this feature only be enabled if `runlog` is enabled?

Yes. It doesn't really make sense otherwise.
> A small initial payu PR could be a metadata method that generates an `exptRunId` after the runlog commit? This can then be passed to model driver methods.

I like it.
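The ordering agreed in this exchange can be sketched as a small pre-run hook. This is a hypothetical outline, not payu code: `prepare_run`, `commit_runlog` and the `add_output_metadata(stage, exptrun_id)` signature are assumptions, and the callables are injected so the ordering can be checked without a real experiment.

```python
from typing import Callable, Optional


def prepare_run(
    runlog_enabled: bool,
    commit_runlog: Callable[[], str],
    add_output_metadata: Callable[[str, str], None],
) -> Optional[str]:
    """Hypothetical pre-run hook: generate the exptRunId after the runlog commit.

    Returns the ID, or None when runlog is disabled.
    """
    if not runlog_enabled:
        # No runlog commit means no hash to identify this run,
        # so the feature is skipped entirely.
        return None
    # Commit first: the hash only exists once the commit has been made.
    exptrun_id = commit_runlog()
    # Then hand the ID to the model driver before the model starts,
    # so it can be written into the model's configuration files.
    add_output_metadata("setup", exptrun_id)
    return exptrun_id
```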
I think I might have changed my mind about concatenating the IDs together. The motivation was to keep it simple: just embed a single metadata item. But it makes everything else more complicated. Also the experiment ID will be used widely, in intake catalogues etc., so I think it makes sense to have that as a separate, unambiguous, easy-to-access metadata attribute.
Ok, so are you saying there should be two fields added to outputs? An `experiment_uuid` and an `experiment_run_id`, which is just the runlog commit hash?
> Ok, so are you saying there should be two fields added to outputs? An `experiment_uuid` and an `experiment_run_id`, which is just the runlog commit hash?

Yep.
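With two separate attributes, a consumer can group outputs by experiment without parsing a combined string. A minimal sketch, assuming the attribute names agreed above (`experiment_uuid`, `experiment_run_id`), a standard UUID for the experiment and a full SHA-1 commit hash for the run; `build_output_metadata` and `runs_of_experiment` are hypothetical helpers.

```python
import re
import uuid
from typing import Dict, Iterable, List

# Full SHA-1 git commit hash (assumes git's default object format).
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def build_output_metadata(experiment_uuid: str, run_commit: str) -> Dict[str, str]:
    """Return the two global attributes to write into every output file."""
    # uuid.UUID raises ValueError on malformed input; str() gives the
    # canonical lowercase, hyphenated form.
    exp_id = str(uuid.UUID(experiment_uuid))
    commit = run_commit.strip().lower()
    if _SHA1_RE.fullmatch(commit) is None:
        raise ValueError(f"not a full SHA-1 commit hash: {run_commit!r}")
    return {"experiment_uuid": exp_id, "experiment_run_id": commit}


def runs_of_experiment(
    attrs: Iterable[Dict[str, str]], experiment_uuid: str
) -> List[str]:
    """Collect the run IDs belonging to one experiment.

    A plain equality test on one attribute; no need to split a
    concatenated ID apart.
    """
    return [
        a["experiment_run_id"]
        for a in attrs
        if a["experiment_uuid"] == experiment_uuid
    ]
```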
Embedding the experiment ID and run commit hashes into model output diagnostics is essential for experiment provenance: it links the outputs of an experiment to all of the experiment's provenance data. Consumers of the data, regardless of where they find it, then have a way to find this essential information.

These identifying hashes can then become persistent identifiers (PIDs) once there is a service to resolve them and expose the related metadata to users. Such a service doesn't exist ... yet. But embedding this information is a necessary precursor.
Proposal

- `git` commit hash as a unique identifier (`exptrunID`?) for each run of a model, where an experiment constitutes a number of such consecutive runs.
- `exptrunID` as a metadata field in all model output diagnostics, e.g. a global netCDF attribute.
- `exptrunID` as a configuration input to the model, so the metadata is added when the diagnostic is written. If this isn't possible, add the metadata after the run has completed.

Implementation
Where possible, the `exptrunID` should be added as a model configuration input option and written directly into the model outputs. This has two benefits:

Each model should take care of adding this metadata to its diagnostic outputs. This means the `model` class should have a stub method `add_output_metadata` that is either not implemented, or has a useful default such as adding a global attribute to netCDF files. `add_output_metadata` should be called at the `setup` and `archive` stages so that `exptrunID` can be added either before a run or after it has completed. The method needs logic to decide whether it runs at `setup` or `archive`. If there isn't a better way, such as some call-graph inspection, the stage should be passed to the method.

Notes
- `mpp_write_meta` routine for MOM5
- For MOM6, global attributes can be written by calling `register_global_attribute`. Scalar and 1d real and integers (32 and 64 bit) and scalar string values are supported. This interface can be used with any FMS2_io `fileobj`, but `open_file` needs to be called before using it.
- netCDF: https://github.com/COSIMA/cice5/blob/edcfa6f9c76ed05b63196ce4b5355fa5a8f4fe3a/io_netcdf/ice_history_write.F90#L922-L978
- pio: https://github.com/COSIMA/cice5/blob/edcfa6f9c76ed05b63196ce4b5355fa5a8f4fe3a/io_pio/ice_history_write.F90#L877-L934
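The `add_output_metadata` stub described under Implementation could be sketched roughly as below. This is a minimal, hypothetical Python outline, not payu's actual driver API: the class `Model`, the flag `writes_metadata_natively`, and the helpers `add_metadata_to_config`, `output_files` and `add_netcdf_global_attributes` are invented for illustration. The stage is passed in explicitly, as in the fallback suggested in the issue. The netCDF default assumes the `netCDF4` Python package, imported lazily so drivers that never use it do not need it installed.

```python
from pathlib import Path
from typing import Dict, Iterable

SETUP = "setup"
ARCHIVE = "archive"


class Model:
    """Hypothetical base class for a model driver (not payu's real API)."""

    # Set to True in drivers whose model can take the IDs as a
    # configuration input and write them into its own diagnostics.
    writes_metadata_natively = False

    def add_output_metadata(self, stage: str, metadata: Dict[str, str]) -> bool:
        """Add run metadata for the given stage.

        Returns True if this stage did any work, False if it was a no-op.
        """
        if stage not in (SETUP, ARCHIVE):
            raise ValueError(f"unknown stage: {stage!r}")
        if self.writes_metadata_natively:
            # Preferred path: pass the IDs in before the run starts.
            if stage == SETUP:
                self.add_metadata_to_config(metadata)
                return True
            return False
        # Fallback path: stamp the finished outputs after the run.
        if stage == ARCHIVE:
            for path in self.output_files():
                self.add_netcdf_global_attributes(path, metadata)
            return True
        return False

    def add_metadata_to_config(self, metadata: Dict[str, str]) -> None:
        # Each driver knows its own configuration format.
        raise NotImplementedError

    def output_files(self) -> Iterable[Path]:
        # Drivers would return the netCDF diagnostics for this run.
        return []

    def add_netcdf_global_attributes(self, path: Path, metadata: Dict[str, str]) -> None:
        # Imported here so netCDF4 is only required when this default is used.
        import netCDF4

        # Mode "a" opens an existing file so attributes can be added.
        with netCDF4.Dataset(str(path), mode="a") as ds:
            for name, value in metadata.items():
                ds.setncattr(name, value)
```

A driver for a model that can take the IDs as a configuration input and write them itself (the notes above list relevant write routines) would set `writes_metadata_natively = True` and do its work at `setup`; other drivers fall back to modifying the netCDF outputs at `archive`.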