Open aclum opened 1 month ago
ref: https://microbiomedata.github.io/nmdc-schema/identifiers/#ids-minted-for-use-within-nmdc
@aclum could you clarify when a WorkflowExecutionActivity should be attached to a new version of an existing id, vs when it should have a newly minted id? Given two WorkflowExecutionActivity workflows of the same type (eg MagAnalysisActivity) and the same was_informed_by, will it:
Depending on the desired behavior, we can either:
Update the JSON validator in nmdc_runtime/util.py so that whenever a POST request is made with one or more WorkflowExecutionActivity docs, it looks for existing workflows with the same type and was_informed_by values and, if prior versions exist, enforces that the request carries the correct version.
NOTE: This would require versions to increment in a consistent way (eg v1, v2, ..., not v1, v1a, v1b, v2, ...).
Determine the logic for when a workflow should get a newly minted id vs a new version number.
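A minimal sketch of the kind of check the first option describes, not the actual code in nmdc_runtime/util.py. It assumes that docs are plain dicts with id, type and was_informed_by keys, that the version is the integer after the last "." in the id, and that "correct version" means "same base id, higher version number". The function names and the sample ids are hypothetical.

```python
from typing import Optional


def split_versioned_id(doc_id: str) -> tuple[str, int]:
    """Split an id such as 'nmdc:wfrqc-11-abc1d.2' into (base id, integer version)."""
    base, sep, version = doc_id.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"id has no integer version suffix: {doc_id!r}")
    return base, int(version)


def check_activity_version(new_doc: dict, existing_docs: list[dict]) -> Optional[str]:
    """Return an error message, or None if new_doc follows the assumed versioning rule."""
    new_base, new_version = split_versioned_id(new_doc["id"])
    # Prior runs are docs with the same type and the same was_informed_by value.
    prior = [
        split_versioned_id(d["id"])
        for d in existing_docs
        if d["type"] == new_doc["type"]
        and d["was_informed_by"] == new_doc["was_informed_by"]
    ]
    if not prior:
        return None  # first run for this was_informed_by; nothing to compare against
    latest_base, latest_version = max(prior, key=lambda p: p[1])
    if new_base != latest_base:
        return f"expected a new version of {latest_base}, got newly minted id {new_base}"
    if new_version <= latest_version:
        return f"version {new_version} does not exceed latest stored version {latest_version}"
    return None


# Hypothetical stored doc; the id and was_informed_by values are placeholders.
stored = [{"id": "nmdc:wfmag-11-abc1d.1", "type": "MagAnalysisActivity", "was_informed_by": "run-A"}]
rerun_ok = {"id": "nmdc:wfmag-11-abc1d.2", "type": "MagAnalysisActivity", "was_informed_by": "run-A"}
rerun_new_id = {"id": "nmdc:wfmag-11-zzz9z.1", "type": "MagAnalysisActivity", "was_informed_by": "run-A"}
```

In a real validator the list of existing docs would presumably come from a database query filtered on type and was_informed_by rather than from an in-memory list.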
Given two WorkflowExecutionActivity workflows of the same type (eg MagAnalysisActivity) and the same was_informed_by, it should always increment. Ideally this would be written generically enough to handle the migration to Berkeley, so it would also have to look at WorkflowChain.
We should consider race conditions here...what happens if a .2 is minted but not used and another request is made, etc.
@aclum How frequently would you expect this kind of race condition to occur? Would it be acceptable for the minter to give out strictly increasing sequential version ids (1, 2, 3, etc), and for the versions in mongo to be strictly increasing, but non-sequential (1, 3, 7)?
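To make the question concrete, here is a small in-memory sketch of a minter that hands out strictly increasing sequential versions under a lock. It is an illustration only: the VersionMinter class is hypothetical, and a real minter would need the counter to live in shared storage rather than in process memory.

```python
import threading


class VersionMinter:
    """Hands out strictly increasing version numbers per base id (in-memory sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last: dict[str, int] = {}  # base id -> last version handed out

    def mint(self, base_id: str) -> str:
        with self._lock:  # serialize concurrent requests so no version is issued twice
            version = self._last.get(base_id, 0) + 1
            self._last[base_id] = version
        return f"{base_id}.{version}"


minter = VersionMinter()
minted = [minter.mint("nmdc:wfrqc-11-abc1d") for _ in range(3)]
# If the job holding the second id fails and never submits, the stored versions
# are 1 and 3: still strictly increasing, but with a gap.
stored = [minted[0], minted[2]]
```

The minter itself never skips a number; gaps appear only in what ends up stored, which is the situation the question asks about.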
Right now I figure this out in the scheduler. It is possible for a gap to occur but that would typically be due to some error. Does this answer the question?
@scanon if the last-submitted version of an activity is N, should the runtime accept submission of a version N+2 and subsequently reject submission of a version N+1 (prompting the submitter to mint a new ID)? That is, should the runtime enforce increment-by-1 order, or just total order?
I think we'll get into trouble if we don't accept N+2. There will be a small percentage of errors, so if the runtime increments every time, some version numbers will be skipped. This would be confusing to someone looking at the identifier b/c there would be missing version numbers, but I'm not sure how else to do this. The identifiers are going to be embedded in the data headers for assembly and annotation, so you wouldn't want to change these after the fact if the runtime rejects the submission.
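The two policies discussed here can be compared with a short sketch. The helper below is hypothetical and assumes versions are plain integers starting at 1; it only shows which submissions each policy would accept for a given arrival order.

```python
def accepted_versions(submitted: list[int], require_increment_by_one: bool) -> list[int]:
    """Return the submitted versions that would be accepted, in arrival order."""
    accepted: list[int] = []
    last = 0  # assumption: versions start at 1, so 0 means "nothing stored yet"
    for version in submitted:
        if require_increment_by_one:
            ok = version == last + 1  # increment-by-1 order
        else:
            ok = version > last  # total order: any higher version is fine
        if ok:
            accepted.append(version)
            last = version
    return accepted


# Versions arrive as N=1, then N+2=3, then N+1=2.
total_order = accepted_versions([1, 3, 2], require_increment_by_one=False)
increment_by_one = accepted_versions([1, 3, 2], require_increment_by_one=True)
```

Under total order, 3 is accepted and the late 2 is rejected, leaving a gap. Under increment-by-1, 3 is rejected even though its identifier may already be embedded in data headers, which is the problem described above.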
Endpoints that accept POST (/workflows_activities, json:submit) for WorkflowExecutionActivity subclasses should make sure versioning rules are being followed. The schema validates the syntax, but not that the incrementation is correct.
Expected behavior: The first time a workflow runs for a given was_informed_by value, an example workflow identifier would be nmdc:wfrqc-11-abc1d.1. The second time the workflow is run for the same was_informed_by value, it should keep the identifier through the blade and increment only the ID version (ie nmdc:wfrqc-11-abc1d.2). There is currently no validation on this, and we have instances where a second run of a workflow for a value of was_informed_by mints a new ID instead of incrementing the version.
cc @shreddd to identify someone to work on this.
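As an illustration of the expected behavior, the sketch below derives the identifier for a rerun from the previous one. The regular expression is an assumption inferred from the single example id above (typecode, shoulder, blade, then ".version"); the schema's real pattern may be broader, and next_version_id is a hypothetical helper.

```python
import re

# Assumed shape, based only on the example "nmdc:wfrqc-11-abc1d.1":
# "nmdc:<typecode>-<shoulder>-<blade>.<version>"
_VERSIONED_ID = re.compile(
    r"^(?P<base>nmdc:[a-z]+-[0-9]+-[a-z0-9]+)\.(?P<version>[0-9]+)$"
)


def next_version_id(previous_id: str) -> str:
    """Keep the id through the blade and increment only the version suffix."""
    match = _VERSIONED_ID.match(previous_id)
    if match is None:
        raise ValueError(f"not a versioned workflow id: {previous_id!r}")
    return f"{match['base']}.{int(match['version']) + 1}"


second_run_id = next_version_id("nmdc:wfrqc-11-abc1d.1")
```

A rerun for the same was_informed_by value would then reuse everything up to the blade and differ only in the trailing version number.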