Open mih opened 11 months ago
With the provenance ontology component the groundwork is laid. The main connection to standard PROV will be the notion of a `SoftwareAgent` (the software that ran to produce a dataset revision). This agent description should somehow capture `datalad-run` as well as the "payload" software inside.
Candidate properties for linking the immediate software `Agent` with the payload `Agent` are:
Additional thoughts on how to map a run record onto PROV.
We need to distinguish the planning aspects from the actual activity. The `cmd` property of a run record can be considered the `prov:Plan`. Ideally, this plan makes any parameters it has explicit.
This plan is associated with an activity. A `prov:SoftwareAgent` (`datalad-run`) is also associated with that activity.
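To make the plan/activity/agent triangle concrete, here is a minimal sketch that turns a run record into a small JSON-LD graph using PROV-O terms. The function name `run_record_to_prov`, the `urn:example:` identifiers, and the choice of JSON-LD as the serialization are assumptions for illustration only, not an existing DataLad API.

```python
import json

# Only the PROV namespace is needed for this sketch.
PROV_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}


def run_record_to_prov(run_record, commit_sha):
    """Map a run record onto a plan, an activity, and a software agent.

    Hypothetical helper: the identifier scheme (urn:example:...) is an
    assumption and would need to be replaced by real, stable identifiers.
    """
    plan_id = f"urn:example:plan:{commit_sha}"
    activity_id = f"urn:example:activity:{commit_sha}"
    agent_id = "urn:example:agent:datalad-run"

    plan = {
        "@id": plan_id,
        "@type": "prov:Plan",
        # The `cmd` property of the run record is treated as the plan.
        "prov:value": run_record["cmd"],
    }
    agent = {"@id": agent_id, "@type": "prov:SoftwareAgent"}
    activity = {
        "@id": activity_id,
        "@type": "prov:Activity",
        "prov:wasAssociatedWith": {"@id": agent_id},
        # The qualified association ties agent and plan to the activity.
        "prov:qualifiedAssociation": {
            "@type": "prov:Association",
            "prov:agent": {"@id": agent_id},
            "prov:hadPlan": {"@id": plan_id},
        },
    }
    return {"@context": PROV_CONTEXT, "@graph": [plan, agent, activity]}


if __name__ == "__main__":
    record = {"cmd": "python analysis.py", "inputs": [], "outputs": ["out.csv"]}
    print(json.dumps(run_record_to_prov(record, "abc123"), indent=2))
```

The payload software is deliberately left out here, since the property for linking it to the `datalad-run` agent is still undecided.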
When the subject of the provenance is a dataset worktree, we can consider the activity a `prov:Derivation` whenever there was a parent commit.
When the subject of the provenance is a file, we can consider the activity a `prov:Generation` whenever there were no declared inputs, and a `prov:Derivation` otherwise.
or prov:Generation
).
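The two rules above can be sketched as a small decision function. The name `classify_activity` and its parameters are hypothetical. The case of a worktree without a parent commit is not spelled out above; falling back to `prov:Generation` there is an assumption of this sketch.

```python
def classify_activity(subject, has_parent_commit=False, declared_inputs=()):
    """Pick the PROV term for a run, following the rules discussed above.

    subject: "worktree" or "file" (hypothetical labels for this sketch).
    """
    if subject == "worktree":
        if has_parent_commit:
            return "prov:Derivation"
        # ASSUMPTION: a root commit has nothing to derive from.
        return "prov:Generation"
    if subject == "file":
        # No declared inputs: the file was generated, not derived.
        return "prov:Derivation" if declared_inputs else "prov:Generation"
    raise ValueError(f"unknown subject: {subject!r}")


if __name__ == "__main__":
    print(classify_activity("worktree", has_parent_commit=True))
    print(classify_activity("file", declared_inputs=["raw.csv"]))
    print(classify_activity("file"))
```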
For one commit with a run record, we can create a PROV report for the commit, and also for individual outputs. They would share the same `prov:Plan`. However, in practice it may be too much to distinguish individual activities for generating a full tree vs. an individual file. Likely we link to the same activity. Maybe that also implies that we should not distinguish between `Generation` and `Derivation`, but stick to `Activity`.
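A rough sketch of the "link to the same activity" option: the commit and each declared output become entities that all point at one shared activity via `prov:wasGeneratedBy`. The helper name `link_entities_to_activity` and the `urn:example:` identifiers are made up for illustration.

```python
def link_entities_to_activity(commit_sha, outputs, activity_id):
    """Describe a commit and its outputs as entities of ONE shared activity.

    Hypothetical helper; identifiers are placeholders, not a DataLad scheme.
    """
    shared = {"@id": activity_id}
    commit_entity = {
        "@id": f"urn:example:commit:{commit_sha}",
        "@type": "prov:Entity",
        "prov:wasGeneratedBy": shared,
    }
    file_entities = [
        {
            "@id": f"urn:example:file:{commit_sha}:{path}",
            "@type": "prov:Entity",
            "prov:wasGeneratedBy": shared,
        }
        for path in outputs
    ]
    return [commit_entity] + file_entities


if __name__ == "__main__":
    entities = link_entities_to_activity(
        "abc123", ["out.csv", "fig.png"], "urn:example:activity:abc123"
    )
    for entity in entities:
        print(entity["@id"], "->", entity["prov:wasGeneratedBy"]["@id"])
```

With this layout, no per-file activity is needed, and the `Generation` vs. `Derivation` distinction does not have to be made at the activity level.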
This is information on a process responsible for a particular commit.