This is an operational metadata store, part of the overall Preservation Core Catalog. It contains a datetime-stamped record of events and actions relating to the management of an object in Preservation Core: deposit/ingest of each version, each replication, every audit check and its finding, and any recovery action taken. As the name implies, this constitutes the object's provenance.
This is described as a component of the Preservation Core Catalog, or PCC. It may make sense as a third database table, or it may exist in some other form, e.g., a structured text file. The information must be maintained per object and be readily and efficiently accessible, so that it can be displayed via Argo or easily attached to objects for preservation or dissemination.
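To make the "third database table" option concrete, here is a minimal sketch using SQLite from the Python standard library. The table name, column names, and helper functions are all assumptions made for illustration; nothing here is taken from the actual PCC design.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE provenance_events (
    id          INTEGER PRIMARY KEY,
    object_id   TEXT NOT NULL,   -- identifier of the preserved object
    occurred_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp (the "when")
    who         TEXT NOT NULL,   -- process writing the entry
    what        TEXT NOT NULL    -- brief free-form description
);
CREATE INDEX idx_provenance_object
    ON provenance_events (object_id, occurred_at);
"""


def open_store(path=":memory:"):
    """Open (and initialize) a provenance store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, object_id, occurred_at, who, what):
    """Append one provenance entry for an object."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO provenance_events (object_id, occurred_at, who, what) "
            "VALUES (?, ?, ?, ?)",
            (object_id, occurred_at, who, what),
        )


def history(conn, object_id):
    """Return an object's entries as (when, who, what) tuples, oldest first."""
    rows = conn.execute(
        "SELECT occurred_at, who, what FROM provenance_events "
        "WHERE object_id = ? ORDER BY occurred_at, id",
        (object_id,),
    )
    return rows.fetchall()
```

The index on (object_id, occurred_at) is one way to meet the per-object accessibility requirement; a structured text file per object would trade query flexibility for easier attachment to the object itself.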
The desired data is a basic set of when, what, who and description, e.g., "Fixity check passed" or "Object X replaced in Local Archive".
Operationally it constitutes an object's history and should be accessible via a Web Service call to authorized users, including Argo for incorporation into its object detail display.
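As a rough sketch of what such a Web Service might return, the function below builds a JSON response body for an object's history and refuses unauthorized callers. The client allow-list, the status codes chosen, and the JSON field names are assumptions for illustration; the real service interface and Argo's expectations are not specified here.

```python
import json

# Assumption: callers are identified by a simple client name.
AUTHORIZED_CLIENTS = {"argo"}


def history_response(client, object_id, events):
    """Return (status_code, json_body) for an object-history request.

    `events` is a list of dicts with "when", "who" and "what" keys, where
    "when" is an ISO 8601 string in one consistent format, so that plain
    string sorting gives chronological order.
    """
    if client not in AUTHORIZED_CLIENTS:
        return 403, json.dumps({"error": "not authorized"})
    ordered = sorted(events, key=lambda e: e["when"])
    return 200, json.dumps({"object_id": object_id, "events": ordered})
```

A caller such as Argo could then parse the body and render the events in its object detail display in the order given.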
This story does NOT cover related forms of provenance integration or disposition. For instance: there is a provenanceMetadata datastream in objects coming from DOR which contains a small set of event information relating to DOR's management of the objects. In principle it is part of a larger event sequence that covers both DOR and PC custodianship of the object over time. How these are combined for any future export of the object and its complete provenance is not in scope here.
If we look just at the event-level information in DOR's provenanceMetadata datastream, we see:
datetime stamp
what (target) - the object as a whole or a specific file within the object
who - the process that's writing the entry
what (description) - a free-form but brief description
This may suffice for us here, without trying to categorize or codify entries.
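The event-level fields listed above could be modeled as a small immutable record. This is only a sketch: the class name, the field names, and the choice of an optional `file` field to distinguish whole-object entries from file-level ones are assumptions, not part of any agreed design.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One provenance entry; all names here are illustrative."""
    object_id: str               # the object the entry is about
    when: datetime               # datetime stamp, timezone-aware
    who: str                     # process writing the entry
    what: str                    # free-form but brief description
    file: Optional[str] = None   # None means the object as a whole

    def as_record(self):
        """Return a plain dict with the timestamp as an ISO 8601 string."""
        record = asdict(self)
        record["when"] = self.when.isoformat()
        return record


event = ProvenanceEvent(
    object_id="obj-1",
    when=datetime(2015, 1, 1, tzinfo=timezone.utc),
    who="Ingest Workflow",
    what="Version 1 ingested",
)
```

Keeping `what` as free text follows the idea of not categorizing or codifying entries; a controlled vocabulary could be layered on later if it proves necessary.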
Candidate entries (making up text for illustration, could use some crafting):
once per Ingest to signal successful creation of new or updated Moab object
note that this level of milestone suffices; it's not a log of, say, receiving bag, verifying bag, writing Moab, etc.
Suggested entry: who="Ingest Workflow", what="Version N ingested"
once for each replica made
again, milestone achieved, not started or failed
Suggested entry: who="Replication something", what="Version N replica saved to endpoint-name/id"
I don't think right now that Inventory checks rise to the level of provenance ... that's internal bookkeeping and we don't need to say every other day "yep, we still haven't lost track of it". Simple inventory reconciliation also may not be provenance fodder ... but if the inventory triggers any more substantial recovery, that recovery will be recorded by the resulting action.
Fixity checks both good and bad should be recorded as provenance. This is the essential bookkeeping for the Trusted Repository; evidence of routine object verification should the repository itself be audited in an external way.
Suggested success entry: who="Fixity audit", what="Version N Moab|endpointname copy verified against TCR"
Suggested failure entry: who="Fixity audit", what="Version N Moab|endpointname copy checksum failure; recovery initiated", which might be followed shortly by the standard "replica made" entry per above if there isn't a more specific call to signal a repair in progress.
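The candidate entries above can be sketched as small helper functions that produce the suggested who/what text. The function names and the dict shape are assumptions, and the replication "who" value is a placeholder because the text above leaves it undecided ("Replication something").

```python
def ingest_entry(version):
    """Milestone entry written once per successful ingest."""
    return {"who": "Ingest Workflow", "what": f"Version {version} ingested"}


def replica_entry(version, endpoint, who="Replication"):
    """Milestone entry written once per replica saved.

    The default `who` is a placeholder; the actual process name is undecided.
    """
    return {"who": who, "what": f"Version {version} replica saved to {endpoint}"}


def fixity_entry(version, copy, passed):
    """Entry for a fixity check on one copy.

    `copy` is either "Moab" or the name of a replication endpoint.
    """
    if passed:
        what = f"Version {version} {copy} copy verified against TCR"
    else:
        what = f"Version {version} {copy} copy checksum failure; recovery initiated"
    return {"who": "Fixity audit", "what": what}
```

For example, `fixity_entry(3, "Moab", passed=False)` yields the failure text, after which a normal `replica_entry` would record the repaired copy.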
Note the XML schema used for DOR's provenance at https://consul.stanford.edu/display/chimera/Provenance+metadata+--+the+provenanceMetadata+datastream.