monarch-initiative / SEPIO-ontology

Ontology for representing scientific evidence and provenance information

Use of 'Activities' or 'Contributions' to describe and organize Entity provenance #10

Open mbrush opened 7 years ago

mbrush commented 7 years ago

Many efforts, like ClinGen, will have use cases that would benefit from explicitly describing the activities through which Entities in our model (Assertions, Supporting Information, Evidence Lines) are created. ClinGen's main use case is to organize provenance information into discrete, traceable objects, for a variety of reasons related to the applications they must support, as detailed here.

I imagine similar use cases will come from other groups wanting to model evidence and provenance information in fine detail. With this in mind, we should consider approaches that allow richer description of Entity provenance while staying compatible with, and easy to harmonize with, the more compact representations that will support most use cases. In those compact representations, Entities are linked directly to the agents who contributed to them and to the dates of those contributions, using PAV-like relations.

We are currently exploring the creation and use of 'Contribution' objects, which essentially represent reified contribution relationships between an agent and an Entity. This is similar in principle to the PROV notion of an 'Attribution', but extended so that timestamps and roles can be attached to each contribution.
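To make the Contribution idea concrete, here is a minimal Python sketch. The class names, the fields (`contributor`, `role`, `on_date`), the role labels, and all identifiers and dates are hypothetical illustrations, not SEPIO or ClinGen terms. The point it shows is that the role and date live on the Contribution object rather than on the Entity, so one Entity can carry several dated, role-tagged contributions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Agent:
    id: str
    name: str


@dataclass
class Contribution:
    # Reified agent-to-Entity relationship (hypothetical structure): role and date
    # belong to the contribution, not to the Entity, so each contributor keeps its own.
    contributor: Agent
    role: str
    on_date: date


@dataclass
class Assertion:
    id: str
    statement: str
    contributions: List[Contribution] = field(default_factory=list)

    def contributors_in_role(self, role: str) -> List[Agent]:
        # Query the reified objects to answer "who did X to this Entity?"
        return [c.contributor for c in self.contributions if c.role == role]


# Usage with hypothetical identifiers and dates
curator = Agent("ex:agent1", "Curator A")
panel = Agent("ex:agent2", "Expert Panel B")
assertion = Assertion("ex:assertion1", "Variant X is pathogenic for condition Y")
assertion.contributions.append(Contribution(curator, "author", date(2017, 3, 1)))
assertion.contributions.append(Contribution(panel, "approver", date(2017, 3, 15)))
print([a.name for a in assertion.contributors_in_role("approver")])  # ['Expert Panel B']
```

Because each Contribution is its own object, it can later be given an identifier, cited, or traced independently, which is the property ClinGen is after.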

Alternatively, we could allow a minimal representation of Activities, used only to describe the provenance of Entities, but which would create activity-based paths through the data in the style of PROV (e.g. describing a VariantInterpretation as a series of Activities with inputs, outputs, and agents).
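For contrast, a rough sketch of the activity-based alternative. Here Entities are connected only through the Activities that use and generate them, so provenance is recovered by walking the path backwards from an Entity. The field names echo PROV relations (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) in spirit, but this plain-Python structure, the `provenance_path` helper, and the three-step example are hypothetical, not a SEPIO or PROV API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Activity:
    id: str
    used: List[str] = field(default_factory=list)       # input Entity ids (cf. prov:used)
    generated: List[str] = field(default_factory=list)  # output Entity ids (cf. prov:wasGeneratedBy)
    agents: List[str] = field(default_factory=list)     # cf. prov:wasAssociatedWith


def provenance_path(entity_id: str, activities: List[Activity]) -> List[str]:
    """Walk backwards (depth-first) from an Entity through the Activities that produced it and its inputs."""
    producer: Dict[str, Activity] = {}
    for act in activities:
        for out in act.generated:
            producer[out] = act
    path: List[str] = []
    seen: Set[str] = set()
    stack = [entity_id]
    while stack:
        act = producer.get(stack.pop())
        if act is None or act.id in seen:
            continue  # raw input with no recorded producer, or Activity already visited
        seen.add(act.id)
        path.append(act.id)
        stack.extend(act.used)
    return path


# Hypothetical three-step interpretation
steps = [
    Activity("ex:collect", used=["ex:data1"], generated=["ex:evidence1"], agents=["ex:agent1"]),
    Activity("ex:assess", used=["ex:evidence1"], generated=["ex:evline1"], agents=["ex:agent1"]),
    Activity("ex:interpret", used=["ex:evline1"], generated=["ex:assertion1"], agents=["ex:agent2"]),
]
print(provenance_path("ex:assertion1", steps))  # ['ex:interpret', 'ex:assess', 'ex:collect']
```

The trade-off versus Contributions is that agents and dates hang off Activities, so answering "who contributed to this Assertion?" requires traversing the path rather than reading a direct link.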

Diagrams of proposed patterns based on these approaches can be found in the cmap here.

mbrush commented 7 years ago

If alternate patterns for describing the same basic content are defined by SEPIO and allowed in practice, we will have to provide some means for post-hoc harmonization (e.g. defining property chains to materialize shortcut relations across the richer model, or pre-defined scripts for programmatic post-processing).
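As one illustration of the programmatic post-processing option, the sketch below flattens both richer patterns into compact, PAV-like Entity-to-agent triples. The predicates `pav:authoredBy`, `pav:authoredOn`, `pav:curatedBy`, `pav:curatedOn`, and `pav:contributedBy` are PAV terms, but the role mapping, the tuple layout, and both function names are hypothetical. In an OWL setting, the second function corresponds roughly to a property chain axiom that a reasoner would evaluate instead.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical role-to-predicate mapping; predicate names are taken from PAV.
ROLE_TO_PAV: Dict[str, Tuple[str, str]] = {
    "author": ("pav:authoredBy", "pav:authoredOn"),
    "curator": ("pav:curatedBy", "pav:curatedOn"),
}


def flatten_contributions(contributions: List[Tuple[str, str, str, str]]) -> Set[Triple]:
    """Turn reified (entity, agent, role, iso_date) Contributions into PAV-style shortcut triples."""
    triples: Set[Triple] = set()
    for entity, agent, role, when in contributions:
        preds = ROLE_TO_PAV.get(role)
        if preds is None:
            # Unmapped role: keep only the generic relation (the date is dropped here)
            triples.add((entity, "pav:contributedBy", agent))
            continue
        by_pred, on_pred = preds
        triples.add((entity, by_pred, agent))
        triples.add((entity, on_pred, when))
    return triples


def chain_activity_agents(
    generated_by: Dict[str, str], associated_with: Dict[str, List[str]]
) -> Set[Triple]:
    """Materialize entity -> agent shortcuts along wasGeneratedBy followed by wasAssociatedWith."""
    return {
        (entity, "pav:contributedBy", agent)
        for entity, activity in generated_by.items()
        for agent in associated_with.get(activity, [])
    }


# Both patterns reduce to compact entity -> agent triples
print(sorted(flatten_contributions([("ex:assertion1", "ex:agent1", "author", "2017-03-01")])))
print(chain_activity_agents({"ex:assertion1": "ex:interpret"}, {"ex:interpret": ["ex:agent2"]}))
```

Whichever mechanism is chosen, the goal is the same: data captured in either rich pattern can be queried with the same compact relations.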

mbrush commented 7 years ago

On the 3-21-17 call, we decided that the v1 ClinGen model would implement the reified Contribution-based approach.