monarch-initiative / SEPIO-ontology

Ontology for representing scientific evidence and provenance information

Provenance: Agent, Contribution competing approaches #25

Open larrybabb opened 5 years ago

larrybabb commented 5 years ago

To have Agent more closely reflect the "Agent" concept from the W3C PROV model, we should consider:

  1. a recursive "actedOnBehalfOf" relationship between Agents, to allow for agents that are part of a hierarchical organizational structure.
  2. the notion of a role that specifies how the agent participated (I realize this is in Contribution, but I'm unclear how leaving it out of the direct Agent-to-Entity or Agent-to-Activity relationship is useful).
  3. stricter segregation of "activity" entities from "entity" entities, so that "entities" are not conflated with activity-like information such as dates/times. Alternatively, we should strictly define the built-in activity information ("created", "modified", "approved/asserted", "published", "curated", etc.).
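
To make points 1 and 2 concrete, here is a rough sketch, loosely modeled on W3C PROV's `actedOnBehalfOf` and its role-qualified association. The class names, field names, and example agents are all hypothetical and are not taken from SEPIO; this only illustrates the shape being proposed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Agent:
    """Hypothetical Agent with a recursive delegation link (point 1)."""
    name: str
    acted_on_behalf_of: Optional[Agent] = None


@dataclass(frozen=True)
class QualifiedAssociation:
    """Hypothetical direct Agent-to-Entity/Activity link carrying a role (point 2)."""
    agent: Agent
    target_id: str  # illustrative identifier of an Entity or an Activity
    role: str       # how the agent participated, e.g. "curator"


def delegation_chain(agent: Agent) -> Iterator[Agent]:
    """Walk up the actedOnBehalfOf links, starting from the given agent."""
    current: Optional[Agent] = agent
    while current is not None:
        yield current
        current = current.acted_on_behalf_of


# Made-up three-level organizational hierarchy.
org = Agent("Example Consortium")
group = Agent("Curation Working Group", acted_on_behalf_of=org)
curator = Agent("Curator A", acted_on_behalf_of=group)

association = QualifiedAssociation(agent=curator, target_id="assertion-1", role="curator")
chain = [a.name for a in delegation_chain(association.agent)]
```

With something like this, the organization responsible for a statement can be recovered from the individual agent without repeating it on every entity.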

Contribution is (IMO) a better bundling of the Agent and Activity concepts from W3C PROV: agent, role, and date/time of the activity. But it is confusing when similar attributes are also provided on the entities themselves. I like flexibility, but not at the cost of consistency.

For example, Assertion has (stated_by and date_stated) as well as (validated_by and date_validated) built in. EvidenceLine has (evidencestrength assessed_by and date_evidencestrength assessed) built in. EvidenceItem has (date and specifiedBy) built in (not sure if these go together). Only EvidenceItem has a relationship with Activity - is that intentional?

If Assertion and EvidenceLine are to contain the precise provenance information (agent and date/time) for the activities "stating"/"validating" and "strength_assessing", respectively, then why do we have the Contribution entity? Is it for additional kinds of contributions? If so, that should be clarified. I don't think it is right to have alternative approaches to representing the same information (at least not ones with such significant structural differences).
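
To illustrate the overlap, here is a minimal sketch of the same provenance expressed both ways. The Assertion field names (`stated_by`, `date_stated`, `validated_by`, `date_validated`) come from the model as described above; the `Contribution` shape, the role strings, the `to_contributions` helper, and the sample values are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contribution:
    """Assumed bundling of agent, role, and date/time of the activity."""
    agent: str
    role: str
    date: Optional[str]


@dataclass
class Assertion:
    """Built-in provenance attributes, as currently on Assertion."""
    stated_by: Optional[str] = None
    date_stated: Optional[str] = None
    validated_by: Optional[str] = None
    date_validated: Optional[str] = None


def to_contributions(assertion: Assertion) -> List[Contribution]:
    """Re-express the built-in attributes as Contribution records."""
    candidates = [
        ("stating", assertion.stated_by, assertion.date_stated),
        ("validating", assertion.validated_by, assertion.date_validated),
    ]
    # Emit a Contribution only when an agent was actually recorded.
    return [
        Contribution(agent=agent, role=role, date=date)
        for role, agent, date in candidates
        if agent is not None
    ]


# Illustrative values only.
assertion = Assertion(
    stated_by="Curator A",
    date_stated="2019-01-15",
    validated_by="Reviewer B",
    date_validated="2019-02-01",
)
contributions = to_contributions(assertion)
```

Since every built-in pair maps cleanly onto a Contribution, the two structures carry the same information, which is why keeping both side by side looks redundant to me.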

I recommend that we standardize how provenance data is structured throughout the model so that we can reduce complexity and confusion for adopters.