monarch-initiative / SEPIO-ontology

Ontology for representing scientific evidence and provenance information

Review and refactor modeling of ClinGen 'Data' objects #11

Open mbrush opened 7 years ago

mbrush commented 7 years ago

The ClinGen model currently describes about 20 types of 'Data' objects, which capture information curated from a specific source and used as evidence supporting CriterionAssessments. Their semantics are described informally, using a large number of ad hoc properties to organize terms or data values under a given Data object. One issue for SEPIO alignment is that the ontological nature of these objects in the context of SEPIO is not clear. Specifically, we would want to characterize each of them as being either:

(1) 'Assertions', in cases where the object simply conveys a statement of purported fact, in the absence of more foundational evidence information supporting this statement; or

(2) 'StudyFindings', in cases where the object represents the outcome of a specific study that is directly relevant to the validity of the target assertion (and often captures data/metadata from this study).

Making such a distinction would facilitate extension of the model to describe evidence for Data objects that map to Assertions, through addition of evidence lines that organize the underlying evidence for these claims.
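To make the distinction concrete, here is a rough Python sketch of how it might play out. The class and field names (`Assertion`, `StudyFinding`, `EvidenceLine`, `evidence_lines`, `items`, `source`) and the helper `supporting_findings` are hypothetical illustrations only, not the actual SEPIO or ClinGen model terms, and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class StudyFinding:
    """Outcome of a specific study, with data/metadata captured from it."""
    description: str
    source: str  # hypothetical field, e.g. an identifier for the study


@dataclass
class EvidenceLine:
    """Organizes the evidence items that bear on one assertion."""
    items: List[Union["Assertion", StudyFinding]] = field(default_factory=list)


@dataclass
class Assertion:
    """A statement of purported fact; evidence lines can be added later."""
    statement: str
    evidence_lines: List[EvidenceLine] = field(default_factory=list)


def supporting_findings(assertion: Assertion) -> List[StudyFinding]:
    """Collect every StudyFinding reachable through the evidence lines."""
    found: List[StudyFinding] = []
    for line in assertion.evidence_lines:
        for item in line.items:
            if isinstance(item, StudyFinding):
                found.append(item)
            else:
                # An Assertion used as evidence may carry its own evidence.
                found.extend(supporting_findings(item))
    return found


# A Data object mapped to an Assertion starts with no underlying evidence...
claim = Assertion("example statement of purported fact")
# ...and can be extended later by adding an evidence line.
finding = StudyFinding("example study outcome", source="example-study-id")
claim.evidence_lines.append(EvidenceLine([finding]))
```

Under these assumptions, a Data object mapped to a StudyFinding is a leaf, while one mapped to an Assertion remains open to extension with further evidence lines.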

We have started a Google Doc here that evaluates each 'Data' type to determine whether it would best be represented as an Assertion or as more foundational StudyFinding data.

mbrush commented 7 years ago

Longer term we can explore refactoring of the structure of Assertions to make them more consistent and aligned with semi-formalized approaches to capturing assertion semantics, such as OBAN and SEPIO.

Longer term we can also explore refactoring the structure of the more foundational study data objects to make them more consistent with each other, and to re-use approaches and patterns from existing data standards.