monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Review G2P association design pattern (and identity criteria) #253

Open mbrush opened 8 years ago

mbrush commented 8 years ago

Hoping to review Monarch association design patterns and identity criteria, as some questions are arising as we get into more complex sources such as ClinVar. G2P associations are non-binary, being relationships between G, P, and Environment (and additionally Developmental Stage).

Our Dipper models define a primary association between the G and the P, and hang the Environment and the Stage as 'qualifiers' on the association (red properties in image below).

[Image: concepts_010] (Note that the environment and stage models are still under development.)

In our model, a given association is _identified_ by G + P + Environment + Stage. Evidence and provenance are NOT identity criteria for a Monarch association (i.e. more than one person can assert a given association instance, and a single association can have more than one line of evidence).
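To make the proposed identity criteria concrete, here is a minimal Python sketch. It is not Dipper code: the class names (`AssociationKey`, `Association`, `AssociationStore`) and the `EX:` identifiers are hypothetical, chosen only for illustration. It assumes that identity is the tuple G + P + Environment + Stage, and that evidence and sources accumulate on the one association that tuple identifies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class AssociationKey:
    """Identity criteria only: G + P + Environment + Stage (hypothetical model)."""
    genotype: str
    phenotype: str
    environment: Optional[str] = None
    stage: Optional[str] = None


@dataclass
class Association:
    key: AssociationKey
    # Evidence and provenance are NOT part of identity; they accumulate.
    evidence: Set[str] = field(default_factory=set)
    sources: Set[str] = field(default_factory=set)


class AssociationStore:
    """Keeps exactly one Association per identity key."""

    def __init__(self) -> None:
        self._by_key: Dict[AssociationKey, Association] = {}

    def assert_association(self, key, evidence=None, source=None):
        assoc = self._by_key.get(key)
        if assoc is None:
            assoc = Association(key)
            self._by_key[key] = assoc
        if evidence is not None:
            assoc.evidence.add(evidence)
        if source is not None:
            assoc.sources.add(source)
        return assoc

    def __len__(self) -> int:
        return len(self._by_key)


store = AssociationStore()
key = AssociationKey("EX:genotype1", "EX:phenotype1",
                     environment="EX:env1", stage="EX:stage1")

# Two sources asserting the same G + P + E + S share one association.
a1 = store.assert_association(key, evidence="EX:evidence1", source="EX:sourceA")
a2 = store.assert_association(key, evidence="EX:evidence2", source="EX:sourceB")

# Changing a qualifier (here, the stage) yields a distinct association.
other = AssociationKey("EX:genotype1", "EX:phenotype1",
                       environment="EX:env1", stage="EX:stage2")
a3 = store.assert_association(other, evidence="EX:evidence1", source="EX:sourceA")
```

Under these assumptions, `a1` and `a2` are the same object carrying two lines of evidence, while `a3` is a separate association because its stage differs.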

Questions for discussion:

  1. Are we happy with the general design pattern of hanging E and S as 'qualifiers' on the association?
  2. Are we happy with the simple design pattern for capturing developmental stage as a developmental process that starts and ends at some stage/age category pulled from a relevant vocabulary?
  3. Do the proposed identity criteria make sense and meet our use cases?
  4. The types of stage data should be reviewed in particular, w.r.t. when they should speak to the identity of an association. There are times when stage data indicates that a G2P holds only during the indicated stage(s), and other times when stage data merely indicates the stage of the organism when the phenotype was observed. The former seems like it should be considered identifying criteria for an association, while the latter is perhaps just part of the provenance of the association?
  5. Note that there is a proposal to also hang sex as a qualifier when a G2P is sex-specific (see #255).
  6. The evidence model should be reviewed as well, esp w.r.t alignment with the Noctua/LEGO evidence model.
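
One possible reading of question 4 can be sketched in Python as follows. This is only an illustration of the distinction being asked about, not a settled design: the `StageAnnotation` type and its `restricts_association` flag are hypothetical, and whether a source can reliably supply such a flag is exactly what is open for discussion.

```python
from typing import NamedTuple, Optional, Tuple


class StageAnnotation(NamedTuple):
    stage: str
    # Hypothetical flag: True if the G2P holds ONLY during this stage,
    # False if the stage is merely when the phenotype was observed.
    restricts_association: bool


def identity_key(genotype: str,
                 phenotype: str,
                 environment: Optional[str] = None,
                 stage_annotation: Optional[StageAnnotation] = None
                 ) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Include the stage in the identity key only when it restricts the G2P."""
    stage = None
    if stage_annotation is not None and stage_annotation.restricts_association:
        stage = stage_annotation.stage
    return (genotype, phenotype, environment, stage)


def provenance_stage(stage_annotation: Optional[StageAnnotation]) -> Optional[str]:
    """Return the stage as provenance when it is only an observation stage."""
    if stage_annotation is not None and not stage_annotation.restricts_association:
        return stage_annotation.stage
    return None


observed = StageAnnotation("EX:stage1", restricts_association=False)
restricted = StageAnnotation("EX:stage1", restricts_association=True)

key_plain = identity_key("EX:genotype1", "EX:phenotype1")
key_observed = identity_key("EX:genotype1", "EX:phenotype1",
                            stage_annotation=observed)
key_restricted = identity_key("EX:genotype1", "EX:phenotype1",
                              stage_annotation=restricted)
```

With this reading, an observation-only stage leaves the identity key unchanged and is recorded as provenance, whereas a restricting stage produces a different key and therefore a different association.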

mbrush commented 8 years ago

Adding an updated diagram that fleshes out the evidence model a little (not the full model, but it gives a flavor).

[Image: assoc_model]