monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Modeling Variant-Treatment-Disease associations #462

Status: Open · mbrush opened this issue 7 years ago

mbrush commented 7 years ago

Raising some initial questions and proposals for how Dipper should model 'predictive' associations between variants and treatment response, as relevant to the ingest of the cancer variant sources described in Translator #27.

We will likely treat these as ternary relationships between a variant, a treatment, and a disease (V-T-D). As we have done for genotype-to-phenotype (G2P) associations, we will model them using OBAN-style reification of a primary triple, and use 'qualifiers' on the reified association to approximate the ternary relationship.
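A minimal sketch of what OBAN-style reification produces for one V-T-D claim, using plain Python tuples as triples so it runs without an RDF library. The `OBAN:association_has_subject` / `_predicate` / `_object` property names follow the OBAN vocabulary as we understand it; `ex:has_qualifier` and every `ex:` identifier are hypothetical placeholders, not Dipper's actual terms. The sketch also asserts the primary triple directly alongside the association node, which is one design choice rather than a requirement.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# OBAN-style properties linking the association node to the primary triple.
SUBJECT = "OBAN:association_has_subject"
PREDICATE = "OBAN:association_has_predicate"
OBJECT = "OBAN:association_has_object"
# Hypothetical placeholder for whatever qualifier property is eventually chosen.
QUALIFIER = "ex:has_qualifier"


def reify(assoc_id: str, subj: str, pred: str, obj: str,
          qualifiers: List[str]) -> List[Triple]:
    """Return the triples describing one reified association.

    The primary triple (subj, pred, obj) is asserted directly and is also
    described by an association node, which carries the qualifiers.
    """
    triples: List[Triple] = [
        (subj, pred, obj),  # the primary triple itself
        (assoc_id, "rdf:type", "OBAN:association"),
        (assoc_id, SUBJECT, subj),
        (assoc_id, PREDICATE, pred),
        (assoc_id, OBJECT, obj),
    ]
    # Qualifiers hang off the association node, approximating a ternary relation.
    triples.extend((assoc_id, QUALIFIER, q) for q in qualifiers)
    return triples


if __name__ == "__main__":
    for t in reify("ex:assoc1", "ex:BRAF_V600E",
                   "ex:correlates_with_sensitivity_to", "ex:vemurafenib",
                   ["ex:colorectal_cancer"]):
        print(t)
```

Because the qualifier is attached to the association node rather than to the variant or the drug, the same variant-drug edge can appear in several associations, each scoped to a different disease.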

Consider, as an example, this claim from the CIViC database: the BRAF V600E variant correlates with sensitivity to Vemurafenib in cases of colorectal cancer.

The question posed in this ticket concerns the core structure of a V-T-D association (i.e., which entities will fill the subject, predicate, object, and qualifier slots).

mbrush commented 7 years ago

The variant will fill the 'subject' slot in V-T-D associations, but there are several options for modeling the association object and its qualifiers. The association object could be:

  1. The Treatment/Drug: This is the most obvious and likely approach IMHO, as it seems most intuitive and has precedent in Wikidata (e.g., see the claims for the BRAF V600E variant). This approach requires the treated disease to be captured as a 'qualifier' on the association. For the CIViC example above (BRAF V600E - Vemurafenib - colorectal cancer), the resulting model might look like this: [diagram: civic_001]. Read as: "BRAF V600E correlates with sensitivity to Vemurafenib treatment of colorectal cancer". The 'correlates_with_sensitivity_to' property used here is only a proposal, and would be roughly equivalent to the Wikidata property 'positive therapeutic predictor'.

  2. The Disease/Condition: Swapping the positions of the treatment and the disease treated, we would get something like: [diagram: civic_002]. Read as: "BRAF V600E correlates with therapeutic sensitivity in colorectal cancer when treated with Vemurafenib". Here, the qualifying treatment could be thought of as an environment/exposure in which the primary variant-condition association holds.

  3. A 'Response to Therapy' Phenotype: This approach is a bit of a stretch, but in a sense these claims are about a therapeutic response 'phenotype' that is caused by, or correlated with, a genetic variant: [diagram: civic_003]. This approach requires two qualifiers, one for the disease being treated and one for the treatment applied, resulting in a more verbose model and more complex queries to answer the core competency questions (CQs). However, treating these as standard G2P associations may improve alignment and interoperability with other Monarch G2P data. It could also allow for LEGO-style composition of more specific therapeutic response phenotypes, if we pursue this type of modeling down the road.
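To make the query-complexity trade-off between the three approaches concrete, here is a toy comparison, assuming each association is stored as a simple record with a subject, predicate, object, and a dict of named qualifiers. The predicate names, the qualifier keys (`in_disease`, `with_treatment`), and the phenotype label are illustrative placeholders only, not agreed ontology terms. Each function answers the same competency question: which drugs is BRAF V600E associated with sensitivity to in colorectal cancer?

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Association:
    subject: str
    predicate: str
    obj: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


VARIANT, DRUG, DISEASE = "BRAF V600E", "Vemurafenib", "colorectal cancer"

# Approach 1: object = treatment; disease captured as a qualifier.
a1 = Association(VARIANT, "correlates_with_sensitivity_to", DRUG,
                 {"in_disease": DISEASE})
# Approach 2: object = disease; treatment captured as a qualifier.
a2 = Association(VARIANT, "correlates_with_therapeutic_sensitivity_in", DISEASE,
                 {"with_treatment": DRUG})
# Approach 3: object = response phenotype; disease and treatment both qualifiers.
a3 = Association(VARIANT, "has_phenotype", "sensitivity to therapy",
                 {"in_disease": DISEASE, "with_treatment": DRUG})


def drugs_approach1(assocs: List[Association], variant: str, disease: str) -> List[str]:
    # The drug is read straight from the object slot; one qualifier filter.
    return [a.obj for a in assocs
            if a.subject == variant
            and a.predicate == "correlates_with_sensitivity_to"
            and a.qualifiers.get("in_disease") == disease]


def drugs_approach2(assocs: List[Association], variant: str, disease: str) -> List[str]:
    # The disease sits in the object slot; the drug must come from a qualifier.
    return [a.qualifiers["with_treatment"] for a in assocs
            if a.subject == variant
            and a.predicate == "correlates_with_therapeutic_sensitivity_in"
            and a.obj == disease
            and "with_treatment" in a.qualifiers]


def drugs_approach3(assocs: List[Association], variant: str, disease: str) -> List[str]:
    # Disease and drug are both qualifiers; the object only names the response type.
    return [a.qualifiers["with_treatment"] for a in assocs
            if a.subject == variant
            and a.predicate == "has_phenotype"
            and a.obj == "sensitivity to therapy"
            and a.qualifiers.get("in_disease") == disease
            and "with_treatment" in a.qualifiers]


if __name__ == "__main__":
    data = [a1, a2, a3]
    for fn in (drugs_approach1, drugs_approach2, drugs_approach3):
        print(fn.__name__, fn(data, VARIANT, DISEASE))
```

All three functions return `['Vemurafenib']` for this data, but the number of qualifier lookups grows from one (Approach 1) to two (Approach 3), which is the extra query complexity that the 'Response to Therapy' phenotype model would carry.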
mbrush commented 7 years ago

We will move ahead with Approach 1 above for a first pass, unless there are concerns. Feedback is welcome, especially from @cmungall, @mellybelly, and @pnrobinson.

To Do: