monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Reconsider creation of direct G-P and G-D triples #507

Open mbrush opened 7 years ago

mbrush commented 7 years ago

This issue follows from considerations for modeling disease model associations in #506 (https://github.com/monarch-initiative/dipper/issues/506), and from discussions with ZFIN and AGR on this topic. It concerns the broader issue that in Monarch data we create direct G2P triples between the subject (S) and object (O) of all OBAN associations. This can be misleading in cases where the phenotype/disease depends on some environmental exposure or behavior.

For example, consider a disease model association for a wild-type 'AB' zebrafish where it is exposure to a drug (pentetrazol) that 'generates' an epilepsy model. With our current approach for disease model associations, we create a triple asserting that the wild-type AB genotype is a model of epilepsy (Figure 1).

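Figure 1: image (https://user-images.githubusercontent.com/5184212/29438994-68aa35e0-836f-11e7-99fe-c302cab82567.png)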
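To make the problem concrete, here is a minimal sketch of the pattern described above, written as plain subject-predicate-object tuples. It is an illustration under stated assumptions, not dipper's actual output: every `ex:` identifier, the `ex:has_environment_qualifier` predicate, and the helper name are hypothetical; only the overall shape (an OBAN association node plus a direct triple between its subject and object) comes from the description above.

```python
# Hypothetical identifiers; only the overall shape follows the issue text.
GENOTYPE = "ex:AB_wild_type_genotype"
DISEASE = "ex:epilepsy"
EXPOSURE = "ex:pentetrazol_exposure"
IS_MODEL_OF = "ex:is_model_of"


def association_with_direct_triple(assoc_id, subject, predicate, obj, environment=None):
    """Build an OBAN-style association node plus the direct S-P-O shortcut triple."""
    triples = [
        (assoc_id, "rdf:type", "oban:association"),
        (assoc_id, "oban:association_has_subject", subject),
        (assoc_id, "oban:association_has_predicate", predicate),
        (assoc_id, "oban:association_has_object", obj),
        # Shortcut triple between the subject and object of the association.
        (subject, predicate, obj),
    ]
    if environment is not None:
        # Assumed predicate: the qualifier hangs off the association node only.
        triples.append((assoc_id, "ex:has_environment_qualifier", environment))
    return triples


triples = association_with_direct_triple(
    "ex:assoc1", GENOTYPE, IS_MODEL_OF, DISEASE, environment=EXPOSURE
)

# Read on its own, the shortcut triple says nothing about the exposure.
direct = [t for t in triples if t[0] == GENOTYPE]
print(direct)
```

Anyone who consumes only the direct triple sees "wild-type AB is a model of epilepsy", with the exposure reachable only through the association node.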

This direct triple is misleading. Either we should consider not creating this direct triple in our data, and instead traversing the association node to link the genotype to the disease (and any relevant environmental exposure), or we should consider an approach that uses a different subject for the association (see Figure 2).

For example, Figure 2A shows a granular/instance-based approach, like the one we are using for things like variant-drug sensitivity associations (https://github.com/monarch-initiative/dipper/issues/463#issuecomment-307230358). Here we would create a zebrafish individual (an instance of NCBITaxon:7955) that has_genotype an AB wild-type genotype and participates_in a pentetrazol exposure. This 'contextualized' fish could then be directly linked to the disease it is a model for.

Alternatively, Figure 2B shows an approach ZFIN has proposed that uses a G+E entity as the subject of the is_model_of association. This G+E entity (which ZFIN calls FISH-ENVIRONMENT) combines the fish genotype and the exposure/environment into a single entity. The resulting model is isomorphic to the first proposal, but holds different semantics.

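Figure 2: image (https://user-images.githubusercontent.com/5184212/29439024-9c511508-836f-11e7-9387-b389a586df92.png)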
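The two proposals can be sketched side by side as tuples. This is only a rough illustration: the `ex:` identifiers and predicates are hypothetical, and in particular the predicates that tie the genotype and exposure to the FISH-ENVIRONMENT entity are assumptions, since the text above only says that the entity combines them.

```python
GENOTYPE = "ex:AB_wild_type_genotype"
EXPOSURE = "ex:pentetrazol_exposure"
DISEASE = "ex:epilepsy"


def instance_based_model(fish):
    """Figure 2A style: a contextualized fish individual is the subject."""
    return [
        (fish, "rdf:type", "NCBITaxon:7955"),
        (fish, "ex:has_genotype", GENOTYPE),
        (fish, "ex:participates_in", EXPOSURE),
        (fish, "ex:is_model_of", DISEASE),
    ]


def combined_entity_model(fish_env):
    """Figure 2B style: a single G+E entity is the subject."""
    return [
        (fish_env, "rdf:type", "ex:FISH-ENVIRONMENT"),
        (fish_env, "ex:has_genotype_component", GENOTYPE),      # assumed predicate
        (fish_env, "ex:has_environment_component", EXPOSURE),   # assumed predicate
        (fish_env, "ex:is_model_of", DISEASE),
    ]


def linked_nodes(triples, subject):
    """Nodes reachable from the subject in one hop, ignoring its type."""
    return {o for s, p, o in triples if s == subject and p != "rdf:type"}


model_a = instance_based_model("ex:fish_1")
model_b = combined_entity_model("ex:fish_env_1")

# Same shape, different node types and predicates.
print(linked_nodes(model_a, "ex:fish_1") == linked_nodes(model_b, "ex:fish_env_1"))
```

In both sketches the bare genotype is no longer the subject of is_model_of, so no consumer can read the association without also being able to reach the exposure.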

Of course, updating our model in such ways would mean we have to update our cypher queries, which leverage the direct G-P and G-D triples that accompany these OBAN associations. But doing this could also push us to start displaying qualifying environment information for these associations in the Monarch app, where this info exists.
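As a rough idea of what "traversing the association node" would mean for a query, here is a small Python sketch over an in-memory list of tuples. It is not one of our cypher queries and does not reflect the real graph schema; the `ex:` identifiers, the `ex:has_environment_qualifier` predicate, and the function name are hypothetical.

```python
IS_MODEL_OF = "ex:is_model_of"
ENV_QUALIFIER = "ex:has_environment_qualifier"  # assumed predicate

triples = [
    ("ex:assoc1", "oban:association_has_subject", "ex:AB_wild_type_genotype"),
    ("ex:assoc1", "oban:association_has_predicate", IS_MODEL_OF),
    ("ex:assoc1", "oban:association_has_object", "ex:epilepsy"),
    ("ex:assoc1", ENV_QUALIFIER, "ex:pentetrazol_exposure"),
    ("ex:assoc2", "oban:association_has_subject", "ex:some_mutant_genotype"),
    ("ex:assoc2", "oban:association_has_predicate", IS_MODEL_OF),
    ("ex:assoc2", "oban:association_has_object", "ex:epilepsy"),
]


def models_with_context(triples, disease):
    """Return (subject, environment) pairs by walking association nodes."""
    by_assoc = {}
    for s, p, o in triples:
        if p.startswith("oban:association_has_") or p == ENV_QUALIFIER:
            by_assoc.setdefault(s, {})[p] = o
    results = []
    for props in by_assoc.values():
        if (props.get("oban:association_has_predicate") == IS_MODEL_OF
                and props.get("oban:association_has_object") == disease):
            # Environment is None when the association carries no qualifier.
            results.append((props.get("oban:association_has_subject"),
                            props.get(ENV_QUALIFIER)))
    return results


print(models_with_context(triples, "ex:epilepsy"))
```

The environment comes back alongside each subject, which is the information the app would need in order to display a qualifier next to the association.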

Finally, Figure 3 shows what it might look like to use a zebrafish individual as the subject for both phenotype and disease associations, using the AB-pentetrazol example described above. The example is taken from the ZFIN disease model record (https://zfin.org/action/ontology/term-detail/DOID:1826) and phenotype record (https://zfin.org/ZDB-FIG-151211-8#phenoDetail).

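Figure 3: image (https://user-images.githubusercontent.com/5184212/29439036-b62932e4-836f-11e7-9041-2b7c903165b1.png)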
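A minimal sketch of the Figure 3 idea, with one fish individual as the subject of both a disease model statement and a phenotype statement. All `ex:` identifiers and predicates are hypothetical placeholders, and the phenotype is reduced to a single placeholder node rather than a real phenotype class.

```python
FISH = "ex:fish_1"

triples = [
    (FISH, "rdf:type", "NCBITaxon:7955"),
    (FISH, "ex:has_genotype", "ex:AB_wild_type_genotype"),
    (FISH, "ex:participates_in", "ex:pentetrazol_exposure"),
    # Two separate statements that share a subject.
    (FISH, "ex:is_model_of", "ex:epilepsy"),
    (FISH, "ex:has_phenotype", "ex:increased_locomotion"),
]


def statements_about(triples, subject, predicates):
    """Collect (predicate, object) pairs for the subject, sorted for stable output."""
    return sorted((p, o) for s, p, o in triples if s == subject and p in predicates)


found = statements_about(triples, FISH, {"ex:is_model_of", "ex:has_phenotype"})
print(found)
```

Both statements hang off the same contextualized fish, and nothing in the sketch links the phenotype to the disease.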


Graphical Notation and Prefix Expansions

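image (https://user-images.githubusercontent.com/5184212/29439093-4b86502e-8370-11e7-97fd-d8fec931c542.png)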

pnrobinson commented 7 years ago

Hi Matt,

this is an interesting case--I would say that the answer depends really on the intended use cases.

Also, I do not think that "locomotion" is a good Entity class for epilepsy (case 3). As a side topic, it would be good to figure out how to best model phenotypes like epilepsy!

Maybe this is a good topic for a future dipper call?

-Peter



From: Matthew Brush <notifications@github.com>
Sent: Thursday, August 17, 2017 8:20 PM
To: monarch-initiative/dipper
Cc: Subscribed
Subject: [monarch-initiative/dipper] Reconsider creation of direct G-P and G-D triples (#507)



mbrush commented 7 years ago

Hi Peter. To clarify re: locomotion and epilepsy - Figure 3 represents two separate statements found in the ZFIN database: one that this fish is asserted to be a model of epilepsy, and one that the fish exhibits an increased locomotion phenotype. But these are independent statements in ZFIN (i.e. it is not asserted that increased locomotion is a phenotype of epilepsy).

pnrobinson commented 7 years ago

Locomotion refers to moving ("motion") from place to place ("locus"). It is not clear to me what abnormally increased locomotion is; logically speaking, it would be a fish like Nemo who swims a long distance. But it doesn't seem to be a good way of referring to an abnormal phenotype. Is there a publication that describes this model?

mbrush commented 7 years ago

Publication referenced in these ZFIN records: https://www.ncbi.nlm.nih.gov/pubmed/26135914 "Bioactive C21 Steroidal Glycosides from the Roots of Cynanchum otophyllum That Suppress the Seizure-like Locomotor Activity of Zebrafish Caused by Pentylenetetrazole."