Open mbrush opened 7 years ago
Hi Matt,
This is an interesting case. I would say that the answer really depends on the intended use cases.
Also, I do not think that "locomotion" is a good Entity class for epilepsy (case 3). As a side topic, it would be good to figure out how to best model phenotypes like epilepsy!
Maybe this is a good topic for a future dipper call?
-Peter
Peter Robinson
Professor of Computational Biology
The Jackson Laboratory for Genomic Medicine
From: Matthew Brush notifications@github.com
Sent: Thursday, August 17, 2017 8:20 PM
To: monarch-initiative/dipper
Cc: Subscribed
Subject: [monarch-initiative/dipper] Reconsider creation of direct G-P and G-D triples (#507)
This issue follows from considerations for modeling disease model associations in #506 (https://github.com/monarch-initiative/dipper/issues/506), and from discussions with ZFIN and AGR on this topic. It concerns the broader issue that in Monarch data we create direct G2P triples between the subject (S) and object (O) of all OBAN associations. This can be misleading in cases where the phenotype/disease is dependent on some environmental exposure or behavior.
For example, consider a disease model association for a wild-type 'AB' zebrafish where it is exposure to a drug (pentetrazol) that 'generates' an epilepsy model. With our current approach for disease model associations, we create a triple asserting that the wild-type AB genotype is a model of epilepsy (Figure 1).
Figure 1: [image]https://user-images.githubusercontent.com/5184212/29438994-68aa35e0-836f-11e7-99fe-c302cab82567.png
This direct triple is misleading. We should either consider not creating this direct triple in our data, and instead traverse the association node to link the genotype to the disease (and to any relevant environmental exposure), or consider an approach that uses a different subject for the association (see Figure 2).
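A minimal sketch of the difference, assuming triples are held as plain (subject, predicate, object) tuples. Every identifier here (`:AB_genotype`, `:assoc1`, `:pentetrazol_exposure`, `has_environment_qualifier`, and the shortened predicate labels) is a hypothetical placeholder for illustration, not an actual Monarch or dipper IRI. It shows that the direct triple carries no environmental context, while traversing the association node recovers it.

```python
# Triples are plain (subject, predicate, object) tuples. All identifiers are
# illustrative placeholders, not the IRIs actually used in Monarch data.
TRIPLES = {
    # Direct "shortcut" triple: says nothing about the drug exposure.
    (":AB_genotype", "is_model_of", "DOID:1826"),
    # Association node carrying the same statement plus its qualifier.
    (":assoc1", "association_has_subject", ":AB_genotype"),
    (":assoc1", "association_has_predicate", "is_model_of"),
    (":assoc1", "association_has_object", "DOID:1826"),
    # Hypothetical predicate linking the association to its environment.
    (":assoc1", "has_environment_qualifier", ":pentetrazol_exposure"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def models_via_association(triples, genotype):
    """Map each modeled disease to the environments that qualify the claim."""
    result = {}
    for s, p, o in triples:
        if p != "association_has_subject" or o != genotype:
            continue
        # Only follow associations whose predicate is is_model_of.
        if "is_model_of" not in objects(triples, s, "association_has_predicate"):
            continue
        envs = objects(triples, s, "has_environment_qualifier")
        for disease in objects(triples, s, "association_has_object"):
            result.setdefault(disease, set()).update(envs)
    return result


# The direct triple alone yields only the disease.
print(objects(TRIPLES, ":AB_genotype", "is_model_of"))
# Traversing the association node also yields the qualifying exposure.
print(models_via_association(TRIPLES, ":AB_genotype"))
```

A query written against the direct triple cannot tell this case apart from a genotype that models epilepsy without any exposure. The traversal can, at the cost of a longer path.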
For example, Figure 2A shows a granular/instance-based approach, like the one we are using for things like variant-drug sensitivity associations (https://github.com/monarch-initiative/dipper/issues/463#issuecomment-307230358). Here we would create a zebrafish individual (an instance of NCBITaxon:7955) that has_genotype an AB wild-type genotype and participates_in a pentetrazol exposure. This 'contextualized' fish could then be directly linked to the disease it is a model for.
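The instance-based pattern described for Figure 2A can be sketched roughly as follows, again using plain tuples. The node names `:fish1`, `:AB_genotype`, and `:pentetrazol_exposure` are hypothetical placeholders; the predicate labels and NCBITaxon:7955 come from the description above, and DOID:1826 is the disease term from the ZFIN record linked in this issue.

```python
# Instance-based pattern: the subject of is_model_of is a contextualized
# fish individual rather than the bare genotype. Node names are hypothetical.
TRIPLES = {
    (":fish1", "rdf:type", "NCBITaxon:7955"),
    (":fish1", "has_genotype", ":AB_genotype"),
    (":fish1", "participates_in", ":pentetrazol_exposure"),
    (":fish1", "is_model_of", "DOID:1826"),
}


def describe_model(triples, individual):
    """Collect the genotypes, exposures, and diseases attached to one individual."""
    def objs(predicate):
        return sorted(o for s, p, o in triples if s == individual and p == predicate)

    return {
        "genotypes": objs("has_genotype"),
        "exposures": objs("participates_in"),
        "diseases": objs("is_model_of"),
    }


print(describe_model(TRIPLES, ":fish1"))
# The bare genotype is no longer the subject of any is_model_of triple.
print(describe_model(TRIPLES, ":AB_genotype")["diseases"])
```

In this shape the genotype and the exposure are both one hop from the fish, so no direct genotype-to-disease claim is ever asserted.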
Alternatively, Figure 2B shows an approach ZFIN has proposed that uses a G+E entity as the subject of the is_model_of association. This G+E entity (which ZFIN calls FISH-ENVIRONMENT) combines the fish genotype and the exposure/environment into a single entity. The resulting model is isomorphic to the first proposal, but holds different semantics.
Figure 2: [image]https://user-images.githubusercontent.com/5184212/29439024-9c511508-836f-11e7-9387-b389a586df92.png
Of course, updating our model in such ways would mean we have to update our Cypher queries, which leverage the direct G-P and G-D triples that accompany these OBAN associations. But doing this could also push us to start displaying qualifying environment information for these associations in the Monarch app, where this info exists.
Finally, Figure 3 shows what it might look like to use a zebrafish individual as the subject for both phenotype and disease associations, using the AB-pentetrazol example described above and recorded in the ZFIN disease model record (https://zfin.org/action/ontology/term-detail/DOID:1826) and phenotype record (https://zfin.org/ZDB-FIG-151211-8#phenoDetail).
Figure 3: [image]https://user-images.githubusercontent.com/5184212/29439036-b62932e4-836f-11e7-9041-2b7c903165b1.png
Graphical Notation and Prefix Expansions
[image]https://user-images.githubusercontent.com/5184212/29439093-4b86502e-8370-11e7-97fd-d8fec931c542.png
Hi Peter. To clarify re: locomotion and epilepsy - Figure 3 represents two separate statements found in the ZFIN database: one that this fish is asserted to be a model of epilepsy, and one that the fish exhibits an increased locomotion phenotype. But these are independent statements in ZFIN (i.e. it is not asserted that increased locomotion is a phenotype of epilepsy).
Locomotion refers to moving ("motion") from place to place ("locus"). It is not clear to me what abnormally increased locomotion is; logically speaking, it would be a fish like Nemo who swims a long distance. But it doesn't seem to be a good way of referring to an abnormal phenotype. Is there a publication that describes this model?
Publication referenced in ZFIN for these records: https://www.ncbi.nlm.nih.gov/pubmed/26135914 "Bioactive C21 Steroidal Glycosides from the Roots of Cynanchum otophyllum That Suppress the Seizure-like Locomotor Activity of Zebrafish Caused by Pentylenetetrazole."