Closed bschilder closed 8 months ago
Found a treasure trove of data on experimental models for diseases (and perhaps specific phenotypes) on Monarch: https://data.monarchinitiative.org/latest/tsv/model_associations/
However, these files don't include gene-level info (which we would want if we have a particular gene therapy target in mind), so I'm checking whether I can extract that from the larger Monarch knowledge graph: https://data.monarchinitiative.org/monarch-kg/latest/
They also only provide MONDO IDs for each disease, so I need to find an effective way to map these back to the HPO/OMIM/DECIPHER/ORPH IDs provided by HPO. I've reached out to the MONDO ontology creators as well:
What's the argument against just using Mammalian Phenotype Ontology overlap?
Also, here's some of the messages we sent relating to this previously:
Here's one of the genes that has a mouse model for respiratory failure: http://www.informatics.jax.org/reference/J:120296
Here's the list of Mammalian Phenotype Ontology genes (for respiratory failure): http://www.informatics.jax.org/mp/annotations/MP:0001953
Gene therapy for ABCA3 in respiratory failure is already being looked into: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8798122/
What's the argument against just using Mammalian Phenotype Ontology overlap?
Several reasons:
Sounds good!
From: Brian M. Schilder. Sent: Thursday, November 30, 2023 1:24:56 PM. To: neurogenomics/RareDiseasePrioritisation. Cc: Skene, Nathan G. Subject: Re: [neurogenomics/RareDiseasePrioritisation] Assess existence of experimental models (Issue #33)
Preliminary summary plot showing the proportion of orthologous genes overlapping between HPO and non-human ontology databases (within a given phenotype), repeated across many phenotypes:
Will include this in the final report as well as showing how we can use this to prioritise gene/phenotype-specific therapeutic targets.
Great, didn’t think about looking at zebrafish models etc as well!
Can you explain the x-axis?
Sure!
n_genes_intersect: for a given phenotype that has a match between a pair of species, count the number of orthologous genes shared between the gene-phenotype annotations of each species.
n_genes_hpo: the total number of unique human genes annotated for a given HPO phenotype.
Dividing one by the other thus gives you the proportion of HPO gene annotations recapitulated in the equivalent phenotype of another species.
This proportion will be influenced by both evolutionary distance and how well studied each species is (notice the difference between mice and rats, despite the fact that they're equally related to humans).
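To make the two quantities concrete, here is a minimal base R sketch of the calculation on toy data. The data frames, column names, IDs, and the `gene_overlap()` helper are all made up for illustration (this is not the code behind the plot), and it assumes the non-human annotations have already been mapped to a matched HPO term and to human orthologs.

```r
# Toy human gene-phenotype annotations (IDs and gene names are placeholders).
hpo <- data.frame(
  hpo_id = c("HP:demo1", "HP:demo1", "HP:demo1", "HP:demo1", "HP:demo2", "HP:demo2"),
  gene   = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6"),
  stringsAsFactors = FALSE
)

# Toy non-human annotations, assumed to be already mapped to the matched
# HPO term and converted to human ortholog symbols.
model <- data.frame(
  species = c("mouse", "mouse", "mouse", "zebrafish", "mouse"),
  hpo_id  = c("HP:demo1", "HP:demo1", "HP:demo1", "HP:demo1", "HP:demo2"),
  gene    = c("GENE1", "GENE2", "GENEX", "GENE1", "GENE5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per species/phenotype pair.
gene_overlap <- function(hpo, model) {
  pairs <- unique(model[, c("species", "hpo_id")])
  rows <- lapply(seq_len(nrow(pairs)), function(i) {
    sp <- pairs$species[i]
    id <- pairs$hpo_id[i]
    genes_hpo   <- unique(hpo$gene[hpo$hpo_id == id])
    genes_model <- unique(model$gene[model$species == sp & model$hpo_id == id])
    n_int <- length(intersect(genes_hpo, genes_model))
    data.frame(
      species = sp,
      hpo_id = id,
      n_genes_intersect = n_int,
      n_genes_hpo = length(genes_hpo),
      prop_intersect = n_int / length(genes_hpo),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

overlap <- gene_overlap(hpo, model)
print(overlap)
```

Note that a phenotype with no human gene annotations would give a division by zero here, so real data would need that case filtered out first.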
Here are some gene therapy target phenotypes identified by our previous analyses. The exact phenotypes will likely change once we add ChatGPT annotations to our filtering strategy with the round of enrichment results. But for now these can serve as an example.
with the heatmap colored by the "equivalence score", which is essentially UPHENO's way of quantifying how well a phenotype matches up across species (on a scale from 0-1). Data comes from here.
Currently the fuzzy equivalence score is the Jaccard similarity:
Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.
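For reference, the general definition of Jaccard similarity between two sets A and B is given below. Which sets UPHENO actually compares is exactly the open question here, so A and B are left abstract.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```

A score of 1 means the two sets are identical and 0 means they share nothing, which fits the 0-1 scale of the equivalence score.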
upheno_top_targets_heatmap.pdf
Looks like UPHENO has been thinking about adding fly ontology mappings as well, though there hasn't been any activity on this since 2016 it seems. Just pinged them to get an update:
Currently the fuzzy equivalence score is the Jaccard similarity Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.
See this HPO publication, in which they did the mapping with Exomiser:
For example, Exomiser (15) leverages the semantic associations between HPO, MP and ZP to prioritize variants effectively by matching human phenotypic abnormalities with phenotypes observed in animal models with knockouts of genes orthologous to human disease-associated genes.
Though this figure suggests there are also already mappings for fly and frog. I'll reach out to the HPO team to confirm where I might find these, and to confirm the methodology they used to do the phenotype mapping:
@bschilder would you be up for a quick call on the matter? I will sort you out with fuzzy and proper matches as well.
Absolutely! Thank you so much for reaching out! Setting up a time for us to meet.
Met with @matentzn who was extremely helpful in explaining the cross-species phenotype matching procedure to me, and pointing me to some additional resources.
For mapping MONDO IDs in the Monarch models file, I'm switching to using this file, as it avoids the issues observed here:
With these changes, HPOExplorer can now map >90% of MONDO IDs listed in the model file to OMIM IDs:
> library(HPOExplorer)
> model <- get_monarch("disease_to_model")
[100%] Downloaded 883280 bytes...
> model$db <- stringr::str_split(model$subject,":", simplify = TRUE)[,1]
> model <- map_mondo(dat = model,
+ input_col="object",
+ output_col="OMIM_ID",
+ to=c("OMIM","Orphanet"))
[100%] Downloaded 1082741 bytes...
476 / 5,154 (9.24%) OMIM_ID missing.
The only issue is that, as far as I can tell, MONDO doesn't seem to contain any mappings between MONDO IDs and DECIPHER IDs. DECIPHER IDs only make up a small fraction of the HPO annotations, but it would be nice to have a complete mapping nonetheless:
> phenos <- make_phenos_dataframe(add_disease_data = TRUE)
> phenos$disease_db <- stringr::str_split(phenos$disease_id,":", simplify = TRUE)[,1]
> table(phenos$disease_db)
To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.
@matentzn this is probably a poor attempt to explain this properly, but if there's a paper or docs page you could point me to that would be quite helpful! Thanks!
DECIPHER
We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv
Which will do the job for you!
To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.
It's simpler than that.
I requested an FBcv profile for you here: https://github.com/monarch-initiative/monarch-semantic-similarity-profiles/issues/16
So you can take a look at what it looks like.
DECIPHER
We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv
Which will do the job for you!
Ah, amazing! I had totally missed that because I was using this file, which I assumed included all the other ones: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo.sssom.tsv
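As a rough illustration of how an SSSOM mapping file like the DECIPHER one could be used, here is a base R sketch. It assumes the standard SSSOM TSV layout (metadata lines prefixed with `#`, then `subject_id`, `predicate_id`, and `object_id` columns). The inline text, the IDs, and the `read_sssom()` and `map_ids()` helpers are stand-ins for illustration, not the real file contents and not the HPOExplorer/KGExplorer implementation.

```r
# Toy SSSOM-style content standing in for the real mapping file.
# All IDs below are made up.
sssom_text <- c(
  "# curie_map:",
  "#   MONDO: http://purl.obolibrary.org/obo/MONDO_",
  "subject_id\tpredicate_id\tobject_id",
  "MONDO:9999991\toboInOwl:hasDbXref\tDECIPHER:1",
  "MONDO:9999992\toboInOwl:hasDbXref\tDECIPHER:2"
)

# Hypothetical reader: skip the '#'-prefixed metadata, keep the table.
# `file` can be a path, a URL, or a connection.
read_sssom <- function(file) {
  utils::read.delim(file, comment.char = "#", stringsAsFactors = FALSE)
}

# Hypothetical mapper: add an output column by matching MONDO IDs.
# match() keeps only the first hit, so one-to-many mappings are collapsed.
map_ids <- function(dat, sssom, input_col = "object", output_col = "DECIPHER_ID") {
  idx <- match(dat[[input_col]], sssom$subject_id)
  dat[[output_col]] <- sssom$object_id[idx]
  dat
}

sssom <- read_sssom(textConnection(sssom_text))
model <- data.frame(object = c("MONDO:9999991", "MONDO:9999993"),
                    stringsAsFactors = FALSE)
mapped <- map_ids(model, sssom)

# Report coverage in the same spirit as the map_mondo() message.
n_missing <- sum(is.na(mapped$DECIPHER_ID))
message(n_missing, " / ", nrow(mapped), " DECIPHER_ID missing.")
```

With the real file, `read_sssom()` would be pointed at a local download or raw URL instead of the `textConnection()`.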
I've implemented many of these functions within a new package for accessing/processing knowledge graphs in general (HPOExplorer was getting too bloated): https://github.com/neurogenomics/KGExplorer/blob/29eccbbd33fd18d9ce85b0ae72b47d485d97faee/R/map_upheno_data_i.R
I was also just alerted to the monarchr package, which may extract much of the info I need more efficiently than I am now (which relies mostly on TSV downloads).
I've also begun exploring some of the graph query resources/tools you alerted me to on our call:
To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.
It's simpler than that.
- We generate phenotypic profiles from ontologies, usually using Jaccard similarity over the hierarchical relations in the ontology, and information content for the reranking
- Cool paper with background: https://www.osti.gov/biblio/1625303
- The current "bestmatches" include a mix of logical and simple lexical matches and are hugely out of date (I would not use them in production, but they are probably "not wrong")
Ahhh, this makes so much more sense now! Thanks for explaining that in more detail, and for the paper (super interesting work!). Along those lines, I've found the rphenoscape package useful for computing cross-ontology similarity matrices on the go.
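For anyone following along, the "Jaccard similarity over the hierarchical relations" idea from the quoted list can be sketched in a few lines of base R. This is only a toy under my own assumptions: the hierarchy and term names are made up, each term's set is taken to be the term plus all its ancestors, and the information-content reranking step is left out. It is not the Monarch or rphenoscape implementation.

```r
# Toy is-a hierarchy (term -> direct parents); term names are made up.
parents <- list(
  root         = character(0),
  respiratory  = "root",
  lung         = "respiratory",
  resp_failure = "respiratory",
  cardiac      = "root"
)

# Collect a term together with all of its ancestors.
ancestors <- function(term, parents) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% seen) {
      seen <- c(seen, current)
      queue <- c(queue, parents[[current]])
    }
  }
  seen
}

# Jaccard similarity between two sets.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Similarity of two terms = Jaccard over their ancestor sets.
term_similarity <- function(t1, t2, parents) {
  jaccard(ancestors(t1, parents), ancestors(t2, parents))
}

sim_close <- term_similarity("lung", "resp_failure", parents)
sim_far   <- term_similarity("lung", "cardiac", parents)
print(c(close = sim_close, far = sim_far))
```

Terms that share most of their path to the root score higher than terms that only share the root, which is the behaviour one would want from a cross-species phenotype match.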
I requested an FBcv profile for you here: monarch-initiative/monarch-semantic-similarity-profiles#16
So you can take a look at what it looks like.
Thank you so much! I really appreciate this, and all your other help.
Assess whether there is an existing experimental model for each candidate therapeutic target.
We can check this by seeing if there is an MPO or UPHENO annotation for the same phenotype.
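A minimal base R sketch of that check, assuming candidate targets are gene/phenotype pairs and that model annotations have already been mapped to the matching HPO term and human ortholog. The data frames, column names, and IDs are placeholders for illustration only.

```r
# Toy candidate targets (phenotype/gene pairs); all values are placeholders.
targets <- data.frame(
  hpo_id = c("HP:demo1", "HP:demo1", "HP:demo2"),
  gene   = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)

# Toy model annotations, assumed already mapped to HPO terms and human orthologs.
models <- data.frame(
  species = c("mouse", "zebrafish", "mouse"),
  hpo_id  = c("HP:demo1", "HP:demo1", "HP:demo2"),
  gene    = c("GENE1", "GENE1", "GENE3"),
  stringsAsFactors = FALSE
)

# Flag targets that have at least one model annotation for the same
# phenotype and gene.
target_key <- paste(targets$hpo_id, targets$gene)
model_key  <- paste(models$hpo_id, models$gene)
targets$has_model <- target_key %in% model_key

# List which species provide a model for each target ("" if none).
targets$model_species <- vapply(seq_len(nrow(targets)), function(i) {
  hit <- model_key == target_key[i]
  paste(sort(unique(models$species[hit])), collapse = ";")
}, character(1))

print(targets)
```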