Open ValWood opened 5 months ago
One reason I ask is that I see susceptibilities that are listed as correlated (e.g. POLD1) https://monarchinitiative.org/MONDO:0012953
and susceptibilities that are listed as causal (e.g. POT1) https://monarchinitiative.org/MONDO:0014368
Hi @ValWood - Thanks for submitting this question!
@kevinschaper @cmungall @monicacecilia: Can you help Val out here? Thank you!
@kevinschaper @cmungall @monicacecilia I have a blog post ready to go about PomBase using Mondo, but can't post it until someone answers Val's question.
@nlharris I'm working on this right now. I'm just digging in but I'll try to get this for you as soon as I can.
@amc-corey-cox my memory is that I made a biolink model PR to add causal and correlated gene categories to match the labels shown in the old UI, which is a pretty unsatisfying answer.
Thanks for that information Kevin. I'll look up that PR.
Okay, here is the start of an explanation on how we use correlated_disease.
We have biolink:CausalGeneToDiseaseAssociation and biolink:CorrelatedGeneToDiseaseAssociation as subclasses of biolink:GeneToDiseaseAssociation. I believe this means that when there is evidence of a direct causal role of the gene in the disease, such as Mendelian heritability, we use the term biolink:CausalGeneToDiseaseAssociation. Any other association that links the gene causally to a disease, such as polygenic or susceptibility, would be biolink:CorrelatedGeneToDiseaseAssociation.
Other associations that don't necessarily imply any form of causation would be simply biolink:GeneToDiseaseAssociation.
This is my current hypothesis of the explanation. I want to see if I can find these in the actual ingests to see what evidence we're using to create these edges in order to validate the above.
The biolink model also has these descriptions:
biolink:GeneToDiseaseAssociation: gene in which variation is correlated with the disease, may be protective or causative or associative, or as a model
biolink:CausalGeneToDiseaseAssociation: gene in which variation is shown to cause the disease.
biolink:CorrelatedGeneToDiseaseAssociation: gene in which variation is shown to correlate with the disease.
Does this seem reasonable or is there something obviously wrong that I've done?
Okay, I think I have validation of the above. Here we discuss the terms Correlated or Causal gene to disease association. https://monarch-initiative.github.io/monarch-ingest/Sources/hpoa/#gene-to-disease
The associations are derived from these fields:
MENDELIAN: biolink:causes
POLYGENIC: biolink:contributes_to
UNKNOWN: biolink:gene_associated_with_condition
This appears to mesh with my statements above. So, the final answer to this question: we intend 'correlated disease' to be used when a gene-to-disease association indicates some contribution to causing the disease condition, excluding strict Mendelian association, for which we use the term 'causal'. It is possible that we've made a mistake in how these are derived; if so, please bring this to our attention. However, I believe this is correct based on what we are seeing with ATG16L1. As for the question about "genetic correlation, i.e. via linkage disequilibrium", I believe we intend to use biolink:GeneToDiseaseAssociation for these broader correlations. Again, please let us know if this appears to be inconsistent.
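A minimal Python sketch of the mapping described above may make it easier to check individual cases. The field-to-predicate table comes from the HPOA ingest documentation linked above. The predicate-to-category table is an assumption: it follows the interpretation given in this thread and has not been verified against the ingest code. The names `PREDICATE_BY_TYPE`, `CATEGORY_BY_PREDICATE`, and `classify` are hypothetical and exist only for illustration.

```python
from typing import Tuple

# Field-to-predicate mapping, as listed in the HPOA ingest documentation.
PREDICATE_BY_TYPE = {
    "MENDELIAN": "biolink:causes",
    "POLYGENIC": "biolink:contributes_to",
    "UNKNOWN": "biolink:gene_associated_with_condition",
}

# ASSUMPTION: predicate-to-category mapping as interpreted in this thread,
# not confirmed against the actual ingest code.
CATEGORY_BY_PREDICATE = {
    "biolink:causes": "biolink:CausalGeneToDiseaseAssociation",
    "biolink:contributes_to": "biolink:CorrelatedGeneToDiseaseAssociation",
    "biolink:gene_associated_with_condition": "biolink:GeneToDiseaseAssociation",
}


def classify(association_type: str) -> Tuple[str, str]:
    """Return (predicate, category) for an HPOA association type."""
    key = association_type.strip().upper()
    if key not in PREDICATE_BY_TYPE:
        raise ValueError(f"Unrecognized association type: {association_type!r}")
    predicate = PREDICATE_BY_TYPE[key]
    return predicate, CATEGORY_BY_PREDICATE[predicate]


if __name__ == "__main__":
    for association_type in ("MENDELIAN", "POLYGENIC", "UNKNOWN"):
        print(association_type, *classify(association_type))
```

Under this reading, an edge shown as 'correlated' in the UI would come from a POLYGENIC record in the source data, so a single-gene disorder appearing as correlated would point at how the record is marked upstream rather than at the mapping itself.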
This makes sense, so contributes_to should be polygenic (except I think many causal genes are classed as correlated. I can provide a partial list).
The POLD1 problem above would be resolved by adding the terms for the germ-line mutation diseases https://github.com/monarch-initiative/mondo/issues/7845 (as the current term does not differentiate between germ-line and sporadic)
There are quite a lot of inconsistencies. For example, colorectal cancer, susceptibility to, 12 (MONDO:0014038) is_a hereditary neoplastic syndrome and is a single-gene inherited disorder, yet it has contributes_to.
Some of the issues are probably caused by conflating a heritable causal gene which increases susceptibility with a susceptibility that is presumed to increase incrementally by variants in multiple genes.
==
It also seems strange for correlated genes to have definitions of the form "Any type 2 diabetes mellitus in which the cause of the disease is a mutation in the TBC1D4 gene.", since for polygenic disorders the gene isn't causal?
It would also be useful to have precise definitions on the Monarch website so that we could link to them. tks v
I guess it is OK for colorectal cancer, susceptibility to, 12 (MONDO:0014038), because for any cancer subsequent changes are required....
This is great feedback @ValWood. Unfortunately, if the data we're ingesting has these marked inconsistently we will as well. However, we should also make sure we're ingesting them correctly. I'll discuss with my team how we should move forward with this.
It is probably not a huge issue but it would be useful to be precise about the meaning of the qualifiers. I still don't fully understand.
My main issue is that genes flagged as "contributes_to" are described as "causal" for a disease in the ontology definitions. That seems misleading, and seems to be a Mondo issue rather than an ingest issue.
I was chatting to PomBase team about this in our group meeting, and we wondered why you need a qualifier AND "susceptibility to" in the term label. We wondered why the information could not be captured in the ontology rather than with a qualifier (because people frequently ignore qualifiers)
@sabrinatoro 👀 👆
I think the main problem here is with the "susceptibility" terms. These "susceptibility" terms come from OMIM, and are therefore added into Mondo. However, the data we get from the different sources more often relate to a disease and not necessarily to a "disease susceptibility"
It is therefore correct that we have different ways to represent "susceptibility" concepts in Mondo/Monarch and their causal/correlated gene:
1) "susceptibility to disease X" (in Mondo) - caused by a variation in gene X
2) "disease X" - correlated with gene X (because a variation in gene X confers a susceptibility to getting the disease.)
We need to review the representation of disease susceptibility in both Mondo and Monarch. (@monicacecilia I don't know where this falls on the priority list for both these projects. Let's discuss)
Please describe your question, suggestion, or concern.
How is "correlated disease" used in Monarch? I can't see where this is defined.
I had assumed it was used when a variant is correlated with a disease, but not known to be causal. However, I see cases where it is used when the gene is causal (but not always; sometimes it's a susceptibility), for example
ATG16L1 and inflammatory bowel disease 10
I guess my question is: is "correlated disease" A) always used for susceptibility (i.e. with other environmental conditions) or polygenic contributions, OR B) would it ever be used for disease candidates (genetic correlation, i.e. via linkage disequilibrium)?
thanks,
Val
If your question or suggestion is specific to Mondo, please submit it here instead: https://github.com/monarch-initiative/mondo/issues