How is "correlated disease" used in Monarch?

ValWood commented 5 months ago

How is "correlated disease" used in Monarch? I can't see where this is defined.

I had assumed it was used when a variant is correlated with a disease but not known to be causal. However, I see cases where it is used when the gene is causal (though not always; sometimes it is a susceptibility), for example:

ATG16L1 and inflammatory bowel disease 10

I guess my question is: is "correlated disease" A) always used for susceptibility (i.e. together with other environmental conditions) or polygenic contributions, or B) would it ever be used for disease candidates (genetic correlation, i.e. via linkage disequilibrium)?

thanks,

Val

ValWood commented 5 months ago

One reason I ask is that I see susceptibilities that are listed as correlated (e.g. POLD1) https://monarchinitiative.org/MONDO:0012953

and susceptibilities that are listed as causal (e.g. POT1) https://monarchinitiative.org/MONDO:0014368

sagehrke commented 5 months ago

Hi @ValWood - Thanks for submitting this question!

@kevinschaper @cmungall @monicacecilia: Can you help Val out here? Thank you!

nlharris commented 5 months ago

@kevinschaper @cmungall @monicacecilia I have a blog post ready to go about PomBase using Mondo, but can't post it until someone answers Val's question.

amc-corey-cox commented 5 months ago

@nlharris I'm working on this right now. I'm just digging in but I'll try to get this for you as soon as I can.

kevinschaper commented 5 months ago

@amc-corey-cox my memory is that I made a biolink model PR to add causal and correlated gene categories to match the labels shown in the old UI, which is a pretty unsatisfying answer.

amc-corey-cox commented 5 months ago

Thanks for that information Kevin. I'll look up that PR.

amc-corey-cox commented 5 months ago

Okay, here is the start of an explanation on how we use correlated_disease.

We have biolink:CausalGeneToDiseaseAssociation and biolink:CorrelatedGeneToDiseaseAssociation as subclasses of biolink:GeneToDiseaseAssociation. I believe this means that when there is evidence of a direct causal role of the gene in the disease, such as Mendelian heritability, we use biolink:CausalGeneToDiseaseAssociation. Any other association that links the gene causally to a disease, such as a polygenic or susceptibility association, would be biolink:CorrelatedGeneToDiseaseAssociation.

Other associations that don't necessarily imply any form of causation would be simply biolink:GeneToDiseaseAssociation.

This is my current hypothesis of the explanation. I want to see if I can find these in the actual ingests to see what evidence we're using to create these edges in order to validate the above.

The biolink model also has these descriptions:

biolink:GeneToDiseaseAssociation: gene in which variation is correlated with the disease, may be protective or causative or associative, or as a model

biolink:CausalGeneToDiseaseAssociation: gene in which variation is shown to cause the disease.

biolink:CorrelatedGeneToDiseaseAssociation: gene in which variation is shown to correlate with the disease.

Does this seem reasonable or is there something obviously wrong that I've done?

amc-corey-cox commented 5 months ago

Okay, I think I have validation of the above. The ingest documentation discusses the Correlated and Causal gene to disease association terms here: https://monarch-initiative.github.io/monarch-ingest/Sources/hpoa/#gene-to-disease

The associations are derived from these fields:

MENDELIAN: biolink:causes

POLYGENIC: biolink:contributes_to

UNKNOWN: biolink:gene_associated_with_condition

This appears to mesh with my statements above. So, as a final answer to this question: we intend 'correlated disease' to be used when a gene to disease association indicates some contribution to causing the disease, excluding strict Mendelian association, for which we use the term 'causal'. It is possible that we've made a mistake in how these are derived; if so, please bring it to our attention. However, I believe this is correct based on what we are seeing with ATG16L1. As for the question about "genetic correlation, ie. via linkage disequilibrium", I believe we intend to use biolink:GeneToDiseaseAssociation for these broader correlations. Again, please let us know if this appears to be inconsistent.
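To make the mapping above concrete, here is a minimal Python sketch. The field-to-predicate table comes from the HPOA ingest documentation linked in this comment; the pairing of each predicate with a Biolink association category follows my interpretation in this thread and is an assumption, not a confirmed description of the ingest code. The names `GeneDiseaseEdge` and `build_edge` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# From the HPOA ingest documentation quoted in this thread.
ASSOCIATION_TYPE_TO_PREDICATE = {
    "MENDELIAN": "biolink:causes",
    "POLYGENIC": "biolink:contributes_to",
    "UNKNOWN": "biolink:gene_associated_with_condition",
}

# Assumption: category pairing as interpreted in this thread,
# not verified against the actual ingest code.
PREDICATE_TO_CATEGORY = {
    "biolink:causes": "biolink:CausalGeneToDiseaseAssociation",
    "biolink:contributes_to": "biolink:CorrelatedGeneToDiseaseAssociation",
    "biolink:gene_associated_with_condition": "biolink:GeneToDiseaseAssociation",
}


@dataclass(frozen=True)
class GeneDiseaseEdge:
    """Hypothetical container for one gene to disease edge."""
    gene: str
    disease: str
    predicate: str
    category: str


def build_edge(gene: str, disease: str, association_type: str) -> GeneDiseaseEdge:
    """Map a source association type to a predicate and category (illustrative)."""
    key = association_type.strip().upper()
    if key not in ASSOCIATION_TYPE_TO_PREDICATE:
        raise ValueError(f"Unrecognized association type: {association_type!r}")
    predicate = ASSOCIATION_TYPE_TO_PREDICATE[key]
    return GeneDiseaseEdge(gene, disease, predicate, PREDICATE_TO_CATEGORY[predicate])


# A polygenic association, as with ATG16L1 above, ends up "correlated".
edge = build_edge("ATG16L1", "inflammatory bowel disease 10", "POLYGENIC")
print(edge.predicate, edge.category)
```

Under these assumptions, whether the UI shows "causal" or "correlated" depends entirely on how the source labels the association, so any inconsistency in the source labels would carry straight through.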

ValWood commented 5 months ago

This makes sense, so contributes_to should be polygenic (except that I think many causal genes are classed as correlated; I can provide a partial list).

The POLD1 problem above would be resolved by adding the terms for the germ-line mutation diseases https://github.com/monarch-initiative/mondo/issues/7845 (as the current term does not differentiate between germ-line and sporadic)

There are quite a lot of inconsistencies. For example, "colorectal cancer, susceptibility to, 12" (MONDO:0014038) is_a hereditary neoplastic syndrome and has contributes_to, even though it is a single-gene inherited disorder.

Some of the issues are probably caused by conflating a heritable causal gene which increases susceptibility with a susceptibility that is presumed to increase incrementally by variants in multiple genes.

==

It also seems strange for correlated genes to have definitions of the form "Any type 2 diabetes mellitus in which the cause of the disease is a mutation in the TBC1D4 gene", because for polygenic disorders the gene isn't causal?

ValWood commented 5 months ago

It would also be useful to have precise definitions on the Monarch website so that we could link to them. tks v

ValWood commented 5 months ago

I guess it is OK for "colorectal cancer, susceptibility to, 12" (MONDO:0014038), because for any cancer subsequent changes are required...

amc-corey-cox commented 5 months ago

This is great feedback @ValWood. Unfortunately, if the data we're ingesting has these marked inconsistently we will as well. However, we should also make sure we're ingesting them correctly. I'll discuss with my team how we should move forward with this.

ValWood commented 5 months ago

It is probably not a huge issue but it would be useful to be precise about the meaning of the qualifiers. I still don't fully understand.

My main issue is that genes flagged as "contributes_to" are described as "causal" for a disease in the ontology definitions. That seems misleading, and it seems to be a Mondo issue rather than an ingest issue.

I was chatting to the PomBase team about this in our group meeting, and we wondered why you need a qualifier AND "susceptibility to" in the term label. We wondered why the information could not be captured in the ontology rather than with a qualifier (because people frequently ignore qualifiers).

monicacecilia commented 1 month ago

@sabrinatoro 👀 👆

sabrinatoro commented 1 month ago

I think the main problem here is with the "susceptibility" terms. These "susceptibility" terms come from OMIM, and are therefore added into Mondo. However, the data we get from the different sources more often relates to a disease and not necessarily to a "disease susceptibility".

It is therefore correct that we have different ways to represent "susceptibility" concepts in Mondo/Monarch and their causal/correlated gene:

1) "susceptibility to disease X" (in Mondo) - caused by a variation in gene X

2) "disease X" - correlated with gene X (because a variation in gene X confers a susceptibility to getting the disease)

We need to review the representation of disease susceptibility in both Mondo and Monarch. (@monicacecilia I don't know where this falls on the priority list for both these projects. Let's discuss)

monarch-initiative / helpdesk

How is "correlated disease" used in Monarch? #128