monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

diseases and causal associations in OMIM missing from MONDO/Monarch #429

Closed ValWood closed 1 year ago

ValWood commented 2 years ago

Diseases have MONDO IDs, but associations are missing from Monarch

~SPCC1281.06c ole1 .0001 DEAFNESS, AUTOSOMAL DOMINANT 79 (1 family) (SCD5) MONDO:0033668Z~
~SPBC211.03c gea1 .0004 CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2GG (GBF1) MONDO:0011675~
~SPBC25H2.06c hrf1 .0004 KAYA-BARAKAT-MASSON SYNDROME (YIF1B) MONDO:0030878~
~SPAC57A10.03 cyp1 .0005 PONTOCEREBELLAR HYPOPLASIA, TYPE 14 (PPIL1) MONDO:0030258~
~SPBC1773.10c nrs1 .0006 NEURODEVELOPMENTAL DISORDER WITH MICROCEPHALY, IMPAIRED LANGUAGE, AND GAIT ABNORMALITIES (NARS1) MONDO:0100348~
~SPAC3G9.06 frs2 .0002 RAJAB INTERSTITIAL LUNG DISEASE WITH BRAIN CALCIFICATIONS 2 (1 patient) (FARSA) MONDO:0100220~
~SPBC4C3.06 syp1 .0006 IMMUNODEFICIENCY 76 (FCHO1) MONDO:0030898~
~SPAC30C2.07 fnp1 .0006 IMMUNODEFICIENCY 93 AND HYPERTROPHIC CARDIOMYOPATHY (FNIP1) MONDO:0030528~
~SPAC637.12c mst1 .0003 NEURODEVELOPMENTAL DISORDER WITH DYSMORPHIC FACIES, SLEEP DISTURBANCE, AND BRAIN ABNORMALITIES (KAT5) MONDO:0030852~
~SPBC1347.10 cdc23 .0004 IMMUNODEFICIENCY 80 WITH CONGENITAL CARDIOMYOPATHY (MCM10) MONDO:0030266~
~SPCC1753.03c rec7 .0002 OOCYTE MATURATION DEFECT 10 (REC114) MONDO:0030925~
SPCC132.01c mtr1 .0007 INTELLECTUAL DEVELOPMENTAL DISORDER WITH SPEECH DELAY AND AXONAL PERIPHERAL NEUROPATHY (NEMF) MONDO:0030849
~SPAC144.08 jac1 .0002 ANEMIA, SIDEROBLASTIC, 5 (1 patient) (HSCB) MONDO:0030436~
~SPAC1486.08 cox16 .0001 MITOCHONDRIAL COMPLEX IV DEFICIENCY, NUCLEAR TYPE 22 (COX16) MONDO:0032626~
~SPBC713.02c ubp15 .0005 HAO-FOUNTAIN SYNDROME (USP7) MONDO:0014805~
~SPBC4.05 mlo2 .0001 LI-CAMPEAU SYNDROME (UBR7) MONDO:0030963~
~SPCC1739.03 hrr1 .0001 IMMUNODEFICIENCY 91 AND HYPERINFLAMMATION (ZNFX1) MONDO:0030491~
~SPBC776.17 rrp7 .0001 MICROCEPHALY 28, PRIMARY, AUTOSOMAL RECESSIVE (1 family) (RRP7) MONDO:0030339~
~SPAC1F3.07c rsc58 .0001 ENDOVE SYNDROME, LIMB-BRAIN TYPE (1 patient) (EN1) MONDO:0030979~
~SPAC1834.11c sec18 .0002 DEVELOPMENTAL AND EPILEPTIC ENCEPHALOPATHY 96 (NSF) MONDO:0023659~
~SPAC6B12.18 gon7 .0002 GALLOWAY-MOWAT SYNDROME 9 (GON7) MONDO:0030471~
~SPCC895.03c sua5 .0001 GALLOWAY-MOWAT SYNDROME 10 (YRDC) MONDO:0030476~
~SPBC2G5.06c hmt2 .0002 SULFIDE:QUINONE OXIDOREDUCTASE DEFICIENCY (SQOR) MONDO:0030982~
~SPCC663.11 saf1 .0003 VERTEBRAL, CARDIAC, TRACHEOESOPHAGEAL, RENAL, AND LIMB DEFECTS (WBP11) MONDO:0030987~
~SPBC6B1.10 prp17 .0001 PONTOCEREBELLAR HYPOPLASIA, TYPE 15 (1 patient) (CDDC40) MONDO:0030259~
SPAC17A5.02c dbr1 .0004 ENCEPHALITIS, ACUTE, INFECTION-INDUCED (VIRAL), SUSCEPTIBILITY TO, 11 (DBR1) MONDO:0030334
~SPAC3H1.02c sdd3 .0002 SPINOCEREBELLAR ATAXIA, AUTOSOMAL RECESSIVE 30 (PITRM1) MONDO:0030318~
~SPCC61.04c .0001 MICROCEPHALY, EPILEPSY, AND DIABETES SYNDROME 2 (YIPF5) MONDO:0025690~
~SPCC16C4.12 naa20 .0002 INTELLECTUAL DEVELOPMENTAL DISORDER, AUTOSOMAL RECESSIVE 73 (NAA20) MONDO:0030533~

Have no MONDO term:

~SPAC17H9.10c ddb1 .0005 WHITE-KERNOHAN SYNDROME (DDB1) NO_MONDO_TERM~
~SPCP1E11.06 apl4 --.0005 USMANI-RIAZUDDIN SYNDROME, AUTOSOMAL RECESSIVE (AP1G1) --.0003 USMANI-RIAZUDDIN SYNDROME, AUTOSOMAL DOMINANT (AP1G1) NO_MONDO_TERM~
SPCC16C4.02c .0002 NEURODEVELOPMENTAL DISORDER WITH INFANTILE EPILEPTIC SPASMS (NCDN) NO_MONDO_TERM
~SPBC115.01c rrp46 .0005 CEREBELLAR ATAXIA, BRAIN ABNORMALITIES, AND CARDIAC CONDUCTION DEFECTS (EXOSC5) NO_MONDO_TERM~
~SPMIT.10 atp9 .0001 DYSTONIA, EARLY-ONSET, AND/OR SPASTIC PARAPLEGIA (ATP5MC3) NO_MONDO_TERM~
SPAC17C9.05c pmc3 .0007 NEURODEVELOPMENTAL DISORDER WITH SPASTICITY, CATARACTS, AND CEREBELLAR HYPOPLASIA (MED27) NO_MONDO_TERM
~SPCC622.11 .0002 DEVELOPMENTAL DELAY WITH VARIABLE NEUROLOGIC AND BRAIN ABNORMALITIES (LMBRD2) NO_MONDO_TERM~
~SPBP8B7.19 spt16 .0003 NEURODEVELOPMENTAL DISORDER WITH DYSMORPHIC FACIES AND THIN CORPUS CALLOSUM (SUPT16H) NO_MONDO_TERM~
~SPBC336.10c tif512 .0004 FAUNDES-BANKA SYNDROME (EIF5A) NO_MONDO_TERM~
~SPAC1F5.06 lsh1 .0002 IMMUNODEFICIENCY 59 AND HYPOGLYCEMIA (HYOU1) NO_MONDO_TERM~
~SPAC824.05 vps16 .0005 DYSTONIA 3 (VPS16) NO_MONDO_TERM~

Also SPAC15A10.04c zpr1: "Defects in this gene or the SMN1 gene can cause spinal muscular atrophy" (Monarch), but human (ZPR1) has no disease assignment.

In each case the human ortholog is in parentheses.
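For anyone who wants to process the entries above programmatically, here is a minimal sketch of a parser for the regular ones. The field names are my own labels, not anything defined by PomBase or Monarch, and reading the `.000N` token as an OMIM allelic-variant suffix is an assumption. Irregular entries (for example the apl4 one with two `--` sub-entries, or an ID with a trailing letter) are deliberately not handled and come back as `None`.

```python
import re

# Hypothetical field names; the layout is inferred from the list in this issue.
ENTRY = re.compile(
    r"^(?P<systematic_id>SP\S+)\s+"            # pombe systematic ID, e.g. SPBC211.03c
    r"(?:(?P<gene>[a-z]\w*)\s+)?"              # optional pombe gene name, e.g. gea1
    r"(?P<variant_suffix>\.\d{4})\s+"          # assumed OMIM allelic-variant suffix
    r"(?P<disease>.+?)\s+"                     # disease name (may include a note)
    r"\((?P<human_ortholog>[A-Z0-9]+)\)\s+"    # human ortholog in parentheses
    r"(?P<mondo>MONDO:\d+|NO_MONDO_TERM)$"     # MONDO ID or the no-term marker
)


def parse_entry(line):
    """Parse one entry; return a dict, or None if the entry is irregular."""
    text = line.strip()
    # A leading and trailing tilde marks an entry that was struck through.
    struck = text.startswith("~") and text.endswith("~")
    match = ENTRY.match(text.strip("~").strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["struck"] = struck
    return fields


example = parse_entry(
    "~SPBC211.03c gea1 .0004 CHARCOT-MARIE-TOOTH DISEASE, AXONAL, "
    "TYPE 2GG (GBF1) MONDO:0011675~"
)
print(example["human_ortholog"], example["mondo"], example["struck"])
```

Entries without a pombe gene name (such as SPCC61.04c) parse with `gene` set to `None`.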

https://github.com/pombase/curation/issues/3274

ValWood commented 2 years ago

Another missing association ~Mtx2 .0005 MANDIBULOACRAL DYSPLASIA PROGEROID SYNDROME (MONDO:0030880)~

ValWood commented 2 years ago

For the SPAC15A10.04c zpr1 one ("Defects in this gene or the SMN1 gene can cause spinal muscular atrophy" in Monarch, but human (ZPR1) has no disease assignment), this appears to be a disease modifier. This would be a useful tag to include in Mondo. It seems a shame not to capture this connection to "spinal muscular atrophy", but I don't want to mix it with the causal ones.

ValWood commented 2 years ago

~Another ECM10 MONDO:0031011 .0001 NEURODEVELOPMENTAL DISORDER WITH DYSMORPHIC FACIES AND VARIABLE SEIZURES~

sabrinatoro commented 2 years ago

@nicolevasilevsky and @matentzn These diseases have Equivalent to OMIM, but the gene associations were not added to Mondo. This might be an issue at the level of the import with OMIM. Should I update the gene associations manually, or should we use this issue to review the OMIM import/synchronization pipeline? Please advise. Thank you!

matentzn commented 2 years ago

Are these disease defining genes, or just associated genes? If the latter, aren't they added through the OMIM monarch ingest?

ValWood commented 2 years ago

They are causal for the disease (or at least they seem to be). Most are single-gene diseases.

matentzn commented 2 years ago

@sabrinatoro Do we have a documentation page that describes exactly under which conditions a disease-gene relationship is imported? As far as I understand, causal is not enough; I thought the gene must be used to define the disease. A mere known causal relation to some gene is (afaik) not enough, only if a variant of the gene is necessarily causing the disease.

sabrinatoro commented 2 years ago

@matentzn I do not have documentation. I remember an issue (I will look for it) where this exact question was asked, but I don't know whether we import the disease-gene relation in mondo and/or whether we have specific conditions to do so (and if we know, what these conditions are). We should discuss this at our next Mondo call.

ValWood commented 2 years ago

@matentzn What do you mean by causal? To me causal means that it is demonstrated that a variation causes the disease?

matentzn commented 2 years ago

I don't really want to get too deep into content discussions :) Sorry for barging in here. But just to make that last point: any relationship with some/some semantics does not belong in Mondo (or any ontology, for that matter), but in evidence-based knowledge graphs like Monarch. A some/some relationship reads like this:

"Some instances of Disease X are caused by (some instance of) Gene X."

Ontologies only contain "all/some", which reads:

"All instances of Disease X are caused by Gene X" or "Disease X, whenever it occurs, is always caused by Gene X".

Are the causal relationships you are talking about always of the latter kind?
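To make the distinction concrete, here is a toy sketch over made-up cases. The `Case` type, the disease and gene names, and both function names are purely illustrative; this is not how Mondo or Monarch store anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One hypothetical instance of a disease and the gene that caused it."""
    disease: str
    causal_gene: str


def some_some(cases, disease, gene):
    """True if at least one instance of the disease is caused by the gene."""
    return any(c.disease == disease and c.causal_gene == gene for c in cases)


def all_some(cases, disease, gene):
    """True if every instance of the disease is caused by the gene."""
    relevant = [c for c in cases if c.disease == disease]
    return bool(relevant) and all(c.causal_gene == gene for c in relevant)


# Made-up data: Disease X is single-gene, Disease Y has two possible causes.
cases = [
    Case("Disease X", "Gene X"),
    Case("Disease X", "Gene X"),
    Case("Disease Y", "Gene A"),
    Case("Disease Y", "Gene B"),
]

print(all_some(cases, "Disease X", "Gene X"))   # True: fits an ontology axiom
print(some_some(cases, "Disease Y", "Gene A"))  # True: fine for a knowledge graph
print(all_some(cases, "Disease Y", "Gene A"))   # False: not an ontology axiom
```

Only relationships for which the `all_some` reading holds would belong in the ontology; the rest are the evidence-based associations that go into the knowledge graph.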

ValWood commented 2 years ago

As far as I am aware yes, these are all single-gene diseases.

nicolevasilevsky commented 2 years ago

We discussed this on a Mondo call:

We'll use this list from you @ValWood as a reference to make sure our pipeline is working.

related to https://github.com/monarch-initiative/mondo/issues/2343

related: https://github.com/monarch-initiative/mondo-ingest/issues/47 https://github.com/monarch-initiative/mondo-ingest/issues/48

sabrinatoro commented 2 years ago

Note:

sabrinatoro commented 1 year ago

Update: discussed on the Technical call on 11/11/22.

nicolevasilevsky commented 1 year ago

@ValWood are there any action items needed for this at the moment, or can we close this ticket? Thanks!

ValWood commented 1 year ago

Well I just spot-checked and, for example

GBF1 in Monarch is not connected to CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2GG (GBF1) MONDO:0011675

https://monarchinitiative.org/disease/MONDO:0011675

but in OMIM it is clearly causal: In 7 patients from 4 unrelated families with dominant axonal Charcot-Marie-Tooth disease-2GG (CMT2GG; 606483), Mendoza-Ferreira et al. (2020) identified 4 different heterozygous mutations in the GBF1 gene (603698.0001-603698.0004).

So I'm not sure that the issue is fixed?

I can check some more later. For MONDO:0030878 (https://monarchinitiative.org/disease/MONDO:0030878) I now get "Error: An exception occurred:"

nicolevasilevsky commented 1 year ago

@matentzn I am moving this to the mondo ingest repo. But is this more of a Monarch issue and not a Mondo issue?

matentzn commented 1 year ago

We won't sync up gene associations for at least a month, but we could prioritise it to see if they show up. But remember, in Mondo we only include definite causal relations, not probabilistic ones. These go into the Monarch KG.

kevinschaper commented 1 year ago

Also, the data on monarchinitiative.org is frozen from 2021, so we'll want to check against the new monarch-kg once @RichardBruskiewich's OMIM work is done.

cmungall commented 1 year ago

Status update:

If these remaining issues are fixed, is that good enough for now, @ValWood?

To clarify, are you using the UI primarily or also downloads? I want to make sure that our downloads are simple and clear.

ValWood commented 1 year ago

Normally I would be using the UI to get the MONDO ID for a gene that I know to have a disease association. But I then wanted to use the download file of associations to give Kim instructions on how to automate our MONDO assignments via Monarch. I agree the gene association is much too buried.
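As a rough idea of what that automation might look like, here is a sketch that filters a tab-separated association file down to causal gene-to-disease rows. I have not seen the real download, so the column names (`gene_symbol`, `predicate`, `disease_id`), the `causes` predicate value, and the placeholder disease ID in the sample are all assumptions; they would need to be swapped for whatever the actual file uses.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for the download; the layout is hypothetical.
SAMPLE_TSV = (
    "gene_symbol\tpredicate\tdisease_id\n"
    "GBF1\tcauses\tMONDO:0011675\n"
    "YIF1B\tcauses\tMONDO:0030878\n"
    "ZPR1\tmodifies\tMONDO:PLACEHOLDER\n"
)


def causal_mondo_ids(handle, wanted_genes, causal_predicate="causes"):
    """Map each wanted gene to the set of disease IDs it is causal for."""
    result = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only causal rows for genes we are curating.
        if row["predicate"] == causal_predicate and row["gene_symbol"] in wanted_genes:
            result[row["gene_symbol"]].add(row["disease_id"])
    return dict(result)


assignments = causal_mondo_ids(io.StringIO(SAMPLE_TSV), {"GBF1", "ZPR1"})
print(assignments)
```

With the sample above, only GBF1 gets an assignment, because the ZPR1 row is not causal. For a real file, the `io.StringIO` handle would be replaced by an opened file.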

There is also a lot of white space around the data, and redundancy (biologists seem to hate that; I know it's trendy, but if you are looking for information it's really frustrating). For example, the repetition of Charcot-Marie-Tooth Disease, axonal, type 2GG at the top of every phenotype. Is that necessary? It's the Charcot-Marie-Tooth Disease, axonal, type 2GG page. Some diseases will have hundreds of phenotypes, and a user will want to be able to scan as many phenotypes as possible at once.

But good for me, I can get the associations that I know I am missing from the new site. Are the new site downloads available yet?

ValWood commented 1 year ago

All good! I now have all the missing ones.