monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Diseases which are also Genes #961

Closed: rgb4268 closed this issue 3 years ago

rgb4268 commented 4 years ago

I have found some instances of diseases (e.g. MONDO:0020119) that have a child which is actually a gene, in this case ZC4H2. There are a few more.

nicolevasilevsky commented 4 years ago

@kshefchek did you say this is actually an issue with Monarch, not Mondo?

kshefchek commented 4 years ago

I suspect it's an integration bug in merging IDs from multiple sources. I can take a closer look.
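A minimal sketch of how such a merge can go wrong, assuming cliques are built by taking the union of pairwise equivalence assertions (union-find). That merge procedure is an assumption for illustration, not Monarch's documented pipeline; the IDs are the gene/OMIM/MONDO chain traced later in this thread. One bad gene-to-OMIM equivalence is enough to put a gene and a disease in the same clique.

```python
"""Sketch: pairwise owl:equivalentClass assertions collapsing into one clique.

The union-find merge is an illustrative assumption, not Monarch's code.
"""


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen IDs as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cliques(pairs):
    """Group IDs connected by any chain of equivalence pairs."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())


# One assertion from the NCBIGene ingest, one from the OMIM/MONDO mapping.
equivalences = [
    ("NCBIGene:55906", "OMIM:309605"),
    ("OMIM:309605", "MONDO:0010666"),
]

for clique in cliques(equivalences):
    print(sorted(clique))
# → ['MONDO:0010666', 'NCBIGene:55906', 'OMIM:309605']
```

Because equivalence is transitive, the fix has to stop the bad pair from being asserted at all; filtering afterwards cannot tell which member of the clique was the intruder.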

kshefchek commented 4 years ago

> There are a few more.

@rgb4268 could you list these here? I can only find this one case.

TomConlin commented 3 years ago

We do have the data structures on hand to "trust but verify" HGNC's assertion that their OMIM dbxref in fact points to an entry we also believe is a gene.

kshefchek commented 3 years ago

False alarm on HGNC; still trying to trace this one.

kshefchek commented 3 years ago

On closer look, this appears to be an NCBIGene issue: https://www.ncbi.nlm.nih.gov/gene/55906 owl:equivalentClass http://omim.org/entry/309605

and that OMIM entry is in turn equivalent to MONDO:0010666.

It looks like we need to extend the OMIM type checking down to these lines: https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/NCBIGene.py#L446-L466
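A minimal sketch of the kind of type check described above: only emit an NCBIGene-to-OMIM equivalence when the OMIM entry is known to be a gene. The `omim_types` lookup, the type labels, `GENE_TYPES`, and `gene_equivalences` are all hypothetical stand-ins, not dipper's actual API; `OMIM:999999` is a made-up gene entry for illustration.

```python
"""Sketch: only assert NCBIGene == OMIM when the OMIM entry is a gene.

Assumptions (hypothetical, not dipper's API): `omim_types` maps an OMIM
CURIE to a type label; the labels and GENE_TYPES are illustrative.
"""

# Entry types treated as safe to equate with a gene (assumption).
GENE_TYPES = {"gene", "gene/phenotype"}


def gene_equivalences(gene_id, omim_ids, omim_types):
    """Return (gene_id, omim_id) pairs that pass the type check.

    Unknown OMIM IDs are skipped rather than trusted, so a missing
    type never produces a gene/disease equivalence.
    """
    pairs = []
    for omim_id in omim_ids:
        if omim_types.get(omim_id) not in GENE_TYPES:
            continue  # phenotype, moved/removed, or unknown: no equivalence
        pairs.append((gene_id, omim_id))
    return pairs


omim_types = {
    "OMIM:309605": "moved/removed",  # per this thread, subsumed into OMIM:314580
    "OMIM:999999": "gene",           # hypothetical gene entry for illustration
}

print(gene_equivalences("NCBIGene:55906", ["OMIM:309605", "OMIM:999999"], omim_types))
# → [('NCBIGene:55906', 'OMIM:999999')]
```

Defaulting unknown types to "skip" trades a few missed gene mappings for never merging a gene into a disease clique.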

TomConlin commented 3 years ago

A bit more context: omim:309605 has been subsumed into omim:314580, which is for WIEACKER-WOLFF SYNDROME, but the original entry may have had a different role when it was assigned to NCBIGene:55906.

Dipper does handle the replacement and classifies it as a disease.

Also, we no longer create an equivalence class, as it no longer falls through with continue.
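A minimal sketch of "handle the replacement, then classify", assuming the obsolete-to-successor links and entry types are available as plain dictionaries. `replaced_by`, `omim_types`, `resolve`, `classify`, and the `"phenotype"` label are hypothetical, not dipper's implementation; only the 309605 to 314580 replacement comes from this thread.

```python
"""Sketch: follow OMIM replacement links before classifying an entry.

Assumptions (hypothetical): `replaced_by` maps an obsolete OMIM CURIE to
its successor and `omim_types` maps a current CURIE to a type label.
"""


def resolve(omim_id, replaced_by, max_hops=10):
    """Follow replacement links to the current entry, guarding against cycles."""
    seen = set()
    while omim_id in replaced_by:
        if omim_id in seen or len(seen) >= max_hops:
            raise ValueError(f"replacement cycle or chain too long at {omim_id}")
        seen.add(omim_id)
        omim_id = replaced_by[omim_id]
    return omim_id


def classify(omim_id, replaced_by, omim_types):
    """Return the current entry and its type label ("unknown" if absent)."""
    current = resolve(omim_id, replaced_by)
    return current, omim_types.get(current, "unknown")


replaced_by = {"OMIM:309605": "OMIM:314580"}  # replacement reported in this thread
omim_types = {"OMIM:314580": "phenotype"}    # Wieacker-Wolff syndrome (assumed label)

print(classify("OMIM:309605", replaced_by, omim_types))
# → ('OMIM:314580', 'phenotype')
```

Classifying the resolved entry rather than the original ID is what lets a gene-equivalence check like the one discussed above reject 309605.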

pnrobinson commented 3 years ago

Note also this entry: https://omim.org/entry/301041. I think it is confusing to separate the diseases into XLR (X-linked recessive) and XLD (X-linked dominant), as is being done here.

TomConlin commented 3 years ago

Closing with #964.