monarch-initiative / omim

Data ingest pipeline for OMIM.

`morbidmap.txt`: `Phenotype`s missing MIM#s #78

Closed joeflack4 closed 1 year ago

joeflack4 commented 2 years ago

Overview

We are mapping gene::disease associations from `morbidmap.txt`. All of the genes have MIM#s (see the `MIM Number` field). The MIM#s for the 'diseases' are embedded in the string label of the `Phenotype` field. However, in some cases the label contains no MIM#.
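To make the problem concrete, here is a minimal sketch of detecting whether a `Phenotype` label carries a MIM#. It assumes the label layout seen in the example rows in this issue: an optional 6-digit MIM# after a comma, followed by the mapping key in parentheses. The function name and regex are hypothetical, not code from this repo.

```python
import re
from typing import Optional

# Assumption: labels look like "<name>[, <6-digit MIM#>] (<mapping key>)".
PHENOTYPE_MIM_RE = re.compile(r",\s*(\d{6})\s*\(\d\)\s*$")


def extract_phenotype_mim(phenotype: str) -> Optional[str]:
    """Return the MIM# embedded in a Phenotype label, or None if there is none."""
    match = PHENOTYPE_MIM_RE.search(phenotype.strip())
    return match.group(1) if match else None


print(extract_phenotype_mim("Thyroid carcinoma, follicular, somatic, 188470 (3)"))
print(extract_phenotype_mim("Thyroid adenoma, hyperfunctioning, somatic (3)"))
```

The first call finds `188470`; the second returns `None`, which is the case this issue is about.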

Possible solutions

At the meeting on 2022/11/11, we opted for (b). (a) is just something I thought up and not a fully formed idea.

Questions

  1. For possible solution (b), should I remove the mapping key from the end? E.g. `?Thrombophilia 9 due to decreased release of tissue plasminogen (1)` --> `?Thrombophilia 9 due to decreased release of tissue plasminogen`. I am assuming yes for now.
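If the answer to question 1 is yes, stripping the trailing mapping key could be sketched as below. This assumes the mapping key is always a single digit in parentheses at the very end of the label; `strip_mapping_key` is a hypothetical helper name.

```python
import re

# Assumption: the mapping key is a single digit in parentheses at the end.
MAPPING_KEY_RE = re.compile(r"\s*\(\d\)\s*$")


def strip_mapping_key(phenotype: str) -> str:
    """Remove a trailing mapping key such as ' (1)' from a Phenotype label."""
    return MAPPING_KEY_RE.sub("", phenotype.strip())


print(strip_mapping_key(
    "?Thrombophilia 9 due to decreased release of tissue plasminogen (1)"
))
```

Labels without a trailing mapping key pass through unchanged.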

Examples

All but the last two rows here are examples where there is no MIM# for Phenotype. I included the last two rows just in case it helps to compare to the thyroid carcinoma rows that have no MIM#.

| Phenotype | Gene Symbols | MIM Number | Cyto Location |
| --- | --- | --- | --- |
| ?Thrombophilia 9 due to decreased release of tissue plasminogen (1) | THPH9 | 612348 | 8p12 |
| Thyroid adenoma, hyperfunctioning, somatic (3) | TSHR, CHNG1 | 603372 | 14q31.1 |
| Thyroid carcinoma with thyrotoxicosis, somatic (3) | TSHR, CHNG1 | 603372 | 14q31.1 |
| Thyroid carcinoma, nonmedullary, with cell oxyphilia (2) | TCO | 603386 | 19p13.2 |
| Thyroid carcinoma, papillary, with papillary renal neoplasia (2) | PTCPRN, PRN1 | 605642 | 1q21 |
| Thyroid carcinoma, follicular, somatic, 188470 (3) | HRAS | 190020 | 11p15.5 |
| Thyroid carcinoma, follicular, somatic, 188470 (3) | NRAS, ALPS4, NS6, CMNS, NCMS | 164790 | 1p13.2 |

Related

joeflack4 commented 2 years ago

@sabrinatoro Here's the full report of all such cases: noMimNumsInPhenoLabels.tsv.zip

I think for this issue, I have a solution to work on (b in 'possible solutions'). For your role, I think it is just analyzing this list and reporting back at the meeting?

And here are some example rows:

| Phenotype | Gene Symbols | MIM Number | Cyto Location |
| --- | --- | --- | --- |
| 3p- syndrome (4) | "DEL3pterp25, C3DELpterp25" | 613792 | 3pter-p25 |
| 46XX sex reversal 2 (4) | "SRXX2, DUP17q24.3" | 278850 | 17q24.3-q25.1 |
| "?Amelogenesis imperfecta, type IE, X-linked 2 (2)" | "AI1E2, AIH3" | 301201 | Xq22-q28 |
| "?Antiphospholipid syndrome, familial (2)" | ATPLS | 107320 | 6p21.3 |
| ?Craniofacioskeletal syndrome (2) | CFSS | 300712 | Xq26-q27 |

joeflack4 commented 1 year ago

I did an investigation for #76 to see how Exomiser handles this. It looks like if the MIM number is missing, they don't add a relationship: https://github.com/monarch-initiative/omim/issues/76#issuecomment-1319284653
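For comparison, a "skip when missing" policy of that kind might look like the sketch below. This is an illustration only, not Exomiser's actual code and not code from this repo; the function name, the `(phenotype, gene_mim)` row shape, and the regex are all assumptions.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: the phenotype MIM#, when present, is a 6-digit number
# after a comma and before the trailing mapping key.
_MIM_IN_LABEL = re.compile(r",\s*(\d{6})\s*\(\d\)\s*$")


def gene_disease_pairs(
    rows: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (gene_mim, phenotype_mim), skipping rows with no phenotype MIM#."""
    for phenotype, gene_mim in rows:
        match = _MIM_IN_LABEL.search(phenotype.strip())
        if match is None:
            continue  # no phenotype MIM# in the label: emit no relationship
        yield gene_mim, match.group(1)


rows = [
    ("Thyroid adenoma, hyperfunctioning, somatic (3)", "603372"),
    ("Thyroid carcinoma, follicular, somatic, 188470 (3)", "190020"),
]
print(list(gene_disease_pairs(rows)))
```

Only the second row yields a pair here; the first is dropped because its label has no MIM#.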

sabrinatoro commented 1 year ago

> Here's the full report of all such cases: noMimNumsInPhenoLabels.tsv.zip

I am a bit confused: all the phenotypes in the list above have an OMIM ID in a separate column. Could you show other lines of the morbid map? Maybe we are missing the OMIM number for the gene (and not the phenotype)?

joeflack4 commented 1 year ago

Just updating this issue, given that Sabrina already figured this out. I explained in the related thread: https://github.com/monarch-initiative/omim/issues/76#issuecomment-1320616745