monarch-initiative / omim

Data ingest pipeline for OMIM.

Reconcile inconsistent HGNC mappings #49

Closed joeflack4 closed 2 years ago

joeflack4 commented 2 years ago

In running the new ingest, the following HGNC warnings were produced:

Warning: MIM# 618682 was mapped to two different HGNC symbols, CFAP276 and C1orf194. This was unexpected, so this mapping has been removed.
Warning: MIM# 601502 was mapped to two different HGNC symbols, FCGR1BP and FCGR1B. This was unexpected, so this mapping has been removed.
Warning: MIM# 617636 was mapped to two different HGNC symbols, CMKLR2-AS and GPR1-AS. This was unexpected, so this mapping has been removed.
Warning: MIM# 615252 was mapped to two different HGNC symbols, ZBED10P and ZBED6CL. This was unexpected, so this mapping has been removed.
Warning: MIM# 610242 was mapped to two different HGNC symbols, RNF32-DT and LINC01006. This was unexpected, so this mapping has been removed.
Warning: MIM# 613044 was mapped to two different HGNC symbols, FAM90A7 and FAM90A7P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613050 was mapped to two different HGNC symbols, FAM90A14 and FAM90A14P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613052 was mapped to two different HGNC symbols, FAM90A18 and FAM90A18P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613045 was mapped to two different HGNC symbols, FAM90A8 and FAM90A8P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613053 was mapped to two different HGNC symbols, FAM90A19 and FAM90A19P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613046 was mapped to two different HGNC symbols, FAM90A9 and FAM90A9P. This was unexpected, so this mapping has been removed.
Warning: MIM# 613047 was mapped to two different HGNC symbols, FAM90A10 and FAM90A10P. This was unexpected, so this mapping has been removed.
Warning: MIM# 614502 was mapped to two different HGNC symbols, PIERCE1 and C9orf116. This was unexpected, so this mapping has been removed.
Warning: MIM# 617378 was mapped to two different HGNC symbols, MYL11 and MYLPF. This was unexpected, so this mapping has been removed.
Warning: MIM# 618318 was mapped to two different HGNC symbols, CFAP119 and CCDC189. This was unexpected, so this mapping has been removed.
Warning: MIM# 604596 was mapped to two different HGNC symbols, FBXW10B and CDRT1. This was unexpected, so this mapping has been removed.
Warning: MIM# 609517 was mapped to two different HGNC symbols, MYO18A and TIAF1. This was unexpected, so this mapping has been removed.
Warning: MIM# 176705 was mapped to two different HGNC symbols, PHB1 and PHB. This was unexpected, so this mapping has been removed.
Warning: MIM# 137181 was mapped to two different HGNC symbols, GGT2P and GGT2. This was unexpected, so this mapping has been removed.

I've reached out to the OMIM people for guidance on how to handle this: for each MIM number above, whether both mappings are valid, only one of them, or possibly neither.
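For context, the kind of check that produces the warnings above can be sketched as follows. This is a minimal illustration, not the ingest's actual code: the function name `find_conflicting_symbols` and the `(mim_number, symbol)` pair format are hypothetical, and it assumes the pairs have already been collected from whichever source files carry the mappings.

```python
from collections import defaultdict


def find_conflicting_symbols(pairs):
    """Group HGNC symbols by MIM number and separate clean from conflicting mappings.

    `pairs` is an iterable of (mim_number, hgnc_symbol) tuples gathered from
    any number of sources (hypothetical input format, for illustration only).
    Returns (clean, warnings): `clean` maps each MIM number that has exactly
    one symbol to that symbol; `warnings` holds one message per MIM number
    that has more than one symbol. Conflicting MIM numbers are left out of
    `clean`, mirroring "this mapping has been removed".
    """
    symbols_by_mim = defaultdict(set)
    for mim, symbol in pairs:
        if symbol:  # skip empty or missing symbols
            symbols_by_mim[mim].add(symbol)

    clean = {}
    warnings = []
    for mim, symbols in symbols_by_mim.items():
        if len(symbols) == 1:
            clean[mim] = next(iter(symbols))
        else:
            listed = " and ".join(sorted(symbols))
            warnings.append(
                f"Warning: MIM# {mim} was mapped to {len(symbols)} different "
                f"HGNC symbols, {listed}. This was unexpected, so this "
                f"mapping has been removed."
            )
    return clean, warnings


if __name__ == "__main__":
    # The first two pairs come from the warnings above; the last two are placeholders.
    example = [
        ("618682", "CFAP276"),
        ("618682", "C1orf194"),
        ("000001", "GENE_A"),
        ("000001", "GENE_A"),
    ]
    clean, warnings = find_conflicting_symbols(example)
    print(clean)
    for message in warnings:
        print(message)
```

Collecting symbols into a set per MIM number means repeated identical mappings do not trigger a warning; only genuinely different symbols do.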

inconsistent hgnc symbols.csv

matentzn commented 2 years ago

This is exactly the same issue as #45.

joeflack4 commented 2 years ago

Hmm, maybe so. I re-read the issue a couple times, but I don't understand some of what you and Chris are talking about.

I'm not 100% sure if these are the same issue, or related. One reason is that the problematic OMIM terms mentioned in this issue differ from the ones in the issue that you linked. I would imagine that if it were the same issue, the two lists of terms would be the same.

Also, I just updated the original post. I had accidentally written, for example: Warning: HGNC symbol 613050.0 was mapped to two different HGNC IDs, FAM90A14 and FAM90A14P. This was unexpected, so this mapping has been removed.

But what I meant to write was: Warning: MIM# 613050 was mapped to two different HGNC symbols, FAM90A14 and FAM90A14P. This was unexpected, so this mapping has been removed.

joeflack4 commented 2 years ago

This was apparently due to mim2gene.txt and genemap2.txt being from different dates. Once I had versions of these that were produced on the same date, I got no inconsistencies. I've updated the code to download a fresh copy of genemap2.txt, which it wasn't doing before, so the problem is now solved.
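A guard against this recurring could look something like the sketch below. It rests on an assumption not confirmed in this thread: that both mim2gene.txt and genemap2.txt carry a header comment of the form `# Generated: YYYY-MM-DD`. The helper names `generated_date` and `same_snapshot` are hypothetical and are not part of the ingest code.

```python
import re
from datetime import date

# Assumed header format: a comment line such as "# Generated: 2022-03-01".
GENERATED_RE = re.compile(
    r"^#\s*Generated:\s*(\d{4})-(\d{2})-(\d{2})", re.MULTILINE
)


def generated_date(text):
    """Return the date from the first '# Generated:' header line, or None."""
    match = GENERATED_RE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def same_snapshot(mim2gene_text, genemap2_text):
    """True only if both files state a generation date and the dates match."""
    first = generated_date(mim2gene_text)
    second = generated_date(genemap2_text)
    return first is not None and first == second


if __name__ == "__main__":
    # Placeholder file contents, for illustration only.
    mim2gene = "# Some header line\n# Generated: 2022-03-01\n100050\tphenotype\n"
    genemap2 = "# Some header line\n# Generated: 2022-01-15\nchr1\t1\t2\n"
    if not same_snapshot(mim2gene, genemap2):
        print("mim2gene.txt and genemap2.txt appear to be from different dates")
```

Treating a missing header as a mismatch is a deliberately conservative choice: if the date cannot be verified, the files are not assumed to be in sync.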