obophenotype / human-phenotype-ontology

Ontology for the description of human clinical features
http://obophenotype.github.io/human-phenotype-ontology/

Invalid gene names in artefacts #4916

Closed marc-sturm closed 4 years ago

marc-sturm commented 5 years ago

Hi,

We noticed that some of the gene names listed in the artefacts are outdated.

Some are just previous symbols or synonyms, which is no big problem (the first gene name is the HGNC-approved name; the second is the name listed in HPO):

AARS1 REPLACED: AARS is a previous symbol
ADSS1 REPLACED: ADSSL1 is a previous symbol
ARSL REPLACED: ARSE is a previous symbol
MT-ATP6 REPLACED: ATP6 is a synonymous symbol
MT-ATP8 REPLACED: ATP8 is a synonymous symbol
CERT1 REPLACED: COL4A3BP is a previous symbol
MT-CO3 REPLACED: COX3 is a synonymous symbol
MT-CYB REPLACED: CYTB is a synonymous symbol
DARS1 REPLACED: DARS is a previous symbol
GARS1 REPLACED: GARS is a previous symbol
HARS1 REPLACED: HARS is a previous symbol
H1-4 REPLACED: HIST1H1E is a previous symbol
IARS1 REPLACED: IARS is a previous symbol
KARS1 REPLACED: KARS is a previous symbol
KIFBP REPLACED: KIF1BP is a previous symbol
LARS1 REPLACED: LARS is a previous symbol
MARS1 REPLACED: MARS is a previous symbol
MT-ND2 REPLACED: ND2 is a synonymous symbol
MT-ND3 REPLACED: ND3 is a synonymous symbol
MT-ND4 REPLACED: ND4 is a synonymous symbol
MT-ND4L REPLACED: ND4L is a synonymous symbol
MT-ND5 REPLACED: ND5 is a synonymous symbol
MT-ND6 REPLACED: ND6 is a synonymous symbol
RARS1 REPLACED: RARS is a previous symbol
SARS1 REPLACED: SARS is a previous symbol
MT-TC REPLACED: TRNC is a synonymous symbol
MT-TE REPLACED: TRNE is a synonymous symbol
MT-TF REPLACED: TRNF is a synonymous symbol
MT-TH REPLACED: TRNH is a synonymous symbol
MT-TI REPLACED: TRNI is a synonymous symbol
MT-TK REPLACED: TRNK is a synonymous symbol
MT-TL1 REPLACED: TRNL1 is a synonymous symbol
MT-TL2 REPLACED: TRNL2 is a synonymous symbol
MT-TN REPLACED: TRNN is a synonymous symbol
MT-TQ REPLACED: TRNQ is a synonymous symbol
MT-TS1 REPLACED: TRNS1 is a synonymous symbol
MT-TS2 REPLACED: TRNS2 is a synonymous symbol
MT-TT REPLACED: TRNT is a synonymous symbol
MT-TV REPLACED: TRNV is a synonymous symbol
MT-TW REPLACED: TRNW is a synonymous symbol
VARS1 REPLACED: VARS is a previous symbol
YARS1 REPLACED: YARS is a previous symbol

However, some gene names cannot be converted to HGNC-approved names, which makes the information hard to use:

COX1 ERROR: COX1 is a synonymous symbol of the genes MT-CO1, PTGS1
COX2 ERROR: COX2 is a synonymous symbol of the genes MT-CO2, PTGS2
H19-ICR ERROR: H19-ICR is unknown symbol
HBB-LCR ERROR: HBB-LCR is unknown symbol
ND1 ERROR: ND1 is a synonymous symbol of the genes IVNS1ABP, MT-ND1
QARS ERROR: QARS is a previous symbol of the genes EPRS, QARS1
TRNP ERROR: TRNP is a synonymous symbol of the genes MT-TP, TRNP1

The output shown here was created using this command:

wget -O - http://compbio.charite.de/jenkins/job/hpo.annotations.monthly/lastStableBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_diseases_to_genes_to_phenotypes.txt | cut -f2 | sort | uniq | GenesToApproved | grep -v KEPT

GenesToApproved is part of ngs-bits
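
For readers unfamiliar with the tool, the KEPT / REPLACED / ERROR distinction above can be illustrated with a minimal Python sketch. This is not the ngs-bits implementation: the lookup order (approved symbol first, then previous symbols, then synonyms), the function names, and the tiny `HGNC_EXCERPT` table are assumptions made for illustration. The table contains only a few entries taken from the output above plus BRCA1 as an example of an approved symbol.

```python
from collections import defaultdict

# Hypothetical, hand-picked excerpt of HGNC-style records (NOT the real HGNC file):
# approved symbol -> (previous symbols, synonymous symbols)
HGNC_EXCERPT = {
    "AARS1": (["AARS"], []),
    "MT-ATP6": ([], ["ATP6"]),
    "MT-CO1": ([], ["COX1"]),
    "PTGS1": ([], ["COX1"]),
    "BRCA1": ([], []),
}


def build_index(records):
    """Invert the table: old symbol -> set of approved symbols it may refer to."""
    previous, synonym = defaultdict(set), defaultdict(set)
    for approved, (prev_syms, syn_syms) in records.items():
        for sym in prev_syms:
            previous[sym].add(approved)
        for sym in syn_syms:
            synonym[sym].add(approved)
    return previous, synonym


def classify(symbol, records, previous, synonym):
    """Return (output symbol, status message) for one input symbol.

    Assumed precedence: approved symbol, then previous symbol, then synonym.
    A symbol that maps to more than one approved gene is reported as an error
    because it cannot be converted unambiguously.
    """
    if symbol in records:
        return symbol, "KEPT"
    for kind, index in (("previous", previous), ("synonymous", synonym)):
        hits = sorted(index.get(symbol, ()))
        if len(hits) == 1:
            return hits[0], f"REPLACED: {symbol} is a {kind} symbol"
        if len(hits) > 1:
            genes = ", ".join(hits)
            return symbol, f"ERROR: {symbol} is a {kind} symbol of the genes {genes}"
    return symbol, f"ERROR: {symbol} is unknown symbol"


PREVIOUS, SYNONYM = build_index(HGNC_EXCERPT)

for name in ["BRCA1", "AARS", "ATP6", "COX1", "H19-ICR"]:
    out, status = classify(name, HGNC_EXCERPT, PREVIOUS, SYNONYM)
    print(out, status)
```

Under these assumptions, an ambiguous symbol such as COX1 is an error because nothing in the symbol alone says whether MT-CO1 or PTGS1 is meant. The associated disease would be needed to decide.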

Best, Marc

pnrobinson commented 4 years ago

Sorry for the delay in responding. Thanks for pointing this out. Our build system uses external sources from the NCBI (medgen) to associate genes with diseases. We have also occasionally noticed errors and reported them to NCBI medgen, who usually fix them promptly. I have checked some of the errors listed above, and some but not all seem to be resolved now. Could I ask you to report errors here? https://www.ncbi.nlm.nih.gov/medgen/docs/help/

marc-sturm commented 4 years ago

Hi,

the problem still exists:

COX1 ERROR: COX1 is a synonymous symbol of the genes MT-CO1, PTGS1
COX2 ERROR: COX2 is a synonymous symbol of the genes MT-CO2, PTGS2
H19-ICR ERROR: H19-ICR is unknown symbol
HBB-LCR ERROR: HBB-LCR is unknown symbol
ND1 ERROR: ND1 is a synonymous symbol of the genes IVNS1ABP, MT-ND1
TRNP ERROR: TRNP is a synonymous symbol of the genes MT-TP, TRNP1
WHCR ERROR: WHCR is unknown symbol

I'm really not sure what to report to NCBI medgen, since I don't use it myself and cannot even give them an example of an API query that gives invalid results. I think you should report back to them, also because it would carry more weight.

Best, Marc

pnrobinson commented 4 years ago

@iimpulse thanks, Marc, we will try to track this down and we will report this to NCBI as appropriate.

pnrobinson commented 4 years ago

I have reported this to medgen. This is all we can do at the moment. Assuming they correct their files, our downstream files will reflect the corrected information in the next release.