qichao1984 / MCycDB


Repeated IDs in the id2gene.map file with mistaken annotations #2

Open zhaoze2020 opened 1 year ago

zhaoze2020 commented 1 year ago

Hello, I would like to report an issue in the id2gene.map file. After checking the file, I found 36 repeated IDs in the first column. Each of them appears twice, with a different gene name each time. Most are annotated as both dfrA1 and dfrA12. In addition, the gene name annotations of the repeated lines from the KEGG database are completely wrong.

Could you please look into this issue?

Best regards,
Ze Zhao
Westlake University, Hangzhou, Zhejiang, China

The wrong annotations are listed below (ID, gene name, source):

1121105.ATXL01000017_gene1583 dfrA12 eggNOG
1121105.ATXL01000017_gene1583 dfrA1 eggNOG
1226325.HMPREF1548_01821 dfrA12 eggNOG
1226325.HMPREF1548_01821 dfrA1 eggNOG
1500897.JQNA01000002_gene2735 dfrA12 eggNOG
1500897.JQNA01000002_gene2735 dfrA1 eggNOG
261856243 dfrA12 COG
261856243 dfrA1 COG
91778189 dfrA12 COG
91778189 dfrA1 COG
A0A068JG69 dfrA12 UniProt
A0A068JG69 dfrA1 UniProt
A0A080LV06 fbp3 UniProt
A0A080LV06 fbp UniProt
A0A0A0YXF5 dfrA12 UniProt
A0A0A0YXF5 dfrA1 UniProt
A0A0A0YZX3 dfrA12 UniProt
A0A0A0YZX3 dfrA1 UniProt
A0A0S1VX22 dfrA12 UniProt
A0A0S1VX22 dfrA1 UniProt
A0A0U3B2C7 dfrA12 UniProt
A0A0U3B2C7 dfrA1 UniProt
A0A0Y0M9Q6 dfrA12 UniProt
A0A0Y0M9Q6 dfrA1 UniProt
A0A1V5ZIR6 rnfA UniProt
A0A1V5ZIR6 rnfE UniProt
A0A2L1KBR3 dfrA12 UniProt
A0A2L1KBR3 dfrA1 UniProt
A0A2X3KIN4 fbp3 UniProt
A0A2X3KIN4 fbp UniProt
A0A383M7T7 fbp3 UniProt
A0A383M7T7 fbp UniProt
A0A3R2SYD9 fbp3 UniProt
A0A3R2SYD9 fbp UniProt
A0A3R4GJ38 fbp3 UniProt
A0A3R4GJ38 fbp UniProt
A0A3R5KPM5 fbp3 UniProt
A0A3R5KPM5 fbp UniProt
A0A3S5DLK1 fbp3 UniProt
A0A3S5DLK1 fbp UniProt
A0A3V9LYA9 dfrA12 UniProt
A0A3V9LYA9 dfrA1 UniProt
A0A4U9CEH5 fbp3 UniProt
A0A4U9CEH5 fbp UniProt
A0A5E6M606 fbp-SEBP UniProt
A0A5E6M606 fbp UniProt
A0A5E6MPE7 fbp-SEBP UniProt
A0A5E6MPE7 fbp UniProt
ag:AAA92749 dfrA10 KEGG
ag:AAA92749 dfrA1 KEGG
ag:AAS66087 dfrA12 KEGG
ag:AAS66087 dfrA1 KEGG
ag:CAA90683 dfrA12 KEGG
ag:CAA90683 dfrA1 KEGG
ag:CAC81324 dfrA19 KEGG
ag:CAC81324 dfrA1 KEGG
ag:CAF31623 dfrA12 KEGG
ag:CAF31623 dfrA1 KEGG
ag:CAX16467 dfrA12 KEGG
ag:CAX16467 dfrA1 KEGG
F8LZ35 dfrA12 UniProt
F8LZ35 dfrA1 UniProt
kin:AB182_03125 dfrA19 KEGG
kin:AB182_03125 dfrA1 KEGG
kpm:KPHS_p300720 dfrA12 KEGG
kpm:KPHS_p300720 dfrA1 KEGG
sem:STMDT12_L00660 dfrA12 KEGG
sem:STMDT12_L00660 dfrA1 KEGG
W7WJ21 fbp3 UniProt
W7WJ21 fbp UniProt
W8FZ28 dfrA12 UniProt
W8FZ28 dfrA1 UniProt
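For anyone who wants to reproduce this check, here is a minimal Python sketch. It assumes that each line of id2gene.map holds whitespace-separated fields with the sequence ID first and the gene name second, which matches the listing above but is not confirmed by the maintainers. The function name `find_conflicting_ids` is made up for this example, and the last sample row is an invented control, not a real entry.

```python
from collections import defaultdict
import io


def find_conflicting_ids(lines):
    """Return {sequence_id: sorted gene names} for IDs mapped to more than one gene.

    Assumes whitespace-separated fields: ID first, gene name second.
    """
    genes_by_id = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seq_id, gene = fields[0], fields[1]
        genes_by_id[seq_id].add(gene)
    # Keep only IDs that carry two or more distinct gene names.
    return {
        seq_id: sorted(genes)
        for seq_id, genes in genes_by_id.items()
        if len(genes) > 1
    }


# Small in-memory sample; the first four rows are taken from the report above.
sample = io.StringIO(
    "W8FZ28\tdfrA12\tUniProt\n"
    "W8FZ28\tdfrA1\tUniProt\n"
    "A0A080LV06\tfbp3\tUniProt\n"
    "A0A080LV06\tfbp\tUniProt\n"
    "CONTROL_ID\tfbp\tUniProt\n"  # invented control row with a unique ID
)
conflicts = find_conflicting_ids(sample)
for seq_id, genes in conflicts.items():
    print(seq_id, genes)
```

To run it on the real file, pass an open file handle instead of the sample, for example `find_conflicting_ids(open("id2gene.map"))`.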

quliping commented 5 months ago

I ran into the same problem, and it is not the only one. Some genes in the pathway have no sequence IDs in the id2gene.map file, e.g. the gene 'fwdD': (image) This means that none of your query sequences can be annotated as these genes. Although the sequences of these genes are present in the id2gene.map file and you can get the target sequence ID for your query sequences, the gene name for these annotation results cannot be determined. It is a very serious problem, but it seems that the [...]. This database should not be used at all.
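A rough way to list genes that have no sequence IDs is sketched below in Python. It assumes the same whitespace-separated layout (ID first, gene name second) and that you supply your own list of expected pathway gene names; both the function name `genes_without_ids` and the short gene list are hypothetical and only for illustration.

```python
def genes_without_ids(map_lines, pathway_genes):
    """Return the sorted pathway gene names that never appear in the map.

    Assumes whitespace-separated fields: ID first, gene name second.
    """
    genes_in_map = set()
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 2:
            genes_in_map.add(fields[1])
    return sorted(set(pathway_genes) - genes_in_map)


# Two rows copied from the report above; 'fwdD' has no row in this sample.
sample_map = [
    "W8FZ28\tdfrA1\tUniProt",
    "A0A080LV06\tfbp\tUniProt",
]
expected_genes = ["dfrA1", "fbp", "fwdD"]  # hypothetical pathway gene list
missing = genes_without_ids(sample_map, expected_genes)
print(missing)
```

Any gene name returned here cannot be assigned to a query sequence through the map, which is the situation described for 'fwdD'.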