oncokb / oncokb-annotator

Annotates variants in MAF with OncoKB annotation.
GNU Affero General Public License v3.0
MafAnnotator incorrectly labels genes based on alternate gene symbols #125

Closed sprokopec closed 3 years ago

sprokopec commented 3 years ago

MafAnnotator seems to set 'GENE_IN_ONCOKB = True' for genes whose Hugo symbol matches an alias of a different gene. For example, variants in PRMT9 (entrez id = 90826, chromosome 4) are being labelled True, despite PRMT9 not being present in OncoKB. This is not limited to MafAnnotator; for example, this query from Swagger returns FBXO11 (aka PRMT9, entrez id = 80204, chromosome 2): https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange?genomicLocation=4%2C147661042%2C147661042%2CG%2CT&referenceGenome=GRCh38

Similarly, querying RAD1 (entrez id = 5810, chromosome 5) points to ERCC4 (aka RAD1, entrez id = 2072, chromosome 16): https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange?genomicLocation=5%2C34908772%2C34908772%2CT%2CC&referenceGenome=GRCh38
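
For anyone reproducing the two queries above, the URLs can be rebuilt from their parts. This is a minimal sketch using only the Python standard library; the helper name `build_query_url` is hypothetical, the endpoint and parameter names are copied from the URLs in this comment, and no request is actually sent (a real call would presumably also need OncoKB API credentials, which are not covered here).

```python
from urllib.parse import urlencode

# Endpoint copied from the URLs quoted in this issue.
BASE_URL = "https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange"


def build_query_url(chrom, start, end, ref, alt, reference_genome="GRCh38"):
    """Hypothetical helper: assemble a byGenomicChange query URL.

    genomicLocation is the comma-separated string "chrom,start,end,ref,alt";
    urlencode percent-encodes the commas as %2C.
    """
    location = ",".join(str(part) for part in (chrom, start, end, ref, alt))
    query = urlencode(
        {"genomicLocation": location, "referenceGenome": reference_genome}
    )
    return f"{BASE_URL}?{query}"


# The PRMT9 (chromosome 4) and RAD1 (chromosome 5) queries from this issue.
prmt9_url = build_query_url(4, 147661042, 147661042, "G", "T")
rad1_url = build_query_url(5, 34908772, 34908772, "T", "C")
print(prmt9_url)
print(rad1_url)
```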

I am using the latest commit on master.

zhx828 commented 3 years ago

@sprokopec I will look into it. Sorry about the late reply.

zhx828 commented 3 years ago

@sprokopec We have been using gene info from cBioPortal and myGene.info. When a query matches a gene alias, the gene that owns that alias is used. We are working on cleaning up the gene info to remove duplicate aliases. I will keep you posted once that's done.
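
To illustrate how alias matching can produce the mislabelling reported above, here is a minimal sketch. It is not the OncoKB implementation: the table layout and the function names `build_alias_lookup` and `gene_in_table` are hypothetical, and only the symbols, aliases, and Entrez IDs come from this thread.

```python
# Hypothetical gene table; the entries are the two examples from this thread.
GENES = [
    {"hugo": "FBXO11", "entrez": 80204, "aliases": ["PRMT9"]},
    {"hugo": "ERCC4", "entrez": 2072, "aliases": ["RAD1"]},
]


def build_alias_lookup(genes):
    """Map every Hugo symbol and every alias to its gene record."""
    lookup = {}
    for gene in genes:
        lookup[gene["hugo"]] = gene
        for alias in gene["aliases"]:
            # First gene to claim an alias keeps it.
            lookup.setdefault(alias, gene)
    return lookup


def gene_in_table(symbol, lookup):
    """Return True if the symbol matches a Hugo symbol or an alias."""
    return symbol in lookup


lookup = build_alias_lookup(GENES)

# PRMT9 (entrez 90826) and RAD1 (entrez 5810) are not in the table, but
# their symbols collide with aliases of FBXO11 and ERCC4, so both match.
for symbol in ("PRMT9", "RAD1"):
    matched = lookup[symbol]
    print(symbol, gene_in_table(symbol, lookup), matched["hugo"], matched["entrez"])
```

Comparing Entrez IDs instead of symbols would avoid the collision in this toy case, since 90826 and 80204 differ even though the symbols overlap.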

zhx828 commented 3 years ago

@sprokopec The issue should be addressed now. We now use the gene list from HGNC, in which a gene alias always matches exactly one Hugo symbol. Let me know if you see any issues.
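
One way to get the "alias matches exactly one Hugo symbol" behaviour is sketched below. This is an illustration under assumptions, not the actual OncoKB code: the record layout and `build_resolver` are hypothetical, the XPF alias for ERCC4 is added only to show a non-colliding alias, and the other symbols and Entrez IDs are the ones from this thread.

```python
from collections import defaultdict

# Hypothetical HGNC-style records: (approved symbol, Entrez ID, aliases).
RECORDS = [
    ("PRMT9", 90826, []),
    ("FBXO11", 80204, ["PRMT9"]),
    ("RAD1", 5810, []),
    ("ERCC4", 2072, ["RAD1", "XPF"]),  # XPF added for illustration
]


def build_resolver(records):
    """Return a function mapping a symbol to one approved symbol or None."""
    approved = {symbol for symbol, _, _ in records}

    # Collect every approved symbol that each alias points to.
    alias_targets = defaultdict(set)
    for symbol, _, aliases in records:
        for alias in aliases:
            alias_targets[alias].add(symbol)

    # Keep an alias only if it is not itself an approved symbol and it
    # points to exactly one gene.
    unique_aliases = {
        alias: next(iter(targets))
        for alias, targets in alias_targets.items()
        if alias not in approved and len(targets) == 1
    }

    def resolve(symbol):
        # Approved symbols always win over aliases.
        if symbol in approved:
            return symbol
        return unique_aliases.get(symbol)

    return resolve


resolve = build_resolver(RECORDS)
for query in ("PRMT9", "RAD1", "XPF", "NOT_A_GENE"):
    print(query, "->", resolve(query))
```

With this ordering, PRMT9 and RAD1 resolve to themselves rather than to FBXO11 and ERCC4, so a later "is this gene in OncoKB" check would be made against the correct gene.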