macarthur-lab / clinvar

This repo provides tools to convert ClinVar data into a tab-delimited flat file, and also provides that resulting tab-delimited flat file.

Wrong gene mapping #37

Closed ManavalanG closed 7 years ago

ManavalanG commented 7 years ago

ClinVar record 292157 is about the gene MTHFR, but it is wrongly mapped to the gene C1orf167 in the file clinvar_alleles.single.b37.tsv.gz. Coordinate Chr1:11786191 maps to C1orf167 on the + strand and to MTHFR on the - strand. It looks like the strand is being overlooked somewhere in the code during gene mapping.

ManavalanG commented 7 years ago

After a bit more probing, this file lists all ClinVar IDs with a wrong gene mapping. The column 'symbol' was taken from clinvar_alleles.single.b37.tsv.gz, and the columns 'geneID_from_refseqID' and 'geneName_from_refseqID' were obtained using the RefSeq ID associated with each ClinVar ID.
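For anyone who wants to repeat the comparison, here is a minimal sketch in Python. It is not the script used to build the file above. The `clinvar_id` column name and the `find_mismatches` helper are assumptions made for illustration, and the sample rows are made up except for record 292157.

```python
import csv
import io

# Illustrative rows only; a real run would read the merged table instead.
# 'clinvar_id' is an assumed column name, and GENE_A is a placeholder.
SAMPLE_TSV = (
    "clinvar_id\tsymbol\tgeneName_from_refseqID\n"
    "292157\tC1orf167\tMTHFR\n"
    "000001\tGENE_A\tGENE_A\n"
)


def find_mismatches(handle):
    """Return rows whose 'symbol' differs from 'geneName_from_refseqID'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if row["symbol"] != row["geneName_from_refseqID"]
    ]


mismatches = find_mismatches(io.StringIO(SAMPLE_TSV))
for row in mismatches:
    print(row["clinvar_id"], row["symbol"], "->", row["geneName_from_refseqID"])
```

To run it on real data, pass an open file handle for the merged table instead of the `io.StringIO` sample.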

simnim commented 7 years ago

I just looked up the example you highlighted. At least for this case, it's the same issue I raised earlier in #31.

The first gene symbol in the XML file is C1orf167 and the second is MTHFR.
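If the parser simply keeps the first symbol it sees, one possible workaround is to prefer the symbol that agrees with the gene derived from the record's RefSeq ID. This is a rough sketch, not the repo's actual parsing code. `pick_symbol` and its arguments are hypothetical names, and it assumes the symbols have already been pulled out of the XML in document order.

```python
def pick_symbol(xml_symbols, refseq_gene=None):
    """Choose a gene symbol for a record.

    xml_symbols: symbols in the order they appear in the XML (assumed input).
    refseq_gene: gene name derived from the record's RefSeq ID, if known.
    """
    # Prefer the symbol that matches the RefSeq-derived gene.
    if refseq_gene in xml_symbols:
        return refseq_gene
    # Otherwise fall back to the first symbol, or None if there are none.
    return xml_symbols[0] if xml_symbols else None


# Record 292157: the XML lists C1orf167 first and MTHFR second.
chosen = pick_symbol(["C1orf167", "MTHFR"], refseq_gene="MTHFR")
print(chosen)
```

Without a RefSeq-derived gene the function behaves like a take-the-first-symbol rule, so records with only one symbol are unaffected.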