macarthur-lab / clinvar

This repo provides tools to convert ClinVar data into a tab-delimited flat file, and also provides that resulting tab-delimited flat file.

missing hgvs_c when indel is realigned and shifted. #57

Open sicotte opened 5 years ago

sicotte commented 5 years ago

I noticed that many indels and frameshift variants are missing the hgvs_c (NM_ transcript) annotation, even though hgvs_p (the protein change) is present.

Example: this variant, https://www.ncbi.nlm.nih.gov/clinvar/variation/52203/, is listed in ClinVar at chr13:32341188-32341192.

In the files produced by the macarthur-lab/clinvar pipeline, it is left-shifted to 13:32341183.
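The left shift described here is indel left-normalization: a deletion inside a repeat can be placed at several positions that all produce the same sequence, and normalization picks the leftmost one. Below is a minimal sketch on a made-up reference string (not the real chr13 context, which is not shown in this issue), using 1-based inclusive coordinates with no VCF padding base. `left_align_deletion` is a hypothetical helper for illustration, not code from the pipeline.

```python
from typing import Tuple


def left_align_deletion(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Shift a deletion of ref[start..end] (1-based, inclusive) as far left
    as possible while the resulting sequence stays identical."""
    if not (1 <= start <= end <= len(ref)):
        raise ValueError("deletion falls outside the reference")
    # Moving the deletion one base left is equivalent when the base just
    # before it equals the last deleted base.
    while start > 1 and ref[start - 2] == ref[end - 1]:
        start -= 1
        end -= 1
    return start, end


# Toy example: deleting "AT" at positions 5-6 of an AT repeat.
print(left_align_deletion("GCATATATG", 5, 6))  # (3, 4)
```

Because the normalized position differs from the coordinates ClinVar reports, anything that matches records by position has to account for the shift; the issue title ties the missing hgvs_c to exactly this case.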

The pipeline output has no "hgvs_c" field. It does have an hgvs_p field, "hgvs_p":"NP_000050.2:p.Ile2278Serfs", which matches ClinVar (I'm not sure whether it was recomputed).

However, the hgvs_c information can be found in the molecular_consequence field:

"molecular_consequence":["NM_000059.3:c.6833_6837del:frameshift variant"]

I'm not sure whether the pipeline computed this or took it from ClinVar (it matches ClinVar), but either way it could be propagated to the hgvs_c field.
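One way the suggested propagation could look is sketched below. It assumes each record is a dict using the field names shown in this issue (`hgvs_c`, `hgvs_p`, `molecular_consequence`) and that every `molecular_consequence` entry has the form `transcript:c.change:consequence`. The helper names are hypothetical; this is not code from the repository.

```python
from typing import Any, Dict, List, Optional


def hgvs_c_from_consequence(entry: str) -> Optional[str]:
    """Turn 'NM_000059.3:c.6833_6837del:frameshift variant' into
    'NM_000059.3:c.6833_6837del'; return None if the entry does not match."""
    # maxsplit=2 keeps any colon inside the consequence text intact.
    parts = entry.split(":", 2)
    if len(parts) < 2 or not parts[1].startswith("c."):
        return None
    return f"{parts[0]}:{parts[1]}"


def fill_missing_hgvs_c(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with hgvs_c filled from molecular_consequence
    when hgvs_c is absent and exactly one distinct c. change is available."""
    out = dict(record)
    if out.get("hgvs_c"):
        return out  # never overwrite an existing value
    candidates: List[str] = []
    for entry in out.get("molecular_consequence", []):
        hgvs_c = hgvs_c_from_consequence(entry)
        if hgvs_c and hgvs_c not in candidates:
            candidates.append(hgvs_c)
    # Multiple transcripts would be ambiguous, so leave those records alone.
    if len(candidates) == 1:
        out["hgvs_c"] = candidates[0]
    return out


record = {
    "hgvs_p": "NP_000050.2:p.Ile2278Serfs",
    "molecular_consequence": ["NM_000059.3:c.6833_6837del:frameshift variant"],
}
print(fill_missing_hgvs_c(record)["hgvs_c"])  # NM_000059.3:c.6833_6837del
```

Filling only when a single distinct c. change is present is a deliberately conservative choice: a variant annotated on several transcripts would need a rule for picking one (for example, matching the transcript that corresponds to the hgvs_p accession), which is left open here.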