konradjk / exac_browser

Browser for ExAC consortium data
http://exac.broadinstitute.org
MIT License

Downloadable files, Sites VCF, POPMAX values listed as strings #286

Closed martiliasf closed 7 years ago

martiliasf commented 8 years ago

##INFO=<ID=AC_POPMAX,Number=A,Type=String,Description="AC in the population with the max AF">
##INFO=<ID=AN_POPMAX,Number=A,Type=String,Description="AN in the population with the max AF">

Admittedly, this is easily fixable with a reheader that manually changes Type to Integer, but then I'm a human editing a VCF file.
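
The header change can be scripted instead of edited by hand. Below is a minimal sketch, assuming the only lines to touch are the two INFO declarations quoted above; `retype_info_header` and `POPMAX_IDS` are hypothetical names for illustration, not part of ExAC or any VCF library. It rewrites only the header declaration and does not check that the data values actually parse as integers.

```python
import re

# Assumption: the IDs to retype are the two quoted in this issue.
POPMAX_IDS = ("AC_POPMAX", "AN_POPMAX")


def retype_info_header(line, ids=POPMAX_IDS, new_type="Integer"):
    """Return a VCF header line with Type=String replaced for the given INFO IDs.

    Lines that are not INFO declarations, or whose ID is not listed,
    are returned unchanged.
    """
    match = re.match(r"##INFO=<ID=([^,>]+),", line)
    if match and match.group(1) in ids:
        # The Type field precedes Description, so replacing the first
        # occurrence cannot touch text inside the description.
        return line.replace("Type=String", "Type=" + new_type, 1)
    return line


header = '##INFO=<ID=AC_POPMAX,Number=A,Type=String,Description="AC in the population with the max AF">'
fixed = retype_info_header(header)
```

Applying the function to every line that starts with `##` and writing the result to a new header file would give something that can be passed to a reheader step.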

Is there a reason to store AC and AN as strings that I'm not aware of?

martiliasf commented 7 years ago

I think I figured it out. Multi-allelic sites are effectively used as a compression mechanism, and strings allow this, with the expectation that the multi-allelic VCF file will be decomposed using a tool such as vt, as described here:

http://gemini.readthedocs.io/en/latest/
http://genome.sph.umich.edu/wiki/Vt#Installation
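
To illustrate what decomposition does to per-allele fields, here is a rough sketch in plain Python. It is not how vt is implemented and is far simpler than a real decomposition: it ignores genotypes and any field that is not declared Number=A. The function name `decompose_record` and the example values are made up for illustration.

```python
def decompose_record(chrom, pos, ref, alts, info, number_a_keys):
    """Split one multi-allelic site into one record per ALT allele.

    `alts` is the comma-separated ALT column; `info` maps INFO keys to raw
    string values. Keys listed in `number_a_keys` hold one comma-separated
    value per ALT allele (Number=A); all other keys are copied as-is.
    """
    alt_list = alts.split(",")
    records = []
    for i, alt in enumerate(alt_list):
        new_info = {}
        for key, value in info.items():
            if key in number_a_keys:
                parts = value.split(",")
                if len(parts) != len(alt_list):
                    raise ValueError(
                        "%s has %d values for %d ALT alleles"
                        % (key, len(parts), len(alt_list))
                    )
                new_info[key] = parts[i]
            else:
                new_info[key] = value
        records.append(
            {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt, "info": new_info}
        )
    return records


# Made-up example values, not taken from the ExAC sites VCF.
records = decompose_record(
    "1", 1000, "A", "T,G",
    {"AC_POPMAX": "12,3", "AN_POPMAX": "8000,7990", "DP": "100"},
    number_a_keys={"AC_POPMAX", "AN_POPMAX"},
)
```

Values are left as strings here; converting them to integers is left to the caller, since a field declared as String may hold entries that are not numeric.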

I'll go ahead and close this, as it's not an issue with the ExAC browser but more a nuance of the VCF format.