Bcftools query - extract specific part from INFO field

samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

Hello, I have a VCF file with merged allele counts and frequencies from different subsets (gnomAD v3.1 genomes). In the INFO field, AC, AN, and AF values are given for the whole set, followed by AC, AN, and AF values for each subset (e.g. non-TOPMed) and each subpopulation (e.g. SAS), such as:

AC=2;AN=76882;AF=2.60139e-05; AC-non_v2-XX=1;AN-non_v2-XX=32272;AF-non_v2-XX=3.09866e-05

I would like to extract the first columns together with the AC and AN fields for a specific subpopulation (in this case AC-non_v2-XX, not AC=2). The values I am interested in are the 113th and 114th entries in the INFO field, respectively. I tried this syntax:

bcftools query -f '%CHROM %POS %ID %REF %ALT %AC{113} %AN{114}\n' chr1_econtrol_bcftools_filter.txt

However, I only get the AC and AN values that appear first in the INFO field; the command does not look beyond the first AC and AN tags.

How can I modify the RegEx in this case?

I don't understand entirely. First, the minus sign should not be part of VCF tag names. I know there are VCFs out there that break this convention; unfortunately, bcftools does not support it (https://github.com/samtools/bcftools/issues/1387). One can, however, use bcftools annotate --rename-annots to rename such annotations.
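As a rough sketch of the renaming step, the snippet below builds a map file for --rename-annots and then runs the rename and the query. It assumes the map format is one "INFO/old_name new_name" pair per line, separated by whitespace; the file names in.vcf.gz, renamed.vcf.gz and rename_map.txt are placeholders, and the tag list is taken from the example INFO string in the question.

```shell
#!/bin/sh
# Sketch: build a rename map for `bcftools annotate --rename-annots`.
# Assumptions: in.vcf.gz, renamed.vcf.gz and rename_map.txt are placeholder
# names; the tags come from the example INFO string in the question.

# Hyphenated INFO tags that need new names
tags="AC-non_v2-XX AN-non_v2-XX AF-non_v2-XX"

# One "INFO/old_name new_name" pair per line; hyphens become underscores.
: > rename_map.txt
for old in $tags; do
    new=$(printf '%s' "$old" | sed 's/-/_/g')
    printf 'INFO/%s %s\n' "$old" "$new" >> rename_map.txt
done

cat rename_map.txt

# Run bcftools only if it is installed and the placeholder input exists.
if command -v bcftools >/dev/null 2>&1 && [ -f in.vcf.gz ]; then
    bcftools annotate --rename-annots rename_map.txt -Oz -o renamed.vcf.gz in.vcf.gz
    bcftools query -f '%CHROM %POS %ID %REF %ALT %AC_non_v2_XX %AN_non_v2_XX\n' renamed.vcf.gz
fi
```

For a file with many subset tags, the same loop could be fed from the tag IDs listed in the VCF header instead of a hand-written list.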

Then you should be able to access the fields you are interested in as e.g.

bcftools query -f ' %AC_non_v2_XX \n'

The curly-bracket notation (which is somewhat unfortunate, as square brackets were already taken) selects a specific value from a single tag, using a zero-based index. For example, if there was a tag like TAG=1,2,3,4, one can write

$ bcftools query -f '%TAG{1} \n'
2

to print the second value.
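To make the indexing concrete, here is a small sketch in plain shell that mimics what the subscript selects. The helper pick_value is hypothetical and not part of bcftools; it only illustrates that the index is zero-based and counts within the comma-separated values of one tag.

```shell
#!/bin/sh
# Sketch of what a subscript such as %TAG{1} selects: the value at a
# zero-based index within ONE tag's comma-separated list.
# pick_value is a hypothetical helper, not part of bcftools.
pick_value() {
    # $1 = comma-separated values of a single tag, $2 = zero-based index
    printf '%s\n' "$1" | awk -F',' -v i="$2" '{ print $(i + 1) }'
}

second=$(pick_value "1,2,3,4" 1)
echo "$second"   # prints 2, matching the %TAG{1} example above
```

Seen this way, %AC{113} asks for a value inside the AC tag itself, not for the 113th key in the INFO column, which is why it cannot reach AC-non_v2-XX.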

This is documented here http://samtools.github.io/bcftools/howtos/query.html. If anything is unclear, please do let us know.

Bcftools query - extract specific part from INFO field #1426