samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools mpileup: interpreting AD (Allelic depth) annotation #2031

Closed rmurray2 closed 10 months ago

rmurray2 commented 10 months ago

I'd like to get the nt count at each position from a bcf file (exactly like this question)

The posted solution is: bcftools mpileup --annotate FORMAT/AD. I tried using --annotate and passing the output to bcftools call, but I'm not entirely clear about what the allelic depth (AD) field means in the resulting bcf file. Here are a couple of bcf file entries with only the relevant info shown:

example 1:

REF=G
ALT=C,A,<*>
DP=111
AD=109,1,1,0

Question 1: Does this mean that the nucleotide counts at this position are G:109; C:1; A:1; T:0?

Question 2: What does <*> mean?

example 2:

REF=G
ALT=C,T,A
DP=122
AD=118,2,1,1

Question 3: Does this mean the nucleotide counts are G:118; C:2; T:1; A:1?

Thanks!

pd3 commented 10 months ago

That is correct. The symbolic star allele <*> is a placeholder for alleles not observed in the data; it makes it possible to represent genotype likelihoods of non-reference genotypes.

See also the VCF specification https://samtools.github.io/hts-specs/VCFv4.3.pdf
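
For anyone who wants to do this mapping programmatically, here is a minimal sketch of the reading confirmed above: the AD values are listed in the same order as REF followed by the ALT alleles. It works on the plain strings from the two examples in the question rather than on a real bcf file, and the function name `allele_counts` is made up for illustration, not part of bcftools.

```python
def allele_counts(ref, alt, ad):
    """Pair each allele (REF first, then ALT in order) with its AD value.

    ref: REF column, e.g. "G"
    alt: ALT column as a comma-separated string, e.g. "C,A,<*>"
    ad:  AD value as a comma-separated string, e.g. "109,1,1,0"
    """
    alleles = [ref] + alt.split(",")
    depths = [int(value) for value in ad.split(",")]
    if len(alleles) != len(depths):
        raise ValueError("AD must have one value per allele (REF + ALT)")
    return dict(zip(alleles, depths))


# The two records from the question.
example1 = allele_counts("G", "C,A,<*>", "109,1,1,0")
example2 = allele_counts("G", "C,T,A", "118,2,1,1")

print(example1)  # {'G': 109, 'C': 1, 'A': 1, '<*>': 0}
print(example2)  # {'G': 118, 'C': 2, 'T': 1, 'A': 1}
```

In example 1 the last value is keyed by <*> rather than by T: it is the count for the placeholder allele, so no separate T entry appears in the record.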