samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
663 stars 240 forks source link

Incorrect interpretation of fixed ALT homozygotes from bcftools filter? #2123

Open pmoizerv opened 7 months ago

pmoizerv commented 7 months ago

Hi all,

Not completely sure if this is an issue, if it is the expected function for the two bcftools commands (filter and view) or if I am misunderstanding something.

I want to get the number of sites that are fixed ALT homozygotes and to do that I used bcftools filter -i 'AF=1' my_species.vcf | grep -v "^#" | wc -l which returns 0.

If I use bcftools view first to "subsample" using a list of all my samples as such: bcftools query -l my_species.vcf > sample_list.txt ; bcftools view -S sample_list.txt my_species.vcf | bcftools filter -i 'AF=1' - | grep -v "^#" | wc -l it returns 8, while essentially I am running the same command.

Why does that happen? This number seems to be right, in the sense that the samples are all homozygous for ALT (1/1) but I am not sure if there are other sites missing from the final number. In the vcf I have kept biallelic snps and monomorphic sites. Some genotypes are missing but if I filter for missingness (say F_MISSINGNESS < 0.6) the result remains the same. Do you have any other suggestion on how to do this?

Thanks for your help!

pd3 commented 6 months ago

This could happen when existing INFO/AC,AN counts are different from the ones calculated from the genotypes FORMAT/GT. The command bcftools view -S recalculates the values.

If this is not the case, we'd need a small test case to reproduce the problem and investigate further.