Open pmoizerv opened 7 months ago
This could happen when existing INFO/AC,AN counts are different from the ones calculated from the genotypes FORMAT/GT. The command bcftools view -S
recalculates the values.
If this is not the case, we'd need a small test case to reproduce the problem and investigate further.
Hi all,
Not completely sure if this is an issue, if it is the expected function for the two bcftools commands (filter and view) or if I am misunderstanding something.
I want to get the number of sites that are fixed ALT homozygotes and to do that I used
bcftools filter -i 'AF=1' my_species.vcf | grep -v "^#" | wc -l
which returns 0.If I use bcftools view first to "subsample" using a list of all my samples as such:
bcftools query -l my_species.vcf > sample_list.txt ; bcftools view -S sample_list.txt my_species.vcf | bcftools filter -i 'AF=1' - | grep -v "^#" | wc -l
it returns 8, while essentially I am running the same command.Why does that happen? This number seems to be right, in the sense that the samples are all homozygous for ALT (1/1) but I am not sure if there are other sites missing from the final number. In the vcf I have kept biallelic snps and monomorphic sites. Some genotypes are missing but if I filter for missingness (say F_MISSINGNESS < 0.6) the result remains the same. Do you have any other suggestion on how to do this?
Thanks for your help!