Open ad2n15 opened 5 years ago
Unfortunately, such feature is not available in bcftools. You could do the following though.
The binom()
function only calculates the binomial probability and assumes the mean 0.5. If you want to correct for multiple testing in order to meet the desired FDR rate, you could modify the expected mean here https://github.com/samtools/bcftools/blob/develop/convert.c#L1176 and then use something like
bcftools query -i'GT="het"' -f'[%CHROM:%POS %SAMPLE %GT %pbinom(AD)\n]' file.vcf
to get the list of probabilities for each tested heterozygous genotype. Then it is possible to determine the Benjamini-Hochberg correction for filtering. The expected mean would need to be adjusted here https://github.com/samtools/bcftools/blob/develop/filter.c#L1360.
A possible improvement to bcftools would be to allow passing the expected mean as a parameter to binom()
so that you wouldn't have to hack the code.
I am filtering a multi VCF file of > 1000 sapmples, to identify somatic variants. So, I want to include variants which achieve this criteria:
1) SNPs 2) Observed once or twice (minor allele frequency less than 0.01%) in the cohort 3) Allelic fraction above 10% 4) Failed the hypothesis that the alternate allelic count was distributed as a binomial process with mean 45% with a designed false positive rate of 10-5
The last point is tricky for me.
Kind Regards