Open pd3 opened 1 month ago
Depending on the version used, recent versions of mpileup have to be run with -a INFO/FS
in order for the field to be populated. Run bcftools mpileup -a \?
to see which annotations are filled in by default and which need to be given explicitly.
However, what to do now, if recalling is not an option? I see three possibilities:
1) Extract the necessary information using bcftools query
, compute the test using a third-party software, then reannotate with bcftools annotate
.
2) Use the bcftools +ad-bias
plugin, it can filter by Fisher test on FMT/AD
3) Perhaps best would be to extend the plugin +fill-tags
, currently it supports the following tags, adding Fisher test would be trivial
$ bcftools +fill-tags -- -l
INFO/AC Number:A Type:Integer .. Allele count in genotypes
INFO/AC_Hom Number:A Type:Integer .. Allele counts in homozygous genotypes
INFO/AC_Het Number:A Type:Integer .. Allele counts in heterozygous genotypes
INFO/AC_Hemi Number:A Type:Integer .. Allele counts in hemizygous genotypes
INFO/AF Number:A Type:Float .. Allele frequency from FMT/GT or AC,AN if FMT/GT is not present
INFO/AN Number:1 Type:Integer .. Total number of alleles in called genotypes
INFO/ExcHet Number:A Type:Float .. Test excess heterozygosity; 1=good, 0=bad
INFO/END Number:1 Type:Integer .. End position of the variant
INFO/F_MISSING Number:1 Type:Float .. Fraction of missing genotypes (all samples, experimental)
INFO/HWE Number:A Type:Float .. HWE test (PMID:15789306); 1=good, 0=bad
INFO/MAF Number:1 Type:Float .. Frequency of the second most common allele
INFO/NS Number:1 Type:Integer .. Number of samples with data
INFO/TYPE Number:. Type:String .. The record type (REF,SNP,MNP,INDEL,etc)
FORMAT/VAF Number:A Type:Float .. The fraction of reads with the alternate allele, requires FORMAT/AD or ADF+ADR
FORMAT/VAF1 Number:1 Type:Float .. The same as FORMAT/VAF but for all alternate alleles cumulatively
TAG:Number=Type(EXPR) .. Experimental support for user expressions such as DP:1=int(sum(DP))
If Number and Type are not given (e.g. DP=sum(DP)), variable number (Number=.) of floating point
values (Type=Float) will be used.
Hello, I'm revisiting this comment as I've run into a similar observation in my dataset- the FS field (fisher test for strand bias) does not seem to be correctly populated — it is coming up as zero for all genotypes so everything is passing. I've read through this thread and am a bit confused- is this something that needs to be explicitly specified (ie, accounting for this is not the default)?
We've just spent a long time calling reads for a very large dataset and would like to avoid doing it all over again. Is there a way to populate this field after the bcftools mpileup? We have retained the read depth for for/rev reads supporting alt/ref so is it possible to calculate this without recalling genotypes? Thank you so much!
Originally posted by @paigeduffin in https://github.com/samtools/bcftools/issues/1834#issuecomment-2354477466