samtools / bcftools

This is the official development repository for BCFtools. Installation instructions and other documentation are at http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

filter for format field using ALT column #2033

Open arpanda opened 10 months ago

arpanda commented 10 months ago

I have a VCF file with the following FORMAT fields: AU, TU, GU, and CU. I want to calculate the Allele Frequency (AF) for each variant based on the ALT allele and then filter variants where the AF exceeds a threshold of 0.1. The formula for AF is as follows:

AF = \frac{[ALT]U}{(AU + TU + GU + CU)}
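As a worked illustration of the formula above, here is a minimal Python sketch. The function name `alt_allele_fraction` and the example counts are hypothetical, and the sketch assumes a single-base ALT allele with one integer count per tag.

```python
def alt_allele_fraction(alt, counts):
    """Return [ALT]U / (AU + CU + GU + TU) for a single-base ALT allele."""
    total = sum(counts[base + "U"] for base in "ACGT")
    if total == 0:
        return 0.0  # no supporting reads; avoid dividing by zero
    return counts[alt + "U"] / total


# Made-up counts for one sample at a site where ALT is "A".
counts = {"AU": 3, "CU": 0, "GU": 0, "TU": 27}
print(alt_allele_fraction("A", counts))  # 3 / 30
```

The ALT base selects which of the four counts goes in the numerator, which is the lookup that the filter expression has to spell out one base at a time.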

I am having difficulty obtaining the ALT count from the ALT field. Currently, I'm achieving this by using the following bcftools view expression:

bcftools view -i '( ALT=="A" & (AU/(AU + TU + GU + CU) > 0.1) ) | ( ALT=="T" & (TU/(AU + TU + GU + CU) > 0.1) ) ' sample_vcf

The expression above is becoming quite lengthy. I'm wondering if there's an alternative approach to achieve the same result.

Thank you, -Arijit

pd3 commented 10 months ago

It seems you want the VAF (variant allele frequency) annotation:

FORMAT/VAF     Number:A  Type:Float
        ....    The fraction of reads with the alternate allele, requires FORMAT/AD or ADF+ADR

If the VCF were annotated with the standard reserved VCF tag FORMAT/AD, you could compute VAF with the +fill-tags plugin and then filter on it:

bcftools +fill-tags in.vcf -- -t FORMAT/VAF

I haven't encountered the AU, TU, etc. fields myself and don't know which programs add them. They look fairly impractical for this use case; maybe a change could be made there?
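Until the VCF carries FORMAT/AD, one workaround is to compute the ratio outside bcftools. The following is a minimal Python sketch, not a bcftools feature. The helper names (`first_int`, `sample_alt_fractions`, `keep_record`) and the example record are hypothetical. It assumes tab-separated VCF data lines, a single-base biallelic ALT, and that each of AU, CU, GU, TU holds either one integer or a comma-separated list whose first value is the count to use (an assumption about the caller's output that should be checked against the VCF header).

```python
def first_int(value):
    """Take the first number of a possibly comma-separated count such as '5,5'."""
    return int(value.split(",")[0])


def sample_alt_fractions(vcf_line):
    """Return one ALT fraction per sample for a single-base, biallelic record."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt, keys = fields[4], fields[8].split(":")
    if alt not in ("A", "C", "G", "T"):
        return []  # indels, multiallelic sites, and symbolic alleles are skipped
    fractions = []
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        try:
            counts = {b: first_int(values[b + "U"]) for b in "ACGT"}
        except (KeyError, ValueError):
            continue  # tag absent or missing value (".") for this sample
        total = sum(counts.values())
        fractions.append(counts[alt] / total if total else 0.0)
    return fractions


def keep_record(vcf_line, threshold=0.1):
    """Keep the record if any sample's ALT fraction exceeds the threshold."""
    return any(f > threshold for f in sample_alt_fractions(vcf_line))


# Made-up record: ALT is A, with AU=5 out of 25 counted bases.
record = "chr1\t100\t.\tC\tA\t.\tPASS\t.\tAU:CU:GU:TU\t5,5:20,21:0,0:0,0"
print(keep_record(record))  # 5 / 25 = 0.2, above the 0.1 threshold
```

To apply it to a whole file, header lines (those starting with `#`) would be passed through unchanged and `keep_record` applied to every other line.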