samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
673 stars 240 forks source link

query regarding calculating a new annotation #1422

Closed prasundutta87 closed 3 years ago

prasundutta87 commented 3 years ago

Hi,

Can I please be helped with a way on how to solve this query? I have to add an additional tag for variant allele fraction (VAF) in my VCF INFO area or (probably FORMAT also) in my single sample VCF file.

In my FORMAT/AD, I have two values, for ex., 43,45 where the numbers represent Allelic depths for the ref and alt alleles for a sample in the order listed.

How can I utilise the two values and calculate VAF using the formula alt/ref+alt and also add it to my VCF INFO or FORMAT area?

I have tried bcftools query -f '[%AD\n]' which gives 43,45. But, I don't know how to separate them in bcftools and use it to do the VAF calculation and add it in the VCF file.

Regards, Prasun

I am using bcftools v1.11.

pd3 commented 3 years ago

Hi, this is not possible, but I just added this functionality into the fill-tags plugin. For example:

bcftools +fill-tags test/fill-tags-VAF.vcf -- -t VAF
prasundutta87 commented 3 years ago

Thanks @pd3 ..do I need to update bcftools for this to work?

pd3 commented 3 years ago

Yes, you'll need the latest version from github, please see here http://samtools.github.io/bcftools/howtos/install.html

prasundutta87 commented 3 years ago

Okay.. I actually installed bcftools using bioconda..and I am using bcftools v1.11. Is there any way if I only want to update the plugins using bioconda? I actually have a lot of permission issues as I am working on a cluster, so installing/updating anything can be an issue for me.

pd3 commented 3 years ago

You could take just the plugin source code and compile. Or wait for the next numbered version, a new release is imminent.

prasundutta87 commented 3 years ago

Thanks a lot for this @pd3! :-)

ghost commented 3 years ago

So this works but why if you are already creating the total AD in the denominator for this calculation, there is not total AD tag to fill?

pd3 commented 3 years ago

I don't understand the question. What do you mean by 'total AD'? Perhaps the total depth? If yes, then you would want to add a DP tag. As why it is not there already, well, because someone would have to take the time and do it, and time is a valuable commodity.

BJWiley233 commented 3 years ago

The denominator for calculating the VAF is the sum of the comma separated FMT/AD tag. You already calculate adding these together when filling the VAF tag. So it would be nice to have either an INFO/DP tag or a FMT/DP tag to fill with the sum of this comma separated AD tag (the denominator for the VAF). Pindel only gives the AD tag in the format field, i.e. 45,12. But they don’t add this up for a total DP tag. They have absolutely no DP tag whatsoever. This would be nice to have especially because you already are doing it in the denominator of the VAF.

BJWiley233 commented 3 years ago

I can post an example of Pindel output when I get to office.

pd3 commented 3 years ago

This is possible via the command

bcftools +fill-tags input.vcf -- -t 'DP=sum(AD)'
BJWiley233 commented 3 years ago

For me this only gives the reference depth total.

input: GT:AD 0/0:22,2

output goes to INFO/DP: DP=22 GT:AD 0/0:22,2

correct output should be: DP=24 GT:AD 0/0:22,2

Here is an example input where the DP should equal 112(111+1) and 87(83+84): It should be something like bcftools filter where you can do AD[sample:subfield] such as DP=sum(AD[0:0], AD[0:1]) or however you are calulating the total denominator for VAF.

##fileDate=20161216
##source=pindel
##reference=GRCh38DH
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=PF,Number=1,Type=Integer,Description="The number of samples carry the variant">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=NTLEN,Number=.,Type=Integer,Description="Number of bases inserted in place of deleted code">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Reference depth, how many reads support the reference">
##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allele depth, how many reads support this allele">
##bcftools_viewVersion=1.12-60-g7100c81+htslib-1.12-50-g0313654
##bcftools_viewCommand=view 1076180_23153_0_0/1076180_23153_0_0.vcf.gz; Date=Mon Jun 14 12:59:19 2021
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  UKB_1143822_0230759086
chr1    69246   .       GTACTTCCTGCTAGCCAACCTCTCACTCATTGATCTGTCTCTGTCTTCAGTCACAGCCCCCAAGATGAT   G       .       PASS    END=69314;HOMLEN=4;HOMSEQ=TACT;SVLEN=-68;SVTYPE=DEL     GT:AD   0/0:111,1
chr1    451649  .       A       AG      .       PASS    END=451649;HOMLEN=1;HOMSEQ=G;SVLEN=1;SVTYPE=INS GT:AD   0/0:83,4
pd3 commented 3 years ago

Ah, you are right. This has now been fixed and extended to allow custom number of fields to output (fixed number or variable number of fields) and the output type (float or integer). For example, try:

# generates Number=.,Type=Float
bcftools +fill-tags input.vcf -- -t 'DP=sum(FORMAT/AD)'

# generates Numbe=1,Type=Integer
bcftools +fill-tags input.vcf -- -t 'DP:1=int(sum(FORMAT/AD))'

I hope this helps.

ghost commented 3 years ago

Cool thanks for this. Looks like you updated a lot so I appreciate it.

On Wed, Jun 16, 2021 at 4:18 AM Petr Danecek @.***> wrote:

Ah, you are right. This has now been fixed and extended to allow custom number of fields to output (fixed number or variable number of fields) and the output type (float or integer). For example, try:

generates Number=.,Type=Float

bcftools +fill-tags input.vcf -- -t 'DP=sum(FORMAT/AD)'

generates Numbe=1,Type=Integer

bcftools +fill-tags input.vcf -- -t 'DP:1=int(sum(FORMAT/AD))'

I hope this helps.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/samtools/bcftools/issues/1422#issuecomment-862197172, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASU5KNNOHY673Z3OJMJ5P5LTTBT57ANCNFSM4X4IFDDQ .

roryk commented 3 years ago

Thanks Petr, this functionality was just super useful to me.