samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
671 stars 240 forks source link

`bcftools +setGT` does not consider major allele as new genotype #1800

Closed twaddlac closed 2 years ago

twaddlac commented 2 years ago

I am trying to manually set a genotype based on Variant Allele Frequency (VAF) cutoffs.

Given the following VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1.sorted.bam
sample1       1       .       G       .       .       PASS    END=1566;MinDP=157      GT:DP   0/0:157
sample1       1567    .       C       A       13.9188 PASS    DP=363;VDB=0.885403;SGB=-0.693147;RPBZ=1.34407;MQBZ=0;MQSBZ=0;BQBZ=-0.875012;NMBZ=-0.631166;SCBZ=-0.445653;FS=0;MQ0F=0;AC=1;AN=2;DP4=154,133,25,32;MQ=60        GT:PL:DP:AD:VAF 0/0:49,0,255:344:287,57:0.165698
sample1       1568    .       C       .       .       PASS    END=1629;MinDP=320      GT:DP   0/0:320
sample1       1630    .       C       A       220.56  PASS    DP=496;VDB=0.0744208;SGB=-0.693147;RPBZ=-2.89068;MQBZ=0;MQSBZ=0;BQBZ=0.765252;NMBZ=0.891827;SCBZ=0.513809;FS=0;MQ0F=0;AC=1;AN=2;DP4=20,79,138,237;MQ=60 GT:PL:DP:AD:VAF 0/0:255,0,37:474:99,375:0.791139
sample1       1631    .       A       .       .       PASS    END=6536;MinDP=44       GT:DP   0/0:44

when I run: bcftools +setGT sample1.flt.norm.vcf.gz -- --target-gt q --new-gt M --include "VAF>0.6"

I would expect to see the genotype at position 1630 to be 1/1 instead of 0/0 since REF=C, ALT=A, and VAF=0.791139.

Does this make sense or is there something I'm missing?

Thank you!

pd3 commented 2 years ago

This is because the -n M refer to something else. Here the major allele means the allele observed more frequently in the population, not among the reads in that specific sample. I updated the usage page to make this clear. Also added a new --new-gt X option which allows to do what you want, filling in the allele with the bigger read depth as determined from FORMAT/AD.

JosephLalli commented 10 months ago

Hi @pd3,

+setGT does not allow the use of X in custom genotypes, eg --new-gt c:'0/M' is acceptable, but c:'0/X' causes the following error:

Could not parse the genotype: c:0/X

Could this capability be added?

pd3 commented 10 months ago

@JosephLalli This is now available, see https://github.com/samtools/bcftools/issues/2065