samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools call multi samples #1450

Closed zhoudreames closed 3 years ago

zhoudreames commented 3 years ago

I get different results when I call variants on two samples together versus on only one of them. The single-sample command is "bcftools mpileup -Ou sample1.bam -f ref.fa | bcftools call -mv -o sample1.vcf" and the two-sample command is "bcftools mpileup -Ou -b bam.flie -f ref.fa | bcftools call -mv -o sample1and2.vcf".

The differing results are shown below.

sample1 result:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT fatherHap
chr12 1473 . A AT 10.7919 . ....... 1/1:40,3,0

sample1and2 result:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT fatherHap motherHap
chr12 1473 . A AT 7.82303 . ....... 0/1:40,3,0 0/1:0,3,38

In the single-sample run, the fatherHap genotype at chr12:1473 is 1/1, but when the two samples are called together the fatherHap genotype becomes 0/1. Why is that?

zhoudreames commented 3 years ago

@athos @lindenb @arq5x @junaruga can you help me? Thanks.

zhoudreames commented 3 years ago

@athos @arq5x @junaruga @kmsquire

pd3 commented 3 years ago

The caller assumes Hardy-Weinberg equilibrium (HWE), and the estimated allele frequencies change as more samples are added. The difference in likelihood between the alt-hom and het genotypes in your example is really small, so the context in which the call is made matters.

If this is not desired, the outcome can be influenced with the -G, --group-samples option of bcftools call. With -G -, the caller should effectively perform single-sample calling. If you are going to use this option, please make sure to use the latest version, 1.12, which includes important improvements and fixes.

More about the math behind the calling can be found here: http://samtools.github.io/bcftools/call-m.pdf
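
To make the point about HWE and allele frequencies concrete, here is a minimal Python sketch. It is a toy model, not the bcftools multiallelic caller: it assumes a biallelic site, reads the three numbers after each genotype in the question as phred-scaled genotype likelihoods (PL), which is an assumption because the FORMAT column is elided in the post, and picks the ALT allele frequency with a coarse grid search. All function names are hypothetical.

```python
# Toy illustration only: NOT the model implemented by `bcftools call -m`.
# Assumptions: biallelic site, values after GT are PLs, HWE genotype priors,
# ALT allele frequency estimated by a simple grid search over [0, 1].

GENOTYPES = ["0/0", "0/1", "1/1"]


def pl_to_likelihoods(pl):
    """Convert phred-scaled likelihoods (PL) to relative likelihoods."""
    return [10 ** (-p / 10) for p in pl]


def hwe_priors(f):
    """Genotype priors under Hardy-Weinberg equilibrium for ALT frequency f."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]


def estimate_alt_freq(samples, steps=100):
    """Return the grid value of f that maximises the joint sample likelihood."""
    best_f, best_lik = 0.0, -1.0
    for i in range(steps + 1):
        f = i / steps
        lik = 1.0
        for pl in samples:
            priors = hwe_priors(f)
            lik *= sum(p * l for p, l in zip(priors, pl_to_likelihoods(pl)))
        if lik > best_lik:
            best_f, best_lik = f, lik
    return best_f


def call_genotypes(samples):
    """Call each sample's most probable genotype given the shared frequency."""
    f = estimate_alt_freq(samples)
    calls = []
    for pl in samples:
        post = [p * l for p, l in zip(hwe_priors(f), pl_to_likelihoods(pl))]
        calls.append(GENOTYPES[post.index(max(post))])
    return f, calls


father = [40, 3, 0]  # fatherHap values at chr12:1473, read as PL
mother = [0, 3, 38]  # motherHap values at chr12:1473, read as PL

print(call_genotypes([father]))          # father on his own
print(call_genotypes([father, mother]))  # both samples together
```

In this toy model the father alone pushes the estimated ALT frequency to 1.0 and is called 1/1, while adding the mother pulls the estimate to 0.5, where the het prior is twice the hom-alt prior and outweighs the 3-unit PL gap, so both samples come out 0/1. The real caller is more sophisticated, but the direction of the effect is the same as in the question.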