samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
649 stars 240 forks source link

homozygous with equal allele depth #2056

Open abhortas opened 8 months ago

abhortas commented 8 months ago

I am genotyping 10 individuals with mpileup and call commands and I am getting some strange results.

I am using this code:

bcftools mpileup --skip-indels --annotate FORMAT/AD,FORMAT/DP,INFO/AD -C 0 -d 550 -Ou -f $ref $name | bcftools call -mv -Ov -o calls.vcf

variable $name has the 10 bam files of each individual and $ref the reference genome.

Here is one of the lines of the final vcf:

FORMAT=

FORMAT=

FORMAT=

FORMAT=

........GT:PL:DP:AD 1/1:31,6,0:26:13,13 1/1:0,7,1:26:14,12 1/1:8,9,0:38:17,21 1/1:0,16,1:44:28,16 1/1:28,6,0:24:14,10 1/1:3,5,0:22:11,11 1/1:9,0,10:17:11,6 1/1:2,0,6:20:10,10 1/1:10,12,0:57:33,24 1/1:6,0,10:24:9,15

so, it looks so strange for me that the pipeline calls all the 10 genotypes 1/1 when the allele depths are similar in every case.

If I just look at the allele depth I would say all the genotypes should be 0/1

What am I missing?

And yes, I can see the Shred-scaled genotype likelihoods, and several case looks to indicate homozygous 1/1 (here for example 1/1:31,6,0:26:13,13), but in other cases, wouldn't be heterozygous? (here for example 1/1:28,6,0:24:14,10).

And another final question. If I am getting all individuals 1/1, in other words.. a monomorphic site.....why is this site present in the final vcf? I would spect at least one allele of ALT (or REF).

pd3 commented 8 months ago

Starting with the last question, homozygous alternate genotypes count as variants relative to the reference genome. Regarding the genotypes, it depends on the actual reads and base qualities, and BAQ realignment - try to switch it off. The results look dodgy, a small test case would be required to understand and explain what is going on

abhortas commented 8 months ago

Thanks for the quick response.

Ok, I understand now why I have some monomorphic sites, I hadn't thought it but it makes sense.

On the other hand, I've already tried running the pipeline with BAQ realignment off, and the results are practically identical. All the genotypes are 1/1 and just change few allele depths, but nothing important, just one or two values.

it continue looks strange genotyping but I guess it is due to bases qualities and the internal algorithm of the pipeline.