samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
634 stars 241 forks source link

extreme low coverage data issue #1168

Open nreid opened 4 years ago

nreid commented 4 years ago

Hello, I'm working with a mixture of low (5-10x) and very low (0.5x) coverage WGS sequences. I've got about 900 of them from a diverse set of populations, and variants are discovered every 10-20bp. I noticed that bcftools (1.9) is calling many heterozygous genotypes with no evidence for both alleles in low coverage samples:

format: GT:PL:DP:AD. genotype: 0/1:0,3,37:1:1,0

In this case there's a single reference read, but the genotype is called as heterozygous, even though the most likely genotype according to the PL is 0/0. I'm guessing this is due to some kind of prior on allele frequencies, but I don't see any options to turn that off.

I'm doing some basic population genetics, and I think this is really mucking things up. Is there any way to deal with this? I think using maximum likelihood genotypes without any kind of population prior would probably work far better in my situation.

I have used freebayes for analyses of this kind of data in the past and there HWE priors can be turned off, but this time around I'm working with ~900 samples with tons of genetic diversity and it has slowed to a crawl. bcftools runs quickly though, so I was hoping it would be suitable.

My commands are:

bcftools mpileup \
    -f $REFERENCE \
    -b $BAMLIST \
    -q 20 -Q 30 \
    --max-depth 100 \
    -r $SCAF \
    -a "DP,AD" | \
bcftools call -m -v -Oz -o $OUTDIR/${SCAF}.vcf.gz

Thanks,

pd3 commented 4 years ago

The program assumes that all samples come from the same population. To enforce a population structure (or, in the extreme case, to treat all samples individually), check the bcftools call --group-samples option. Note this requires that the mpileup step is run with bcftools mpileup -a FMT/AD option.