samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
634 stars 241 forks source link

Could bcftools report all states of characters for very low coverage? #2141

Open Axze-rgb opened 3 months ago

Axze-rgb commented 3 months ago

Hello,

I'm working with a phased diploid assembly and would like to implement an elementary variant calling strategy. My goal is to make observational calls based on the assumption that most positions should have a coverage depth of 2 and that the alignment is correct. Here's my current approach:

If the reference allele is A, I'd like to report A/T as a 0/1 GT. I'm currently experimenting with the following command:

bcftools mpileup -f ../../ref.fa H5A2.dual.sorted.bam | bcftools call -m -A

My initial observations suggest it might work as I want. But I fear some under-the-hood filtering due to the low coverage.

Can you confirm whether my understanding of this command's behavior is correct, given my intended goal?

I asked about this on Biostars, but no one came back to me after 20 days.

Thanks so much for any help.

pd3 commented 3 months ago

When the -A option is added and -v is not used, this tells the call command to keep all sites, including non-variant sites, and to keep all observed alleles in the ALT column, even if the genotype is not recognized as variant. So the genotype can be 0/0, but the ALT column will still carry an evidence for the other observed alleles. Adding mpileup -a FORMAT/AD will also record the number of reads supporting each allele.

Axze-rgb commented 3 months ago

Thanks for your answer; I am encountering an "extreme value encountered."

bcftools mpileup -f ../../ref.fa -a FORMAT/AD H5A2.dual.sorted.bam|bcftools call -m -A -v > H5A2.pileup
[mpileup] 1 sample in 1 input file
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_format_fill5] Extreme FORMAT/AD value encountered and set to missing at Chrom_3:144644
[1]    2315323 killed     bcftools mpileup -f ../../ref.fa -a FORMAT/AD H5A2.dual.sorted.bam | 
       2315324 done       bcftools call -m -A -v > H5A2.pileup

When looking into Tablet for that position, I see nothing special, it's just the two contigs mapping there.

image

I don't understand what "extreme" is there.

Axze-rgb commented 3 months ago

Is bcftools not suitable for my application? If not, do you have any alternative suggestions? EDIT: it seems a good old samtools mpileup could actually give me what I want. Thanks

pd3 commented 3 months ago

Any chance you could provide a small test case to reproduce the problem? It is concerning that bcftools mpileup should produce a VCF that cannot be parsed by bcftools call

[W::vcf_parse_format_fill5] Extreme FORMAT/AD value encountered and set to missing at Chrom_3:144644