samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

PacBio data variant calling problem with mpileup -- (Illumina and Oxford data runs without problem) #1863

Open hoyonh opened 1 year ago

hoyonh commented 1 year ago

I am having difficulty obtaining a list of variants from PacBio data using bcftools mpileup and bcftools call. I do not have this problem with Illumina paired-end or Oxford Nanopore data, which are processed quickly and satisfactorily (~8 min for the ONT data with no indels, and somewhat longer, maybe 30 min, for a larger Illumina data set).

In comparison, PacBio data could run for 24+ hours without finishing. Worse, I get empty output with three different samples.

I use the same reference genome for all samples. All alignments were done with minimap2: the PacBio data with both the default configuration (-x ont?) and the PacBio one (-x pacbio), and the Illumina and Oxford data with other appropriate configurations.

I tried bcftools mpileup -X pacbio-ccs first and then -X pacbio-ccs -Ou --skip-indels ... before trying other configurations, even venturing to -X ont as well as the default. None of these helped.
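For reference, the mpileup/call combination described above can be sketched as a small Python wrapper. This is only an illustration under assumptions, not the exact commands that were run: the file names, the output name, and the region string are hypothetical placeholders, and the `-r` region restriction is added here only so that a test run finishes quickly on a small slice of the genome.

```python
import shlex
import subprocess


def build_pipeline(bam, ref, region, preset="pacbio-ccs", out="calls.vcf.gz"):
    """Return argv lists for `bcftools mpileup` and `bcftools call`."""
    mpileup = [
        "bcftools", "mpileup",
        "-X", preset,   # platform preset mentioned in the post
        "-f", ref,      # reference FASTA the reads were aligned to
        "-r", region,   # restrict to one region (assumption, for a quick test)
        "-Ou",          # uncompressed BCF, suitable for piping
        bam,
    ]
    # -m: multiallelic caller, -v: output variant sites only
    call = ["bcftools", "call", "-m", "-v", "-Oz", "-o", out]
    return mpileup, call


def run_pipeline(mpileup, call):
    """Pipe mpileup into call; needs bcftools on PATH. Returns both exit codes."""
    p1 = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call, stdin=p1.stdout)
    p1.stdout.close()  # so mpileup gets SIGPIPE if call exits early
    rc_call = p2.wait()
    rc_mpileup = p1.wait()
    return rc_mpileup, rc_call


# Hypothetical inputs; the region name depends on the reference's contig names.
mpileup_cmd, call_cmd = build_pipeline("sample.bam", "ref.fa", "I:1-1000000")
print(shlex.join(mpileup_cmd), "|", shlex.join(call_cmd))
```

Running on a single region first makes it easier to tell whether the PacBio run is merely slow or is producing no records at all, and checking both exit codes shows whether either step failed before writing output.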

Any suggestions?

pd3 commented 1 year ago

What version of bcftools are you using? If it's the latest, any chance you could provide a test file for debugging?

hoyonh commented 1 year ago

I am using bcftools 1.15.1

The three BAM files I used are quite big (the smallest is 8.8 GB), so I'm not sure of a good way to send them to you. The FASTQ files are publicly available though, e.g. SRR8599837 and SRR13448197. The reference is c_elegans.PRJNA13758.WS245.genomic.fa, available from WormBase via FTP. The size could be the problem: I tried GATK and gave up. I got PacBio HiFi data working with PEPPER, but I have yet to do the same with older PacBio data using DeepVariant.

pd3 commented 1 year ago

A new version, 1.17, was just released. Can you please try with that? If the problem persists, we'd need to reproduce it locally with your BAM to find the root cause.

frsepulveda commented 2 months ago

Hi there! Did you find a solution for calling SNPs from PacBio data? I have the same trouble as you. When my call finishes, I have an empty .vcf file without variants. I am using version 1.17.
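One way to make "empty" more precise when reporting this is to count header lines and variant records separately. The helper below is a minimal sketch written for this purpose and is not part of bcftools; the function name is hypothetical.

```python
import gzip


def count_vcf_lines(path):
    """Return (header_lines, record_lines) for a plain or gzip/bgzip VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    headers = records = 0
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                headers += 1      # meta-information and column header lines
            elif line.strip():
                records += 1      # one line per variant record
    return headers, records
```

A file with no header lines at all suggests the run stopped before writing anything, whereas a file with a header but zero records suggests the caller ran to completion and emitted no variant sites. That distinction is likely to be useful to whoever tries to reproduce the problem.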