It dawned on me that we can do a poor man's threading here using calmd | mpileup. E.g.:
$ time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou ~/scratch/data/9827_2#49.1m.bam > /dev/null
[mpileup] 1 samples in 1 input files
real 0m18.121s
user 0m17.889s
sys 0m0.180s
$ time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou -B ~/scratch/data/9827_2#49.1m.bam > /dev/null
[mpileup] 1 samples in 1 input files
real 0m13.964s
user 0m13.717s
sys 0m0.204s
Two threads here is simply to ensure asynchronous I/O and decoding of BAM, but I doubt that's the bottleneck in bcftools anyway. This also shows (albeit on a tiny sample) that BAQ is about 1/3 of the CPU cost, so that's the most we'd gain by threading this part.
$ time samtools calmd -@2 -u -E -r ~/scratch/data/9827_2#49.1m.bam $HREF | bcftools mpileup --threads 2 --fasta-ref $HREF -Ou - > /dev/null
[mpileup] 1 samples in 1 input files
real 0m15.031s
user 0m21.517s
sys 0m0.400s
It's not quite as fast, but calmd was running at about 50% CPU and bcftools at 100%, so it manages to offload most of the BAQ overhead onto a second process. I checked the md5sum (when in VCF mode) before and after.
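In general form (with placeholder names in.bam / ref.fa / out.vcf rather than my test files, and the downstream call step added for completeness), the pipeline is just:
$ samtools calmd -@2 -u -E -r in.bam ref.fa | bcftools mpileup --threads 2 --fasta-ref ref.fa -Ou - | bcftools call -vmO v -o out.vcf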
Thanks to an office discussion (with @dkj), this also suggests ideas for the multi-sample bcftools case.
Compare (a fake 10x of the same 1 sample, but go with it...):
time bcftools mpileup --threads 2 --fasta-ref $HREF ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam | bcftools call -vmO v -o _1.vcf
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 10 input files
real 1m23.977s
user 1m34.834s
sys 0m0.772s
vs
time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou <(samtools calmd -E -@2 -r -A -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r -u ~/scratch/data/9827_2#49.1m.bam $HREF) | bcftools call -vmO v -o _3.vcf
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 10 input files
real 0m28.917s
user 0m28.682s
sys 0m0.472s
The huge difference now makes me realise my previous test was somewhat invalid, as it used shallow data. On deep data BAQ becomes dominant because it is per-read, whereas mpileup's speed scales sub-linearly with read depth (it has a linear component plus a fixed portion, and the fixed portion dominates at low depths).
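To put some rough, made-up numbers on that: if the non-BAQ part of mpileup cost 10s fixed plus 1s per million reads, and BAQ cost 4s per million reads, then at 1 million reads BAQ would be 4/15 ≈ 27% of the total, but at 20 million reads it would be 80/110 ≈ 73%.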
So there is still potential merit here, even if it's just to avoid horrendous shell hackery!
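For anyone wanting to sidestep that hackery in the meantime, here's a sketch that generalises the process substitutions above to N inputs using named pipes. File names (sample*.bam, ref.fa, out.vcf) are placeholders, and it assumes no region option is used, since pipes can't be seeked:

```sh
# One calmd per input BAM, each writing uncompressed, BAQ-annotated BAM to a FIFO;
# mpileup then reads the FIFOs as if they were ordinary files.
fifos=()
for bam in sample*.bam; do
    fifo=$(mktemp -u)              # unique path; mkfifo creates the pipe itself
    mkfifo "$fifo"
    samtools calmd -u -E -r "$bam" ref.fa > "$fifo" &
    fifos+=("$fifo")
done

bcftools mpileup --fasta-ref ref.fa -Ou "${fifos[@]}" | bcftools call -vmO v -o out.vcf

wait                               # collect the background calmd processes
rm -f "${fifos[@]}"
```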
If the BAQ calculation is a significant fraction of a variant-calling pipeline's compute, can we get a faster turnaround by threading that BAQ calculation (be it in samtools calmd, samtools mpileup, or bcftools mpileup)?