tyjo / coptr

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing
GNU General Public License v3.0
16 stars 5 forks source link

minimap2 instead of bowtie2? #2

Closed nick-youngblut closed 3 years ago

nick-youngblut commented 3 years ago

Given that minimap2 is 2-3X faster than bowtie2 for mapping Illumina reads, I'd prefer to use minimap2. Have you done any testing to compare minimap2 versus bowtie2 in regards to the accuracy of CoPTR?

tyjo commented 3 years ago

I looked into minimap2 using the example data. It appears to produce fewer alignments than bowtie2. Here are the number of mapped reads from each aligner:

bowtie2

minimap2 (using the -x sr flag for short reads)

Note that CoPTR does filter reads after alignment. Even with the filtering step, bowtie2 produces more alignments.

Given the discrepancy, I would be hesitant to use minimap2 with CoPTR.

nick-youngblut commented 3 years ago

Good to know! Thanks for the info

cdiener commented 3 years ago

Hey, I think the discrepancies you see are due to minimap2 not returning secondary alignments by default. You get much better agreement with --secondary=yes (about 10% more mappings). For example:

minimap2 -ax sr --secondary=yes ref-db/db.fna fastq/ERR969281.fastq.gz | samtools view -c -F 4
[M::mm_idx_gen::0.334*0.99] collected minimizers
[M::mm_idx_gen::0.396*1.30] sorted minimizers
[M::main::0.396*1.30] loaded/built the index for 190 target sequence(s)
[M::mm_mapopt_update::0.396*1.30] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 190
[M::mm_idx_stat::0.415*1.28] distinct minimizers: 1070677 (99.05% are singletons); average occurrences: 1.015; average spacing: 6.006; total length: 6527538
[M::worker_pipeline::0.533*1.41] mapped 10818 sequences
[M::main] Version: 2.18-r1015
[M::main] CMD: minimap2 -ax sr --secondary=yes ref-db/db.fna fastq/ERR969281.fastq.gz
[M::main] Real time: 0.547 sec; CPU: 0.765 sec; Peak RSS: 0.057 GB
6049

And you can get more mappings when lowering k and w:

minimap2 -ax sr --secondary=yes -k 15 -w 10 ref-db/db.fna fastq/ERR969281.fastq.gz | samtools view -c -F 4
[M::mm_idx_gen::0.412*1.00] collected minimizers
[M::mm_idx_gen::0.518*1.40] sorted minimizers
[M::main::0.519*1.40] loaded/built the index for 190 target sequence(s)
[M::mm_mapopt_update::0.519*1.40] mid_occ = 1000
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 190
[M::mm_idx_stat::0.542*1.38] distinct minimizers: 1183441 (97.71% are singletons); average occurrences: 1.030; average spacing: 5.354; total length: 6527538
[M::worker_pipeline::0.676*1.55] mapped 10818 sequences
[M::main] Version: 2.18-r1015
[M::main] CMD: minimap2 -ax sr --secondary=yes -k 15 -w 10 ref-db/db.fna fastq/ERR969281.fastq.gz
[M::main] Real time: 0.695 sec; CPU: 1.066 sec; Peak RSS: 0.066 GB
7149

This is a bit more consistent with results from the literature where minimap2, bwa-mem and bowtie tend to perform similarly. However, AFAICT you can use any bam with coptr so I don't see an issue to default to bowtie which is more probably more sensitive and can be pretty memory efficient with the --mm flag.