Axze-rgb closed this issue.
Sorry, I hadn't seen issue https://github.com/lh3/minimap2/issues/1030#issue-1622249145.
So, do I understand correctly that Dorado is now accounted for in the map-ont settings?
Thanks for all the work you are doing. Alex
@Axze-rgb The dorado aligner has not yet changed any of the index settings; when we do, we would like them upstreamed here.
For now, use map-ont. You can try -x map-hifi -w10 (HiFi scoring and k-mer length with more seeds) for Q20 reads, but you need a way to evaluate whether that gives better results.
I hope I can find some time in the next several months to improve minimap2 a little bit. Along the way, I will be testing alternative scoring for v14 data.
@iiSeymour When you find more appropriate parameters for aligning Q20 reads, I will be happy to add a new preset for that. This will also save me some time. Thanks!
Hello, I've recently been working with some R10 data and would like to know whether there are any plans to improve minimap2 for ONT R10 data in the next few months. Or do you have any new suggestions for R10 data?
@lh3 From our internal benchmarking, we find speed and downstream accuracy are maximized with -x map-ont -k19 -w 19 -U50,500 -g10k.
Hi @lh3, the accuracy of ONT sequencing has advanced a lot with duplex reads and the R10.4 pore. I also wonder whether there are any plans for separate presets for R9 and R10 Nanopore data. Different basecallers also have a significant impact on sequencing accuracy, so it seems inappropriate to lump them all together under -x map-ont.
from our internal benchmarking we find speed and downstream accuracy are maximized with -x map-ont -k19 -w 19 -U50,500 -g10k.
-x map-hifi is equivalent to -x map-ont -k19 -w 19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200. The main difference here is the scoring. How does the scoring affect the downstream tools? If the map-hifi scoring also works, I can add an alias to map-hifi, something like lr:hq.
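The equivalence stated above can be checked mechanically. The Python sketch below splits both option strings from this thread into tokens and lists what the map-hifi equivalent adds on top of the map-ont-based settings. The helper name `extra_options` is hypothetical, the strings are copied from the comments rather than read from minimap2, and the token comparison is deliberately naive (a bare value such as `19` is matched as a token, not as the value of a specific option).

```python
import shlex

# Option strings as quoted in this thread (not read from minimap2 itself).
ONT_TUNED = "-x map-ont -k19 -w 19 -U50,500 -g10k"
MAP_HIFI_EQUIV = "-x map-ont -k19 -w 19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200"


def extra_options(base: str, extended: str) -> list[str]:
    """Return tokens of `extended` that do not appear in `base`, keeping their order."""
    base_tokens = set(shlex.split(base))
    return [tok for tok in shlex.split(extended) if tok not in base_tokens]


# Prints the options map-hifi adds: -A/-B/-O/-E scoring plus -s (minimal peak DP score).
print(" ".join(extra_options(ONT_TUNED, MAP_HIFI_EQUIV)))
```

For these two strings the extra tokens are -A1 -B4 -O6,26 -E2,1 and -s200, i.e. the "main difference" referred to above is confined to scoring and the DP score threshold.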
also different basecallers have significant impact on sequencing accuracy
That is why it is more appropriate to choose a conservative setting that can give you good results on input of varying quality.
If the map-hifi scoring also works
Unfortunately not: the map-hifi scoring leads to both fewer mapped reads (~3%) and small regressions in SNP/INDEL calling. It's possible these regressions could be recovered by new models trained with the updated scoring parameters, but -x map-ont -k19 -w 19 -U50,500 -g10k seems to be the sweet spot.
The next release will have a lr:hq preset for -k19 -w 19 -U50,500 -g10k.
Thanks @lh3 !
I understand that the new preset lr:hq is not meant for spliced alignment.
Should I use the existing preset splice:hq with highly accurate Nanopore cDNA reads (average quality >= 20)?
Yes
Let me hijack the thread with a question: are there any public Q20 cDNA-seq datasets? Perhaps because the SQK-PCS114 kit is still at the early-access stage, most cDNA reads in papers were produced with R9 or older kits.
Hi @lh3, I have PacBio HiFi Iso-Seq data. Should I use the existing preset splice:hq together with the new preset lr:hq, or can I just use -k19 -w 19 -U50,500 -g10k -xsplice -C5 -O6,24 -B4?
Thanks a lot.
The next release will have a lr:hq preset for -k19 -w 19 -U50,500 -g10k.
Shouldn't it be -x map-ont -k19 -w 19 -U50,500 -g10k, according to @iiSeymour?
@iiSeymour I noticed that the latest Minimap2-2.27 (r1193) includes an updated lr:hq preset. I ran a small benchmark comparing this new preset with the old map-ont preset on a human R10.4.1 dataset basecalled with dorado 0.4.1 in HAC mode.
For -x map-ont:
19072496 + 0 mapped (99.93% : N/A)
12791592 + 0 primary mapped (99.90% : N/A)
For -x lr:hq:
18636130 + 0 mapped (99.79% : N/A)
12765068 + 0 primary mapped (99.69% : N/A)
It appears that there are fewer mapped reads (~0.14%) with the new lr:hq preset. Considering the relatively high coverage (>50X) of this data, this difference could be significant.
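For reference, here is the arithmetic behind the ~0.14% figure, as a small Python sketch over the numbers reported above. It assumes the lines come from `samtools flagstat`, whose "mapped" line counts alignment records including secondary and supplementary ones, while "primary mapped" counts only primary alignments (one per mapped read). The helper name `compare` is hypothetical.

```python
# flagstat figures copied from the benchmark above: (count, percent) per line.
FLAGSTAT = {
    "map-ont": {"mapped": (19072496, 99.93), "primary mapped": (12791592, 99.90)},
    "lr:hq": {"mapped": (18636130, 99.79), "primary mapped": (12765068, 99.69)},
}


def compare(metric):
    """Return (count difference, percentage-point difference), map-ont minus lr:hq."""
    ont_count, ont_pct = FLAGSTAT["map-ont"][metric]
    hq_count, hq_pct = FLAGSTAT["lr:hq"][metric]
    # Round the float subtraction so 99.93 - 99.79 reports as 0.14, not 0.14000000000000057.
    return ont_count - hq_count, round(ont_pct - hq_pct, 2)


for metric in ("mapped", "primary mapped"):
    diff, points = compare(metric)
    print(f"{metric}: {diff} fewer with lr:hq ({points} percentage points)")
```

With these figures, the "mapped" line differs by 436,366 records (0.14 percentage points), while the "primary mapped" line differs by 26,524 reads (0.21 percentage points). Under the flagstat assumption above, the per-read difference is the one on the primary line.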
Read count-based metrics are often misleading. The difference mostly comes from short and low-quality reads, which may, on the contrary, interfere with analyses. PS: Also, not all reads are supposed to map to a reference genome.
Thanks a lot. @jelber2 splice:hq works for RNA and lr:hq works for DNA.
preset lr:hq => -x map-ont -k19 -w 19 -U50,500 -g10k
preset splice:hq => -x splice -C5 -O6,24 -B4
preset splice => -x map-ont -k15 -w5 --splice -g2k -G200k -U10,1000000 -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes
So combining the parameters from lr:hq and splice:hq will cause conflicts.
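The conflict described above can be illustrated with a short Python sketch that parses each expansion into an option-to-value map and reports options set to different values. The expansion strings are copied from the comment above (a user's summary, not minimap2's source), the helper names `parse_options` and `conflicts` are hypothetical, and the rule that a later occurrence overwrites an earlier one is only how this sketch merges `splice` with the splice:hq extras, not a statement about minimap2's own option handling.

```python
import shlex

# Expansions as listed in the comment above (the commenter's summary).
LR_HQ = "-x map-ont -k19 -w 19 -U50,500 -g10k"
SPLICE = ("-x map-ont -k15 -w5 --splice -g2k -G200k -U10,1000000 -A1 -B2 -O2,32 "
          "-E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes")
SPLICE_HQ_EXTRA = "-C5 -O6,24 -B4"


def parse_options(spec):
    """Map each option to its value; in this sketch, later occurrences overwrite earlier ones."""
    tokens = shlex.split(spec)
    opts = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            name, sep, value = tok.partition("=")
            opts[name] = value if sep else True  # --name=value or bare --flag
        elif len(tok) > 2:
            opts[tok[:2]] = tok[2:]  # attached value, e.g. -k19
        elif i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            opts[tok] = tokens[i + 1]  # separate value, e.g. -w 19
            i += 1
        else:
            opts[tok] = True  # short option with no value
        i += 1
    return opts


def conflicts(a, b):
    """Options present in both maps but set to different values."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


lr_hq = parse_options(LR_HQ)
splice_hq = parse_options(SPLICE + " " + SPLICE_HQ_EXTRA)
for opt, (v1, v2) in sorted(conflicts(lr_hq, splice_hq).items()):
    print(f"{opt}: lr:hq={v1} splice:hq={v2}")
```

With these strings it reports -U, -g, -k and -w, i.e. the k-mer size, minimizer window, k-mer occurrence thresholds and chaining gap that the two presets set differently.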
Hello,
@lh3 and @iiSeymour, as far as I understand, splice:hq is the best option for R10 Nanopore cDNA reads.
Would it also be optimal for the new RNA004?
In other words, which settings would you use to optimally align reads from the new RNA pore to a genomic and to a transcriptomic reference?
Thank you for your time.
Also, if a --junc-bed file is provided, would it conflict with the splice:hq options?
@camillaugolini-iit With the --junc-bed option, minimap2 prioritizes splicing events based on the provided annotations. It will not cause any conflict with the splice:hq options.
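As a how-to for the answer above, here is a minimal Python sketch that assembles (but does not run) a minimap2 command combining the splice:hq preset with an optional --junc-bed annotation file. The function name `splice_hq_cmd` and the file names are hypothetical; -a (SAM output), -t (threads) and --junc-bed are regular minimap2 options, and the resulting list could be passed to `subprocess.run` if minimap2 is on the PATH.

```python
import shlex


def splice_hq_cmd(ref, reads, junc_bed=None, threads=8):
    """Build (but do not run) a minimap2 argv for spliced alignment of accurate reads.

    junc_bed is an optional annotation file (e.g. BED12); per the answer above,
    minimap2 then prioritizes the annotated splicing events.
    """
    cmd = ["minimap2", "-a", "-x", "splice:hq", "-t", str(threads)]
    if junc_bed is not None:
        cmd += ["--junc-bed", junc_bed]
    cmd += [ref, reads]  # reference first, then the query reads
    return cmd


# Hypothetical file names, for illustration only.
print(shlex.join(splice_hq_cmd("genome.fa", "reads.fq.gz", junc_bed="annotation.bed")))
```

Keeping the command as a list avoids shell-quoting issues; `shlex.join` is only used to show it as a single command line.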
Hello,
I have a question: according to Oxford Nanopore, their latest flow cells produce very accurate reads. Is "map-ont" still the best setting for mapping those reads? I am asking because the manual still refers to "long noisy reads". Thanks for minimap2 and for your time.