Closed StephDC closed 5 months ago
Hi @StephDC, I'm afraid that Dorado duplex basecalling only works for R10.4.1. I understand this is likely frustrating, but the current duplex algorithms were developed and tested specifically with R10.4.1. Our development effort is focused on continuing to improve R10 yield and accuracy.
Thanks for the info.
By the way, to avoid future confusion, would you mind adding, or accepting a PR that adds, a model-to-kit table to the README.md, in the Available basecalling models section under RNA Models?
Thank you for the suggestion @StephDC - this seems like a good idea - we will aim to add it for the next release.
The duplex basecalling approach seems promising. Many labs have produced data on R9.4.1 in the years since the pore was introduced, and I believe the amount of such data is greater than that produced on R10.*. It should also be noted that a meaningful part of that data has not been published yet because of poor quality and shortcomings in processing. If you updated the basecalling models and introduced official native models for duplex basecalling of R9.4.1 data, much of this research could be finished and published. As a side effect, it would improve the general opinion of the quality of data produced by ONT instruments in the wider community of genome researchers.
Thus, I don't think this should be off the priority list. One can use duplex_tools, but researchers like default solutions that work "out-of-the-box". I still hope that one day the models for the R9.4.1 series of products will be updated.
Best regards
Asan
Issue Report
Please describe the issue:
I have a DNA sample sequenced with R9.4.1, and I wonder whether I can do duplex basecalling on it.
According to the current model selection source code, it seems that duplex only works for the 4 conditions of R10.4.1. Is there any plan to implement duplex basecalling for R9.4.1?
https://github.com/nanoporetech/dorado/blob/release-v0.6.0/dorado/models/models.cpp#L414-L440
If not, I would like to add a list of the currently supported duplex basecalling models to the README.md. An explanation of why such a model is not provided or not possible would be greatly appreciated.
Steps to reproduce the issue:
Run the following command on data from a run that used R9.4.1:
dorado duplex sup R9.4.1/pod5/ > output.bam
And you would be greeted with an error message saying no model available.
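For anyone hitting the same error, a pre-flight check along these lines might save a failed run. This is only a sketch based on the statement in this thread that duplex works for R10.4.1 only. The helper `supports_duplex` and its prefix rule are hypothetical and not part of Dorado, and the model names below are used purely as illustrative strings.

```python
# Hypothetical helper, not part of Dorado: guesses duplex support from a
# simplex model name, assuming (per this thread) that only R10.4.1 is covered.
DUPLEX_PREFIXES = ("dna_r10.4.1",)


def supports_duplex(model_name: str) -> bool:
    """Return True if the model name starts with a duplex-capable chemistry prefix."""
    return model_name.strip().lower().startswith(DUPLEX_PREFIXES)


if __name__ == "__main__":
    # Illustrative model names only; check the README for the real list.
    for name in ("dna_r10.4.1_e8.2_400bps_sup@v4.3.0", "dna_r9.4.1_e8_sup@v3.6"):
        verdict = "duplex may be available" if supports_duplex(name) else "no duplex model expected"
        print(f"{name}: {verdict}")
```

The prefix tuple would need updating by hand if duplex support ever changes, so the model selection code linked above remains the authoritative source.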
Run environment:
Dorado command: dorado duplex sup R9.4.1/pod5/ > output.bam
Kit: SQK-LSK109
Flow cell: FLO-PRO002
Logs