Open dawnmy opened 3 years ago
I tested it a few years ago on simulated Nanopore data (simulated with a high error rate of 10-15%), and the recall was about 92-95%. The error rate of Nanopore data has dropped substantially in recent years, so performance on recent real datasets should be better, though I'm not entirely sure. You could try it, but some rRNA reads may still remain.
Thanks for your reply. I am giving it a go now! Do you think the best setting for the -l parameter is still the mean read length for ONT data? Read lengths will obviously be more variable for Nanopore, with some reads much longer (3-6x) than the mean.
Good question. For long reads you don't need to set -l to the actual mean read length. You can try 200.
For optimal performance, I should train a model specifically for long-read data.
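If it helps to see how far the reads spread around the mean before settling on a value for -l, the length distribution can be summarized quickly. This is only a rough sketch and not part of RiboDetector: the function names `read_lengths` and `summarize` are made up for illustration, and it assumes a plain FASTQ with exactly four lines per record.

```python
import io
import statistics
from typing import Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def summarize(lengths: list[int]) -> dict[str, float]:
    """Return basic statistics for a non-empty list of read lengths."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Tiny in-memory example with two made-up reads (lengths 4 and 12).
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n")
stats = summarize(list(read_lengths(demo)))
print(stats)
```

For a real file, replace the `io.StringIO` object with an opened file handle (use `gzip.open(path, "rt")` for a gzipped FASTQ).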
Thanks! If this is something you are still interested in doing, then please let me know the result!
Would you recommend using the current version of RiboDetector on long read metatranscriptomic datasets?