Train for long reads e.g. nanopore, pacbio

hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning

GNU General Public License v3.0

99 stars 16 forks

Train for long reads e.g. nanopore, pacbio #2

Open dawnmy opened 3 years ago

harrytchild commented 1 month ago

Would you recommend using the current version of RiboDetector on long read metatranscriptomic datasets?

dawnmy commented 1 month ago

I tested it a few years ago on simulated Nanopore data (simulated with a high error rate of 10-15%), and the recall was about 92-95%. The error rate of Nanopore data has dropped substantially in recent years, so performance on the latest real datasets should be better, though I'm not entirely sure. You could try it, but some rRNA reads may still remain.

harrytchild commented 1 month ago

Thanks for your reply. I am giving it a go now! Do you think the best setting for the -l parameter is still the mean read length for ONT reads? Read lengths are obviously more variable for Nanopore data, with some reads much longer (3-6x) than the mean.
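For anyone wanting to check how variable their own read lengths are before picking -l, here is a minimal Python sketch. It is not part of RiboDetector; the function names `read_lengths` and `summarize` are made up for illustration, and it assumes a standard 4-line-per-record FASTQ file (plain or gzipped), not a line-wrapped one.

```python
import gzip
import statistics
from typing import Iterator


def read_lengths(fastq_path: str) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ file.

    Assumption: every record is exactly 4 lines (no wrapped sequences).
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def summarize(lengths: list[int]) -> dict[str, float]:
    """Return basic length statistics, including how far the longest
    read is above the mean (the 3-6x figure mentioned in the question)."""
    mean = statistics.mean(lengths)
    return {
        "n_reads": len(lengths),
        "mean": mean,
        "median": statistics.median(lengths),
        "max": max(lengths),
        "max_over_mean": max(lengths) / mean,
    }
```

Usage would look like `summarize(list(read_lengths("ont_reads.fq.gz")))`, where the file name is a placeholder. For very large files you may prefer to subsample reads instead of loading every length into a list.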

dawnmy commented 1 month ago

Good question. For long reads you don't need to set -l to the actual mean read length. You can try 200.
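As a rough illustration of the -l 200 suggestion, the sketch below only builds and prints a command line; it does not run anything. The helper name `build_ribodetector_cmd`, the file names, and the thread count are placeholders. The `-t`, `-i`, and `-o` flags are written as I understand them from the RiboDetector README, so please confirm them with `ribodetector_cpu -h` for your installed version.

```python
import shlex


def build_ribodetector_cmd(input_fastq: str, output_fastq: str,
                           read_len: int = 200, threads: int = 8) -> list[str]:
    """Assemble a single-end ribodetector_cpu call as an argument list.

    Assumptions: flag names follow the project README; the default
    read_len of 200 is the value suggested in this thread for long reads.
    """
    return [
        "ribodetector_cpu",
        "-t", str(threads),    # number of threads (placeholder value)
        "-l", str(read_len),   # read length setting discussed above
        "-i", input_fastq,     # input reads
        "-o", output_fastq,    # output file for non-rRNA reads
    ]


# Hypothetical file names, for illustration only.
cmd = build_ribodetector_cmd("ont_reads.fq.gz", "ont_reads.nonrrna.fq")
print(shlex.join(cmd))
# prints: ribodetector_cpu -t 8 -l 200 -i ont_reads.fq.gz -o ont_reads.nonrrna.fq
```

Building the command as a list keeps it safe to pass to `subprocess.run(cmd, check=True)` once the flags have been checked against the tool's help output.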

dawnmy commented 1 month ago

For optimal performance, I should train a model specifically for long-read data.

harrytchild commented 1 month ago

Thanks! If this is something you are still interested in doing, then please let me know the result!