hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning
GNU General Public License v3.0

read length should be not smaller than 50, but ribo-seq reads are ~30 bp #16

Closed: huguanjing closed this issue 1 year ago

huguanjing commented 2 years ago

-l LEN, --len LEN Sequencing read length, should be not smaller than 50.

Is this correct? Ribo-seq reads are ~30 bp.

huguanjing commented 2 years ago

To be clear, my question is whether RiboDetector can be used to detect and remove rRNA from Ribo-seq samples. Thanks!

dawnmy commented 2 years ago

> -l LEN, --len LEN Sequencing read length, should be not smaller than 50.
>
> Is this correct? Ribo-seq reads are ~ 30 bp

Thank you for pointing this out. You can ignore the help message; it is just a suggestion, and you can still use RiboDetector for reads shorter than 50 or 40 bp. Yes, you can use RiboDetector for Ribo-seq reads; however, the accuracy will be slightly lower when the reads are short (I think the same holds for other methods/tools). I will update the help message in the next release. Thank you!

ARW-UBT commented 2 years ago

May I add a related question? After quality filtering with Trimmomatic, the (uniform) read length of, e.g., 150 bp in paired-end mode changes to a length distribution (e.g., 36 to 150 bases, depending on the settings). How should the -l LEN parameter be used in such cases? What will happen to reads shorter than the -l value?

dawnmy commented 2 years ago

You can check the mean read length with seqkit stats, then use that mean as LEN for the -l parameter. If a read is longer than LEN, only its first LEN bases will be used to capture the sequence features for classification. If a read is shorter than or equal to LEN, the whole read will be used. In either case, the output files contain the whole reads, so you don't need to worry about the variable length of your input reads.
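To make the reply above concrete, here is a minimal Python sketch, assuming standard 4-line FASTQ records (plain or gzipped). It (1) computes a mean read length comparable to the avg_len column reported by `seqkit stats`, and (2) illustrates the truncation rule described in the reply. The helper names `read_fastq_sequences`, `mean_read_length` and `feature_window` are hypothetical; this only illustrates the described behavior and is not RiboDetector's actual code.

```python
import gzip
from typing import Iterable, Iterator


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield read sequences from a FASTQ file (plain or .gz).

    Assumes strict 4-line records (header, sequence, '+', quality);
    multi-line FASTQ is not handled.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the 2nd line of every record is the sequence
                yield line.strip()


def mean_read_length(seqs: Iterable[str]) -> int:
    """Mean read length rounded to the nearest integer, usable as LEN for -l."""
    lengths = [len(s) for s in seqs]
    if not lengths:
        raise ValueError("no reads found")
    return round(sum(lengths) / len(lengths))


def feature_window(seq: str, length: int) -> str:
    """Hypothetical illustration of the rule in the reply: reads longer than
    LEN contribute only their first LEN bases; shorter reads are used whole.
    The reads written to the output files are not truncated."""
    return seq[:length]
```

For example, `mean_read_length(read_fastq_sequences("reads_1.trimmed.fq.gz"))` would give a value to pass as `-l`; the file name is a placeholder. Slicing with `seq[:length]` naturally covers both cases from the reply, since it returns the whole string when the read is shorter than `length`.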