Question - Githubissues

deoliveira86 commented 2 years ago

Hello,

I just downloaded the tool and read the paper, and I am keen to use the software to replace SortMeRNA in my pipeline (it is extremely time- and memory-consuming). I have many dozens of single- and paired-end Illumina RNA-seq libraries (150 bp read length) to analyse, and I would like to ask some questions about the best parameter set for the SE and PE libraries.

I want to filter out the rRNA from the libraries, and one important option is the read length. In my case the read length is 150 bp, but the paper says that a read length of 100 bp was used and preferred. What read length should I set? Should I use 150 bp or go with 100 bp as suggested in the manuscript?
Does it make a big difference if rRNA is filtered after adapter trimming? Or is it better to remove rRNA first and then remove the adapters? I am asking because after adapter trimming the reads will vary in length, and the read length parameter will not exactly reflect this variation.
Any advice on parameter sets for single- and paired-end data?

Congratulations on the tool. I am looking forward to using it.

Best, André

dawnmy commented 2 years ago

Thank you for trying RiboDetector.

I want to filter out the rRNA from the libraries, and one important option is the read length. In my case the read length is 150 bp, but the paper says that a read length of 100 bp was used and preferred. What read length should I set? Should I use 150 bp or go with 100 bp as suggested in the manuscript?

You can set the read length to 150 with -l 150. The model was trained on sequences around 100 bp long, but it can be used with any read length. The longer the input reads, the more accurate the prediction.

Does it make a big difference if rRNA is filtered after adapter trimming? Or is it better to remove rRNA first and then remove the adapters? I am asking because after adapter trimming the reads will vary in length, and the read length parameter will not exactly reflect this variation.

Yes, it is better to remove the adapters and trim low-quality bases from the read ends first. Then you can use the average length of the quality-controlled reads for -l. Variable read length is not an issue for RiboDetector: the -l parameter just tells RiboDetector how many nucleotide bases to use for the prediction (for shorter reads, the whole read is used), and the output consists of the original input reads.
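As a rough illustration of picking a value for -l from quality-controlled reads, the sketch below computes the mean read length of a FASTQ file. It is not part of RiboDetector; the function name `mean_read_length` is hypothetical, and the code assumes standard four-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip
from pathlib import Path


def mean_read_length(fastq_path):
    """Return the mean sequence length of the reads in a FASTQ file.

    Assumes four lines per record (header, sequence, '+', qualities).
    Files ending in .gz are read through gzip.
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    total_bases = 0
    n_reads = 0
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_no % 4 == 1:
                total_bases += len(line.strip())
                n_reads += 1
    if n_reads == 0:
        raise ValueError(f"no reads found in {path}")
    return total_bases / n_reads
```

Rounding the result, for example `round(mean_read_length("trimmed.fastq.gz"))` with a hypothetical file name, gives an integer that could be passed to -l.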

Any advice on parameter sets for single- and paired-end data?

If you have a large input file and not enough memory, you can set the chunk_size parameter to reduce memory consumption. For paired-end data, if you need to remove rRNA and keep putative mRNA for functional and taxonomic profiling, you can set --ensure rrna, which removes only high-confidence rRNA and keeps as much of the putative mRNA as possible.
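To make the options above concrete, here is a minimal sketch of a helper that assembles a command line for single- or paired-end input. Only -l, --ensure rrna and the chunk_size parameter come from this thread; the executable name `ribodetector_cpu`, the `-t`, `-i` and `-o` flags, and the `--chunk_size` spelling are assumptions that should be checked against the tool's `--help` output. The helper `build_ribodetector_cmd`, the file names and the chunk size of 256 are purely illustrative.

```python
import shlex


def build_ribodetector_cmd(inputs, outputs, read_len,
                           chunk_size=None, ensure=None, threads=1):
    """Build an argument list for a RiboDetector run (hypothetical helper).

    inputs/outputs: one path each for single-end, two each for paired-end.
    Flag names other than -l and --ensure are assumptions; verify with --help.
    """
    if len(inputs) != len(outputs) or len(inputs) not in (1, 2):
        raise ValueError("give 1 input/output (SE) or 2 inputs/outputs (PE)")
    cmd = ["ribodetector_cpu", "-t", str(threads),
           "-l", str(read_len), "-i", *inputs]
    if ensure is not None:
        # e.g. "rrna": remove only high-confidence rRNA reads
        cmd += ["--ensure", ensure]
    if chunk_size is not None:
        # lower memory use on large inputs
        cmd += ["--chunk_size", str(chunk_size)]
    cmd += ["-o", *outputs]
    return cmd


if __name__ == "__main__":
    # Paired-end example; print the command instead of running it.
    cmd = build_ribodetector_cmd(
        ["reads.1.fq.gz", "reads.2.fq.gz"],
        ["nonrrna.1.fq", "nonrrna.2.fq"],
        read_len=150, chunk_size=256, ensure="rrna", threads=4,
    )
    print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to inspect the arguments before passing the list to `subprocess.run`.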

I hope this helps.

deoliveira86 commented 2 years ago

Hello Dawnmy,

Thanks for the quick and complete answer. I will run RiboDetector taking your input into consideration.

Best, André

hzi-bifo / RiboDetector

Question #25