cpu vs. gpu performance

hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning

GNU General Public License v3.0


Issue #17 (closed by ARW-UBT 2 years ago)

ARW-UBT commented 2 years ago

Hello, I would like to detect rRNA reads in Illumina data sets consisting of over 100 million reads (meta-RNA-seq data, 150 bp paired-end). I have seen the performance graphs in the paper comparing CPU and GPU use. Unfortunately, I have only 40 CPUs on a Linux system, and the GPU machine runs Windows 11, so I would like to optimize performance in CPU mode. I read in the paper that the chunk size mainly affects memory but not runtime, right? If I am only interested in the non-rRNA reads, would the '-e rrna' option influence the runtime compared to '-e both'? Are there other options that should be set to optimize runtime?

Best regards

dawnmy commented 2 years ago

Thank you for using RiboDetector. One sample with 100 million reads is quite a lot. How much memory do you have? Yes, chunk size does not affect the runtime much. You can use -t 38 since you have 40 cores. Do you have paired-end data? Single-end data do not need the -e parameter. If you are interested in the non-rRNA reads, '-e rrna' is the right setting. There are no other options to improve the runtime. I hope this helps.
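A minimal Python sketch of the advice above, assembling (not running) a CPU-mode command line. Only `-t`, `-e` and `--chunk_size` come from this thread; the executable name `ribodetector_cpu`, the `-l` (read length), `-i` (inputs) and `-o` (outputs) flags, and the full set of `-e` choices are assumptions recalled from the project README, so check them against `ribodetector_cpu --help`. The helper `build_ribodetector_cpu_cmd` and the file names are hypothetical.

```python
import shlex

VALID_ENSURE = {"rrna", "norrna", "both", "none"}  # assumed -e choices; verify with --help


def build_ribodetector_cpu_cmd(inputs, outputs, read_len,
                               threads=38, ensure="rrna", chunk_size=None):
    """Return an argv list for a hypothetical RiboDetector CPU-mode run.

    inputs/outputs: one file each for single-end, two each for paired-end.
    """
    if len(inputs) not in (1, 2) or len(outputs) != len(inputs):
        raise ValueError("expected 1 or 2 input files and a matching number of outputs")
    cmd = [
        "ribodetector_cpu",      # assumed executable name for CPU mode
        "-t", str(threads),      # e.g. 38 threads on a 40-core machine
        "-l", str(read_len),     # read length (assumed flag)
        "-i", *inputs,           # input FASTQ file(s) (assumed flag)
        "-o", *outputs,          # non-rRNA output file(s) (assumed flag)
    ]
    # -e only matters for paired-end data; single-end runs can omit it.
    if len(inputs) == 2:
        if ensure not in VALID_ENSURE:
            raise ValueError(f"unexpected -e value: {ensure!r}")
        cmd += ["-e", ensure]    # 'rrna' when the non-rRNA reads are what you keep
    # --chunk_size only limits memory by parsing in chunks; skip it when RAM is ample.
    if chunk_size is not None:
        cmd += ["--chunk_size", str(chunk_size)]
    return cmd


if __name__ == "__main__":
    argv = build_ribodetector_cpu_cmd(
        ["sample.R1.fq.gz", "sample.R2.fq.gz"],
        ["sample.nonrrna.R1.fq", "sample.nonrrna.R2.fq"],
        read_len=150,
    )
    print(shlex.join(argv))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting; `shlex.join` is only used to show the equivalent shell line.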

ARW-UBT commented 2 years ago

Thank you for your comments. The jobs using paired-end data completed after 6-7 hours with -t 40 and 500 GB RAM, but there are still a few samples to process. I was not aware that the -e parameter (as well as --chunk_size) can be omitted in some cases. May I suggest adding 'required' or 'optional' to each parameter in the --help output to indicate this?

Thank you again for this great tool; I see clear differences in the output between RiboDetector and SortMeRNA.

dawnmy commented 2 years ago

Thank you for your suggestion. -e is not actually required; if it is not specified, the software uses the default value -e none. --chunk_size is also not required; it is only needed when you don't have enough memory and the input file is very large. I am curious how long SortMeRNA took. In RiboDetector CPU mode, 100M paired-end reads should take about 4-5 hours with 40 cores, but 6 hours is also acceptable because I/O might be the bottleneck. Have you checked how many reads actually differ between the RiboDetector and SortMeRNA outputs? With -e rrna, you will have a few more reads in the output.
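To put the runtimes in this thread side by side, the short Python calculation below converts the expected 4-5 hours and the observed 6-7 hours into average throughput. It assumes the "100M" figure counts reads as quoted and ignores start-up and I/O overhead; `reads_per_second` is a hypothetical helper, not part of RiboDetector.

```python
def reads_per_second(n_reads, hours):
    """Average throughput for a run that processed n_reads in the given wall-clock hours."""
    return n_reads / (hours * 3600)


N_READS = 100_000_000  # "100M paired end reads" as quoted in the thread

if __name__ == "__main__":
    for label, hours in [("expected, 4 h", 4), ("expected, 5 h", 5),
                         ("observed, 6 h", 6), ("observed, 7 h", 7)]:
        print(f"{label}: {reads_per_second(N_READS, hours):,.0f} reads/s")
    # Slowdown of the observed range relative to the expected range
    print(f"slowdown: {6 / 5:.2f}x to {7 / 4:.2f}x")
```

This works out to roughly 5,600-6,900 reads/s expected versus about 4,000-4,600 reads/s observed, a 1.2x to 1.75x slowdown, which is the kind of gap an I/O bottleneck could plausibly explain.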

ARW-UBT commented 2 years ago

Thanks for the comments on the parameters. Because of the data size, I did not compare SortMeRNA with RiboDetector on this dataset, but I did on a second (smaller) metatranscriptome dataset of two samples with ca. 3 million paired-end reads each. The differences were large: ca. 20-30% of reads were detected as rRNA by RiboDetector, compared to 80% putative rRNA reads detected by SortMeRNA. A subsequent taxonomic classification (qiime2 shogun), however, detects almost the same number of taxa (based on SILVA 138.1 references) in both datasets. I assume that SortMeRNA classifies many more reads as rRNA (the same result as in your paper: '6x'), and these were later sorted out by shogun because they do not map to 16S loci. I also did a 'positive' test using reads of known composition: the rRNA classification of poly-A-derived RNA-Seq data and of 'true' 16S data (amplicon sequencing) is the same in RiboDetector and SortMeRNA (ca. 5% and 98%, respectively). These results were expected.
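A back-of-the-envelope Python calculation makes the size of that difference concrete per sample. It uses only the approximate figures from the comment (ca. 3 million reads, 20-30% versus 80%) and does not attempt to reproduce the '6x' figure cited from the paper; `reads_at` is a hypothetical helper.

```python
TOTAL_READS = 3_000_000  # "ca. 3 Mio pe reads" per sample (approximate)


def reads_at(percent):
    """Reads corresponding to an integer percentage of TOTAL_READS."""
    return TOTAL_READS * percent // 100


rd_low, rd_high = reads_at(20), reads_at(30)  # RiboDetector: ca. 20-30 % rRNA
smr = reads_at(80)                            # SortMeRNA: ca. 80 % putative rRNA

if __name__ == "__main__":
    print(f"RiboDetector rRNA reads: {rd_low:,} to {rd_high:,}")
    print(f"SortMeRNA rRNA reads:    {smr:,}")
    print(f"extra reads flagged by SortMeRNA: {smr - rd_high:,} to {smr - rd_low:,}")
    print(f"ratio: {smr / rd_high:.1f}x to {smr / rd_low:.1f}x")
```

On these rough numbers SortMeRNA flags about 1.5-1.8 million more reads per sample, roughly 2.7x to 4.0x as many as RiboDetector. Integer percentages keep the counts exact.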