hzi-bifo / RiboDetector

Accurate and rapid ribosomal RNA (rRNA) sequence detector based on deep learning
GNU General Public License v3.0

Chunk size question #27

Closed · caity-s closed this issue 1 year ago

caity-s commented 2 years ago

Dear Zhi-Luo,

I am giving our servers a workout. I have a 2 x 49 Gbp paired-end dataset (about 388 million pairs of fastp-processed 150 bp PE data); R1 alone is 18 GB compressed. I have been running RiboDetector on a 768 GB RAM, 72-CPU Linux server with no problems, but I ran into a memory error for this file, which I am sure is just because the file is huge. I would have thought it should technically run fine: 5 x 49 = 245 GB (x2 for the R2 file = 490 GB of memory needed), but I assume there is some background load on the machine that interrupts it.

Regarding chunk size selection for the CPU method: if I set a chunk_size of 6000 with 60 threads, does that work out to approximately 360 GB of RAM?

Thank you, Caitlin

dawnmy commented 2 years ago

Dear Caitlin,

> but I ran into a memory error for this file

Did you set chunk_size when you ran this file? Yes, this file is too large for your memory if you don't set a proper chunk size. Just set a chunk_size of 256 and it should work.
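
For reference, a minimal invocation in the style of the README's CPU example (file names here are placeholders for your own paired-end files, and `-l` should match your read length):

```sh
ribodetector_cpu -t 60 -l 150 \
  -i reads_R1.fq.gz reads_R2.fq.gz \
  -e rrna \
  --chunk_size 256 \
  -o reads_nonrrna_R1.fq reads_nonrrna_R2.fq
```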

> Regarding chunk size selection for the CPU method: if I set a chunk_size of 6000 with 60 threads, does that work out to approximately 360 GB of RAM?

If your read length is about 100 bp, peak memory should be ~360 GB. But you don't need to set such a large chunk_size: above 128, the speed gain is very limited. A large chunk_size won't speed up the run substantially but will still consume a lot of memory. I would recommend setting it to 256. Of course, you can also set it to 512, which should also work.
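
As a back-of-the-envelope check, here is a rough rule of thumb calibrated only to the figures in this thread (chunk_size 6000 x 60 threads at ~100 bp ≈ 360 GB), not derived from the source code:

```sh
# Rough peak-RAM estimate in GB, calibrated to the numbers above:
# 6000 * 60 * 100 / 100000 = 360  =>  GB ~= chunk_size * threads * read_len / 100000
chunk_size=256; threads=60; read_len=100
echo "approx. $(( chunk_size * threads * read_len / 100000 )) GB peak RAM"
```

By that estimate, chunk_size 256 with 60 threads should peak around 15 GB, and 512 around 30 GB.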

Best, Zhi-Luo

caity-s commented 2 years ago

Dear Zhi-Luo,

Perfect, thanks a lot!

Caitlin

caity-s commented 1 year ago

Worked like a charm :) Thank you!