jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Memory requirements for 51 metagenomes #439

Closed: archanamadhav closed this issue 2 years ago

archanamadhav commented 2 years ago

Hello!

I'm running the SqueezeMeta pipeline on a set of 51 gut microbiomes in merged mode, and I have run into a lot of trouble getting through individual steps of the pipeline. I am running these samples on a computing cluster, on a single node with 56 CPUs and 3 GB of memory per CPU, but with a runtime limit of 36 hours. I was only able to get through the first step of the pipeline (assembly) once, with 35 threads allocated; the same did not work when I allocated all 56 threads. In addition, even in the 35-thread run where step 1 completed, I was not able to get all 51 metagenomes through step 1.5 (merging assemblies).

I'm trying to figure out how to maximise the efficiency of memory usage. Would seqmerge mode work better? Are there any run parameters I could change to make this work?

STEP 1: ASSEMBLY (56 threads)
State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 56
CPU Utilized: 67-05:41:56
CPU Efficiency: 82.33% of 81-16:03:44 core-walltime
Job Wall-clock time: 1-11:00:04
Memory Utilized: 8.90 GB
Memory Efficiency: 4.76% of 187.03 GB

STEP 1.5: MERGE ASSEMBLIES (35 threads)
State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 56
CPU Utilized: 27-09:47:22
CPU Efficiency: 33.56% of 81-16:00:00 core-walltime
Job Wall-clock time: 1-11:00:00
Memory Utilized: 54.37 GB
Memory Efficiency: 29.07% of 187.03 GB
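The percentages in the two seff reports can be rechecked from the raw fields: CPU efficiency is CPU time divided by (allocated cores × wall-clock time), and memory efficiency is memory used divided by memory allocated. The Python sketch below redoes that arithmetic; the helper names are illustrative only and are not part of Slurm or SqueezeMeta. Note that step 1.5 is divided by all 56 allocated cores, not the 35 threads given to the pipeline, which by itself caps its CPU efficiency at 35/56 = 62.5%.

```python
def slurm_seconds(duration: str) -> int:
    """Convert a Slurm duration such as '1-11:00:04' or '35:00:04' to seconds."""
    days = 0
    if "-" in duration:
        day_part, duration = duration.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in duration.split(":")]
    while len(parts) < 3:  # pad 'MM:SS' or 'SS' forms with leading zeros
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_utilized: str, cores: int, wall_clock: str) -> float:
    """CPU time used, as a percentage of allocated cores x wall-clock time."""
    return 100 * slurm_seconds(cpu_utilized) / (cores * slurm_seconds(wall_clock))


def memory_efficiency(used_gb: float, allocated_gb: float) -> float:
    """Peak memory used, as a percentage of the memory allocated."""
    return 100 * used_gb / allocated_gb


# Figures copied from the two seff reports above (56 cores allocated in both).
step1_cpu = cpu_efficiency("67-05:41:56", 56, "1-11:00:04")
step15_cpu = cpu_efficiency("27-09:47:22", 56, "1-11:00:00")
step1_mem = memory_efficiency(8.90, 187.03)
step15_mem = memory_efficiency(54.37, 187.03)

print(f"Step 1:   CPU {step1_cpu:.2f}%  memory {step1_mem:.2f}%")
print(f"Step 1.5: CPU {step15_cpu:.2f}%  memory {step15_mem:.2f}%")
```

The recomputed values match the reports (82.33% / 4.76% and 33.56% / 29.07%). Both jobs ended in TIMEOUT after about 35 hours of wall-clock time, with at most 29% of the allocated memory in use.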

fpusan commented 2 years ago

It is hard to estimate. Seqmerge would help, yes, but the improvement may not be dramatic. However, the problem seems to be a timeout... wouldn't it be enough just to increase the runtime limit?
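The reasoning behind this reply can be written as a simple triage rule: a job whose Slurm state is TIMEOUT while its memory use stays well below the allocation is limited by wall-clock time, not by memory. The sketch below is a hypothetical helper, not part of Slurm or SqueezeMeta, and its 90% memory-headroom threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class JobReport:
    state: str           # Slurm job state, e.g. "TIMEOUT" or "OUT_OF_MEMORY"
    mem_used_gb: float   # "Memory Utilized" from seff
    mem_alloc_gb: float  # memory available to the job


def limiting_resource(job: JobReport, headroom: float = 0.9) -> str:
    """Hypothetical triage: which resource stopped (or nearly stopped) the job.

    headroom is an assumed threshold: memory use at or above this fraction
    of the allocation is treated as memory being the constraint.
    """
    if job.state == "OUT_OF_MEMORY" or job.mem_used_gb >= headroom * job.mem_alloc_gb:
        return "memory"
    if job.state == "TIMEOUT":
        return "time"
    return "none"


# The two runs reported in this issue.
step1 = JobReport("TIMEOUT", 8.90, 187.03)
step15 = JobReport("TIMEOUT", 54.37, 187.03)
print(limiting_resource(step1), limiting_resource(step15))  # prints: time time
```

In Slurm, the wall-clock limit is requested with `--time` (for example `#SBATCH --time=D-HH:MM:SS` in the batch script), up to whatever maximum the partition allows.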