rkajitani / MetaPlatanus

De novo metagenome assembler
GNU General Public License v3.0

metaplatanus runs forever #4

Open jsgounot opened 2 years ago

jsgounot commented 2 years ago

Hi,

I have been running metaplatanus (version 1.3.0) on an AWS instance with 4 cores for 32 hours now, and it is still stuck at step 1. Although I provided 4 cores, only one core is in use most of the time. My input is 11*2 Gb of Illumina reads and 2.2 Gb of Nanopore reads.

Do you have an idea of how long the run could take? Would it be possible to have a verbose option to have a better idea of where the current software is and what it's doing?

Regards, JS

Unaimend commented 2 years ago

Did you look at the output of htop? I have seen that metaplatanus does not really use all cores all the time; maybe there is an issue with multithreading. Edit: does -> doesn't

jsgounot commented 2 years ago

Actually, I discovered it mostly used one core with htop.

rkajitani commented 2 years ago

Thank you for the report. For a human gut sample with a similar data size (Illumina 12 Gb, ONT 1.6 Gb), it took 6.4 hours of real time and 39.1 hours of CPU time with 24 threads (-t 24). I recognize that MetaPlatanus could be extremely slow when the memory limit (-m) is small, and I am fixing the problem. As a temporary solution, specifying a large -m value (e.g., -m 100) is effective to speed up MetaPlatanus.
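As a rough illustration of the figures above, the ratio of CPU time to real time gives the average number of cores kept busy during that run. The sketch below only redoes that arithmetic; the function names are hypothetical and are not part of MetaPlatanus.

```python
def average_busy_cores(cpu_hours, wall_hours):
    """Average number of cores in use over the whole run."""
    return cpu_hours / wall_hours


def parallel_efficiency(cpu_hours, wall_hours, threads):
    """Fraction of the requested threads that were busy on average."""
    return average_busy_cores(cpu_hours, wall_hours) / threads


# Figures quoted in the comment above: 39.1 h CPU time, 6.4 h real time, -t 24.
busy = average_busy_cores(39.1, 6.4)
eff = parallel_efficiency(39.1, 6.4, 24)
print(f"{busy:.1f} cores busy on average, {eff:.0%} of 24 threads")
# prints "6.1 cores busy on average, 25% of 24 threads"
```

So even in the reference run, only about a quarter of the requested threads were busy on average, which fits the observation that not all cores are used all the time.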

jsgounot commented 2 years ago

Ok. Does metaplatanus reload the read files multiple times? My reads are on a slow-access HDD, and this might be an issue if the files have to be loaded multiple times.

rkajitani commented 2 years ago

Yes, MetaPlatanus reads input FASTQs (FASTAs) multiple times. Note that it reads the inputs only once in the contig-assembly step (step 1).

jsgounot commented 2 years ago

I tried again with more memory (60G) and 32 cores, but the situation remains the same. That's very inconvenient since I have to pay for this AWS instance. Do you have any idea how to resolve this?
