Slow assembly for diverse metagenome

Hi,

I have used megahit many times for my analyses, and in general it works very well. Recently, I tried running it on a very diverse dataset containing bacteria, archaea, fungi and viruses. I have 9 samples with about 20M paired-end reads each. I'm running megahit with default parameters on a machine with 500 GB of RAM and 60 CPU threads. However, after 3 days the assembly is still stuck at k=21. Memory is not saturated, so probably little swap is being used. I also checked the intermediate assembly file for k=21, and it contains about 2.4 billion contigs.
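The contig count in a file of this size can be checked without loading the whole file into memory by streaming the FASTA and counting header lines. The sketch below assumes the intermediate contigs are stored as plain or gzip-compressed FASTA. The function name `fasta_stats` is hypothetical and is not part of megahit.

```python
import gzip
from pathlib import Path


def fasta_stats(path):
    """Stream a FASTA file and return (number_of_records, total_sequence_length).

    Assumption: the input is plain FASTA, or gzip-compressed FASTA if the
    file name ends in ".gz". Lines are read one at a time, so a
    multi-gigabyte contig file never has to fit in memory.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    n_records = 0
    total_len = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # Each header line starts a new contig record.
                n_records += 1
            else:
                # Sequence lines may be wrapped, so sum them per record.
                total_len += len(line.strip())
    return n_records, total_len
```

For example, `fasta_stats("megahit_out/intermediate_contigs/k21.contigs.fa")` returns the number of contigs and their total length in bp. That path is a placeholder and should be replaced with the actual location of the k=21 file in your output directory.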

I have several questions regarding this issue:

1) Do you know what the bottleneck is here? Is it I/O, or simply the number of CPU threads?
2) I read in issue #152 that GPUs are not really supported. Has this changed since then?
3) More generally, do you have any recommendations for improving runtime?

Thank you for your time,
Cedric

voutcn / megahit, issue #278