voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Slow assembly for diverse metagenome #278

Open Puumanamana opened 4 years ago

Puumanamana commented 4 years ago

Hi,

I have used megahit many times for my analyses, and in general it works very well. Recently, I tried it on a very diverse dataset composed of bacteria, archaea, fungi and viruses. I have 9 samples with about 20M paired-end reads each. I'm running megahit with default parameters on a machine with 500 GB of RAM and 60 CPU threads. However, after 3 days the assembly is still stuck at k=21. Memory is not saturated, so swap is probably barely being used. I also checked the intermediate assembly file for k=21, and it contains about 2.4 billion contigs.
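For anyone who wants to check the same figure on their own run, a contig count like the one above can be obtained by counting FASTA header lines. The snippet below is only a minimal sketch: the function name `count_fasta_records` is made up for illustration, and the path in the usage comment is an assumption about where the intermediate k=21 contigs end up, so adjust it to your own output directory.

```python
import gzip
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting lines that start with '>'.

    Plain-text and gzip-compressed (.gz) files are both handled.
    The file is streamed line by line, so memory use stays small
    even for files with billions of records.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                count += 1
    return count


# Example usage (hypothetical path, adjust to your own output directory):
# print(count_fasta_records("megahit_out/intermediate_contigs/k21.contigs.fa"))
```

Reading in binary mode avoids decoding overhead, which matters when the file holds billions of short contigs.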

I have multiple questions regarding this issue:

1) Do you know what the bottleneck is here? Is it I/O, or simply the number of CPU threads?
2) I read in issue #152 that GPU is not really supported. Has that changed since then?
3) In general, do you have any recommendations for improving runtime?

Thank you for your time,
Cedric