rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Memory issue #117

Open metanav opened 6 years ago

metanav commented 6 years ago

Command line: /usr/local/bin/spades.py -1 /app/data/R1.fastq -2 /app/data/R2.fastq -o /app/data/output/spades_assembly/read_correction --only-error-correction

System information: SPAdes version: 3.12.0 Python version: 2.7.15 OS: Linux-3.10.0-514.10.2.el7.x86_64-x86_64-with-Ubuntu-18.04-bionic

I am getting the following error:

0:17:11.642 3G / 18G ERROR K-mer Counting (kmer_data.cpp : 353) The reads contain too many k-mers to fit into available memory. You need approx. 303.601GB of free RAM to assemble your dataset

I tried with the default thread count, with "-t 16" and with "-t 32", but got the same error each time. The params.txt file shows "Memory limit (in Gb): 250".

My Linux box has 1 TB of memory. How can I pass the required memory parameter to Unicycler, or set it somewhere?

rrwick commented 6 years ago

You'd have to modify Unicycler's SPAdes call in its source code. These are the relevant lines. It would need --memory 1000 or something like that added on. Alternatively, you could run error correction separately and then run Unicycler with --no_correct.
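
The second option (separate error correction, then Unicycler with --no_correct) could look roughly like the sketch below. It is in Python only because Unicycler itself is Python. The helper names and example paths are hypothetical and are not part of Unicycler or SPAdes. The corrected read paths are left to the caller, since they must match whatever files SPAdes actually writes under its output directory.

```python
# Sketch only: helper names and example paths are hypothetical.
# The commands are built as argument lists and are not executed here.

def spades_correction_cmd(r1, r2, out_dir, memory_gb, threads=None):
    """Build a SPAdes read-correction command with an explicit RAM limit."""
    cmd = [
        "spades.py",
        "-1", r1,
        "-2", r2,
        "-o", out_dir,
        "--only-error-correction",
        "--memory", str(memory_gb),  # SPAdes RAM limit in Gb (default is 250)
    ]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


def unicycler_no_correct_cmd(corrected_r1, corrected_r2, out_dir):
    """Build a Unicycler command that skips its own SPAdes read correction."""
    return [
        "unicycler",
        "-1", corrected_r1,
        "-2", corrected_r2,
        "-o", out_dir,
        "--no_correct",
    ]


if __name__ == "__main__":
    correction = spades_correction_cmd(
        "/app/data/R1.fastq", "/app/data/R2.fastq",
        "/app/data/output/read_correction",  # hypothetical output directory
        memory_gb=1000, threads=16,
    )
    print(" ".join(correction))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are right for your machine.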

But why does it need so much memory in the first place?! How big is your read set? How big is the genome you're assembling? Don't forget that Unicycler is really just for bacterial isolates, so if you're trying to assemble a large genome, you'll probably run into lots of other problems besides SPAdes' memory usage.

Ryan