soedinglab / plass

sensitive and precise assembly of short sequencing reads
https://plass.mmseqs.com
GNU General Public License v3.0

Can not allocate index memory in DBReader #16

Closed: kihyunee closed this issue 5 years ago

kihyunee commented 5 years ago

Hi, I am Kihyun.

I ran the assemble command like this:

plass assemble input/I_1.fastq.gz input/I_2.fastq.gz plass_proteome/I.plass.fasta plass_tmp --threads 6 --remove-tmp-files --max-seq-len 30000

The run ended with the following error (I replaced some lengthy directory paths with {...}):

Temporary folder plass_tmp does not exist or is not a directory.
Created directory plass_tmp
PAIRED END MODE
mergereads input/I_1.fastq.gz input/I_2.fastq.gz {...}/plass_tmp/2685330570646735821/nucl_reads -v 3

Start merging reads.
Time for merging files: 0h 4m 52s 502ms
Time for merging files: 0h 1m 55s 7ms

Done.
Time for processing: 0h 25m 6s 66ms
extractorfs {...}/plass_tmp/2685330570646735821/nucl_reads {...}/plass_tmp/2685330570646735821/nucl_6f_start --min-length 20 --max-length 45 --max-gaps 0 --contig-start-mode 1 --contig-end-mode 0 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 6 --compressed 0 -v 3

[=================================================================] 100.00% 100.03M 6m 55s 977ms
Time for merging files: 0h 0m 15s 98ms
Time for merging files: 0h 0m 52s 8ms
Time for processing: 0h 8m 53s 976ms

translatenucs {...}/plass_tmp/2685330570646735821/nucl_6f_start {...}/plass_tmp/2685330570646735821/aa_6f_start --translation-table 1 --add-orf-stop 1 -v 3 --compressed 0 --threads 6

[=================================================================] 100.00% 27.10M 11s 751ms
Time for merging files: 0h 0m 18s 606ms
Time for processing: 0h 0m 37s 850ms

extractorfs {...}/plass_tmp/2685330570646735821/nucl_reads {...}/plass_tmp/2685330570646735821/nucl_6f_long --min-length 45 --max-length 32734 --max-gaps 0 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 6 --compressed 0 -v 3

[=================================================================] 100.00% 100.03M 12m 28s 264ms
Time for merging files: 0h 3m 44s 399ms
Time for merging files: 0h 10m 38s 798ms
Can not allocate index memory in DBReader
Error: extractorfs longest step died

My Plass version is 16674881b65aace74ef2ae7a1120f9c9a9cdf7bb, and I installed it as suggested:

# latest static linux build
wget https://mmseqs.com/plass/plass-static_sse41.tar.gz
tar xvfz plass-static_sse41.tar.gz
export PATH=$(pwd)/plass/bin/:$PATH

I wonder what causes this error. My guess is that my server simply doesn't have enough memory for input data of this size. Or could it be related to the requirement that "Plass needs a CPU with at least the SSE4.1 instruction set"? By the way, the input files are I_1.fastq.gz (4.7G) and I_2.fastq.gz (5.4G).

As for my working environment, the OS is CentOS Linux release 7.5.1804 (Core). I am not familiar with CPU hardware, so I can't be sure, but since the lines starting with "flags :" in /proc/cpuinfo contain "sse4_1" and "sse4_2", I assume these CPUs support SSE4.1?

Thanks, Kihyun
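The /proc/cpuinfo check described above can be scripted. This is a minimal sketch, assuming a Linux system where the kernel lists CPU features on the "flags" lines of /proc/cpuinfo; the helper name has_sse41 is hypothetical and not part of Plass.

```shell
# Hypothetical helper: succeed (exit status 0) if the given cpuinfo file
# lists the sse4_1 flag. Defaults to the live /proc/cpuinfo on Linux.
has_sse41() {
    # -q: no output, only the exit status.
    # -w: match "sse4_1" as a whole word, so "sse4_2" alone does not match.
    grep -qw 'sse4_1' "${1:-/proc/cpuinfo}"
}

# Usage on the current machine:
if has_sse41; then
    echo "SSE4.1 supported"
else
    echo "SSE4.1 not reported"
fi
```

Matching the whole word matters because the flags line also contains related tokens such as sse4_2 and ssse3.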

martin-steinegger commented 5 years ago

Thank you for reporting this. How much memory does the computer have?

kihyunee commented 5 years ago

Hi Martin, the total memory is ~264 GB, according to:

cat /proc/meminfo | grep MemTotal
MemTotal:       263918056 kB

However, at the time only ~10 GB was available according to top, because other people were using the server too.

Does that sound too low? Maybe it was simply a lack of available memory?
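On a shared server, the figure that matters is the memory currently available rather than MemTotal. A minimal sketch that prints both from /proc/meminfo, assuming the kernel reports a MemAvailable line (the kernel's estimate of memory usable by new processes without swapping); the helper name mem_gib is hypothetical.

```shell
# Hypothetical helper: print total and currently available memory in GiB.
# /proc/meminfo reports values in kB (1 kB = 1024 bytes), so dividing by
# 1048576 gives GiB; awk's %d drops the fractional part. If the kernel does
# not report MemAvailable, "available" prints as 0.
mem_gib() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%d GiB available=%d GiB\n", total / 1048576, avail / 1048576 }' \
        "${1:-/proc/meminfo}"
}

# Usage on the current machine:
mem_gib
```

For the MemTotal value quoted above (263918056 kB), the helper reports total=251 GiB, since it uses binary units.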

martin-steinegger commented 5 years ago

Yes, 10 GB is too low. Can you restart it when more memory is available?

kihyunee commented 5 years ago

OK, I'll try again later when the server is less crowded and let you know whether it works. Thanks!

milot-mirdita commented 5 years ago

You can use --split-memory-limit 64G (or a similar value) to limit the memory usage of the later stages of the assembly. It will not help with the current crash, but otherwise the later k-mer matching stage would assume it is allowed to use all 264 GB of RAM.
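A sketch of the original assemble command with the suggested limit appended. The wrapper name run_plass and the PLASS override variable are hypothetical conveniences, not part of Plass; the only Plass flags used are the ones that appear in this thread. As noted above, the limit does not prevent the extractorfs crash itself, so enough free memory is still needed for that step.

```shell
# Hypothetical wrapper around the assemble command from this thread, with
# --split-memory-limit added so the later k-mer matching stage does not
# assume it may use all of the machine's RAM. The first argument overrides
# the limit (default 64G). Setting PLASS=echo prints the command instead of
# running it.
run_plass() {
    "${PLASS:-plass}" assemble input/I_1.fastq.gz input/I_2.fastq.gz \
        plass_proteome/I.plass.fasta plass_tmp \
        --threads 6 --remove-tmp-files --max-seq-len 30000 \
        --split-memory-limit "${1:-64G}"
}
```

For a dry run, PLASS=echo run_plass 32G prints the full command line with a 32G limit instead of starting the assembly.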

kihyunee commented 5 years ago

Hi Martin and Milot, I ran it again when ~200 GB of memory was free, using the same command plus --split-memory-limit 64G. It worked smoothly and produced around 32 times more proteins than I got with DNA assembly + Prodigal prediction! So the problem was just the available system memory. Thank you for the advice.

Best, Kihyun