soedinglab / plass

sensitive and precise assembly of short sequencing reads
https://plass.mmseqs.com
GNU General Public License v3.0

mem or disk issue? #4

Closed colindaven closed 5 years ago

colindaven commented 5 years ago

Current Behavior

Plass died. I am unsure whether this is due to a RAM issue or a tmp space issue. Server: 512 GB, Ubuntu 16.04.

Failed to mmap memory dataSize=0 File=/tmp/6803214812655189031/nucl_6f_long. Error 22.

Thanks

Steps to Reproduce (for bugs)

srun -c 48 /mnt/ngsnfs/tools/plass/plass/bin/plass assemble --threads 48 MBCF_117_S38_R1.fastq out.fa /tmp/

Plass Output (for bugs)

Program call: assemble --threads 48 MBCF_117_S38_R1.fastq out.fa /tmp/

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Sub Matrix blosum62.out
Rescore mode 0
Remove hits by seq.id. and coverage false
E-value threshold 1e-05
Coverage threshold 0
Coverage Mode 0
Seq. Id Threshold 0.9
Seq. Id. Mode 0
Include identical Seq. Id. false
Sort results 0
In substitution scoring mode, performs global alignment along the diagonal false
Preload mode 0
Threads 48
Verbosity 3
Alphabet size 13
Kmer per sequence 60
Mask Residues 0
K-mer size 14
Max. sequence length 65535
Shift hash 5
Split Memory Limit 0
Include only extendable true
Skip sequence with n repeating k-mers 8
Min codons in orf 45
Max codons in length 2147483647
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Protein Filter Threshold 0.2
Filter Proteins 1
Number search iterations 12
Remove Temporary Files false
Sets the MPI runner

Program call: createdb MBCF_117_S38_R1.fastq /tmp/6803214812655189031/nucl_reads --max-seq-len 65535 --dont-split-seq-by-len 0 --dont-shuffle 1 --id-offset 0 -v 3

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Max. sequence length 65535
Split Seq. by len false
Do not shuffle input database true
Offset of numeric ids 0
Verbosity 3

1 Mio. sequences processed
2 Mio. sequences processed
3 Mio. sequences processed
4 Mio. sequences processed
5 Mio. sequences processed
6 Mio. sequences processed
7 Mio. sequences processed
8 Mio. sequences processed
9 Mio. sequences processed
10 Mio. sequences processed
11 Mio. sequences processed
12 Mio. sequences processed
13 Mio. sequences processed
14 Mio. sequences processed
15 Mio. sequences processed
16 Mio. sequences processed
Time for merging files: 0h 0m 2s 140ms
Time for merging files: 0h 0m 2s 28ms
Touch data file /tmp/6803214812655189031/nucl_reads ... Done.
Time for merging files: 0h 0m 15s 353ms
Touch data file /tmp/6803214812655189031/nucl_reads_h ... Done.
Time for merging files: 0h 0m 15s 312ms
Time for processing: 0h 1m 55s 831ms

Program call: extractorfs /tmp/6803214812655189031/nucl_reads /tmp/6803214812655189031/nucl_6f_start --min-length 20 --max-length 45 --max-gaps 0 --contig-start-mode 1 --contig-end-mode 0 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 48 -v 3

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Min codons in orf 20
Max codons in length 45
Max orf gaps 0
Contig start mode 1
Contig end mode 0
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Threads 48
Verbosity 3

16 Mio. sequences processed
10 Mio. sequences processed
8 Mio. sequences processed
14 Mio. sequences processed
15 Mio. sequences processed
13 Mio. sequences processed
7 Mio. sequences processed
11 Mio. sequences processed
12 Mio. sequences processed
9 Mio. sequences processed
6 Mio. sequences processed
5 Mio. sequences processed
1 Mio. sequences processed
3 Mio. sequences processed
2 Mio. sequences processed
4 Mio. sequences processed
Time for merging files: 0h 0m 0s 96ms
Time for merging files: 0h 0m 0s 95ms
Time for processing: 0h 0m 5s 85ms

Program call: translatenucs /tmp/6803214812655189031/nucl_6f_start /tmp/6803214812655189031/aa_6f_start --translation-table 1 --add-orf-stop 1 -v 3 --threads 48

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Translation Table 1
Add Orf Stop true
Verbosity 3
Threads 48

Time for merging files: 0h 0m 0s 202ms
Time for processing: 0h 0m 0s 452ms

Program call: extractorfs /tmp/6803214812655189031/nucl_reads /tmp/6803214812655189031/nucl_6f_long --min-length 45 --max-length 2147483647 --max-gaps 0 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 48 -v 3

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Min codons in orf 45
Max codons in length 2147483647
Max orf gaps 0
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Threads 48
Verbosity 3

16 Mio. sequences processed
3 Mio. sequences processed
14 Mio. sequences processed
15 Mio. sequences processed
13 Mio. sequences processed
2 Mio. sequences processed
11 Mio. sequences processed
9 Mio. sequences processed
8 Mio. sequences processed
5 Mio. sequences processed
6 Mio. sequences processed
10 Mio. sequences processed
12 Mio. sequences processed
1 Mio. sequences processed
7 Mio. sequences processed
4 Mio. sequences processed
Time for merging files: 0h 0m 0s 1ms
Time for merging files: 0h 0m 0s 1ms
Time for processing: 0h 0m 4s 905ms

Program call: translatenucs /tmp/6803214812655189031/nucl_6f_long /tmp/6803214812655189031/aa_6f_long --translation-table 1 --add-orf-stop 1 -v 3 --threads 48

MMseqs Version: 26b5d6625a2fbef4cfaab4bfaa99b1682d35921c
Translation Table 1
Add Orf Stop true
Verbosity 3
Threads 48

Failed to mmap memory dataSize=0 File=/tmp/6803214812655189031/nucl_6f_long. Error 22.
Error: translatenucs long step died
srun: error: hpc-rc03: task 0: Exited with exit code 1

milot-mirdita commented 5 years ago

How does your input data look? What is the average read length?

This error can happen when the ORF extraction module cannot extract a single ORF because of the minimum ORF length cutoff.

If your reads are only 100 residues long, then you should use a lower cutoff (something like --min-length 30).
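To check whether this applies to a given input, the sketch below estimates the average read length of an uncompressed FASTQ file and the largest number of codons a single reading frame of such a read can hold. It assumes four lines per record, and that the cutoff is counted in codons, as the "Min codons in orf" line in the log suggests. The function names are hypothetical helpers for illustration and are not part of Plass.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Plass. Assumes an uncompressed FASTQ
# file with exactly four lines per record (header, sequence, "+", quality).

# Print the mean read length, truncated to an integer.
avg_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$1"
}

# Print the upper bound on codons in one reading frame of a read of length $1.
max_codons() {
    echo $(( $1 / 3 ))
}

# Succeed if a read of length $1 can hold an ORF of at least $2 codons.
can_reach_cutoff() {
    [ "$(max_codons "$1")" -ge "$2" ]
}
```

For example, `max_codons 75` prints 25, so `can_reach_cutoff 75 45` fails while `can_reach_cutoff 75 20` succeeds. A 75 bp read can never satisfy the 45-codon cutoff shown in the log above.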

martin-steinegger commented 5 years ago

We intend to fix this in the next release by always using a fraction of the sequence length as the cutoff for ORF extraction.
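A length-scaled cutoff of this kind could look roughly like the sketch below. This is only an illustration of the idea: the thread does not say which fraction the release will use, so the 80% in the example is an arbitrary assumption, and `length_scaled_cutoff` is a hypothetical name, not a Plass function.

```shell
#!/bin/sh
# Hypothetical sketch, not the Plass implementation.
# $1 = read length in nucleotides
# $2 = fraction of the maximum codon count, in percent (assumed value)
# Prints the resulting minimum ORF length in codons (integer arithmetic).
length_scaled_cutoff() {
    echo $(( $1 / 3 * $2 / 100 ))
}

# Example with an assumed fraction of 80%:
length_scaled_cutoff 75 80
length_scaled_cutoff 100 80
```

With these assumed inputs, a 75 bp read gets a cutoff of 20 codons and a 100 bp read gets 26, so the cutoff stays below what a single read can actually contain.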

colindaven commented 5 years ago

Thanks. The reads were only 1x75 bp. I set the minimum ORF length with --min-length 20 and got a lot further.

Thanks!

martin-steinegger commented 5 years ago

The sensitivity of Plass can suffer from such short reads because we compute an e-value for the overlap. It is difficult for an overlap to be significant with such short fragments.