soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

existing tmp dir causes mmseqs map to die #144

Closed: nick-youngblut closed this issue 5 years ago

nick-youngblut commented 5 years ago

I'm running mmseqs map with a simple test command (mmseqs map --threads 24 -s 2 -c 0 MY_queryDB MY_targetDB MY_outDB MY_tmp_dir) and getting the following output and error:

mseqs map --threads 24 -s 2 -c 0 /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp/
Program call:
map --threads 24 -s 2 -c 0 /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp/

MMseqs Version:                                                             7.4e23d
Sub Matrix                                                                  blosum62.out
Sensitivity                                                                 2
K-mer size                                                                  0
K-score                                                                     2147483647
Alphabet size                                                               21
Max. sequence length                                                        65535
Max. results per query                                                      300
Offset result                                                               0
Split DB                                                                    0
Split mode                                                                  2
Split Memory Limit                                                          0
Coverage threshold                                                          0
Coverage Mode                                                               2
Compositional bias                                                          0
Diagonal Scoring                                                            1
Exact k-mer matching                                                        0
Mask Residues                                                               0
Minimum Diagonal score                                                      15
Include identical Seq. Id.                                                  false
Spaced Kmer                                                                 1
Preload mode                                                                0
Pseudo count a                                                              1
Pseudo count b                                                              1.5
Spaced k-mer pattern
Local temporary path
Threads                                                                     24
Verbosity                                                                   3
Rescore mode                                                                2
Remove hits by seq.id. and coverage                                         false
E-value threshold                                                           0.001
Seq. Id Threshold                                                           0.9
Seq. Id. Mode                                                               0
Sort results                                                                1
In substitution scoring mode, performs global alignment along the diagonal  false
Min codons in orf                                                           1
Max codons in length                                                        2147483647
Max orf gaps                                                                2147483647
Contig start mode                                                           2
Contig end mode                                                             2
Orf start mode                                                              0
Forward Frames                                                              1,2,3
Reverse Frames                                                              1,2,3
Translation Table                                                           1
Use all table starts                                                        false
Offset of numeric ids                                                       0
Add Orf Stop                                                                false
Start sensitivity                                                           4
Search steps                                                                1
Sets the MPI runner
Remove Temporary Files                                                      false

Program call:
search /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp//3961034198315058036 --sub-mat blosum62.out -s 2 -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 65535 --max-seqs 300 --offset-result 0 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 2 --comp-bias-corr 0 --diag-score 1 --exact-kmer-matching 0 --mask 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 24 -v 3 --rescore-mode 2 --filter-hits 0 -e 0.001 --min-seq-id 0.9 --seq-id-mode 0 --sort-results 1 --global-alignment 0 --min-length 1 --max-length 2147483647 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --add-orf-stop 0 --start-sens 4 --sens-steps 1 --remove-tmp-files 0 --alignment-mode 4

MMseqs Version:                                                             7.4e23d
Sub Matrix                                                                  blosum62.out
Add backtrace                                                               false
Alignment mode                                                              4
E-value threshold                                                           0.001
Seq. Id Threshold                                                           0.9
Seq. Id. Mode                                                               0
Alternative alignments                                                      0
Coverage threshold                                                          0
Coverage Mode                                                               2
Max. sequence length                                                        65535
Max. results per query                                                      300
Compositional bias                                                          0
Realign hit                                                                 false
Max Reject                                                                  2147483647
Max Accept                                                                  2147483647
Include identical Seq. Id.                                                  false
Preload mode                                                                0
Pseudo count a                                                              1
Pseudo count b                                                              1.5
Score bias                                                                  0
Gap open cost                                                               11
Gap extension cost                                                          1
Threads                                                                     24
Verbosity                                                                   3
Sensitivity                                                                 2
K-mer size                                                                  0
K-score                                                                     2147483647
Alphabet size                                                               21
Offset result                                                               0
Split DB                                                                    0
Split mode                                                                  2
Split Memory Limit                                                          0
Diagonal Scoring                                                            1
Exact k-mer matching                                                        0
Mask Residues                                                               0
Minimum Diagonal score                                                      15
Spaced Kmer                                                                 1
Spaced k-mer pattern
Local temporary path
Rescore mode                                                                2
Remove hits by seq.id. and coverage                                         false
Sort results                                                                1
In substitution scoring mode, performs global alignment along the diagonal  false
Mask profile                                                                1
Profile e-value threshold                                                   0.1
Use global sequence weighting                                               false
Filter MSA                                                                  1
Maximum sequence identity threshold                                         0.9
Minimum seq. id.                                                            0
Minimum score per column                                                    -20
Minimum coverage                                                            0
Select n most diverse seqs                                                  1000
Omit Consensus                                                              false
Min codons in orf                                                           1
Max codons in length                                                        2147483647
Max orf gaps                                                                2147483647
Contig start mode                                                           2
Contig end mode                                                             2
Orf start mode                                                              0
Forward Frames                                                              1,2,3
Reverse Frames                                                              1,2,3
Translation Table                                                           1
Use all table starts                                                        false
Offset of numeric ids                                                       0
Add Orf Stop                                                                false
Number search iterations                                                    1
Start sensitivity                                                           4
Search steps                                                                1
Run a seq-profile search in slice mode                                      false
Strand selection                                                            1
Disk space limit                                                            0
Sets the MPI runner
Remove Temporary Files                                                      false

Please provide <queryDB> <targetDB> <outDB> <tmp>
Error: Search step died

I removed the temporary directory created by my previous failed run of mmseqs map, and after that the error (Please provide <queryDB> <targetDB> <outDB> <tmp>; Error: Search step died) no longer occurred.
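For anyone hitting the same thing, the workaround described above can be sketched roughly as follows. This is only a sketch of what I did by hand, not something mmseqs provides: reset_tmp_dir is a hypothetical helper name, and the MY_* paths are the same placeholders as in the test command. The mmseqs call itself is left commented out so the snippet can be sourced without the databases present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of mmseqs): delete and recreate a scratch
# directory so that nothing from an earlier, failed run is left behind.
reset_tmp_dir() {
    local dir="$1"
    # Refuse empty or root paths so rm -rf cannot hit something unintended.
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "refusing to reset '$dir'" >&2
        return 1
    fi
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
}

# Intended usage, with the placeholder names from the test command:
# reset_tmp_dir MY_tmp_dir
# mmseqs map --threads 24 -s 2 -c 0 MY_queryDB MY_targetDB MY_outDB MY_tmp_dir
```

Note that this throws away any intermediate results in the tmp directory, so a rerun starts from scratch.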

In general, I keep getting Please provide <queryDB> <targetDB> <outDB> <tmp>; Error: Search step died at various steps of the mmseqs map run (e.g., at the end of translatenucs). The error message says little about what is actually going wrong, since I am providing the required input and output parameters.

conda env:

# Name                    Version                   Build  Channel
bzip2                     1.0.6                h470a237_2    conda-forge
gawk                      4.2.1                h470a237_0    conda-forge
gettext                   0.19.8.1             h5e8e0c9_1    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
llvm-meta                 7.0.0                         0    conda-forge
mmseqs2                   7.4e23d              h21aa3a5_1    bioconda
openmp                    7.0.0                h2d50403_0    conda-forge
pigz                      2.3.4                         0    conda-forge
plass                     2.c7e35              h21aa3a5_1    bioconda
zlib                      1.2.11               h470a237_3    conda-forge

conda info

     active environment : plass
    active env location : /ebio/abt3_projects/software/dev/miniconda3_dev/envs/plass
            shell level : 1
       user config file : /ebio/abt3/nyoungblut/.condarc
 populated config files : /ebio/abt3_projects/software/dev/miniconda3_dev/.condarc
                          /ebio/abt3/nyoungblut/.condarc
          conda version : 4.5.11
    conda-build version : 3.11.0
         python version : 3.6.5.final.0
       base environment : /ebio/abt3_projects/software/dev/miniconda3_dev  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/linux-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://conda.anaconda.org/leylabmpi/linux-64
                          https://conda.anaconda.org/leylabmpi/noarch
                          https://conda.anaconda.org/r/linux-64
                          https://conda.anaconda.org/r/noarch
                          https://conda.anaconda.org/qiime2/linux-64
                          https://conda.anaconda.org/qiime2/noarch
          package cache : /ebio/abt3_projects/software/dev/miniconda3_dev/pkgs
                          /ebio/abt3/nyoungblut/.conda/pkgs
       envs directories : /ebio/abt3_projects/software/dev/miniconda3_dev/envs
                          /ebio/abt3/nyoungblut/.conda/envs
               platform : linux-64
             user-agent : conda/4.5.11 requests/2.18.4 CPython/3.6.5 Linux/4.9.127 ubuntu/18.04 glibc/2.27
                UID:GID : 6354:350
             netrc file : None
           offline mode : False

nick-youngblut commented 5 years ago

Even after removing the temp directory, I still get a very similar error, just further down in the process:

mseqs map --threads 24 -s 2 -c 0 /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp/
Program call:
map --threads 24 -s 2 -c 0 /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp/

MMseqs Version:                                                             7.4e23d
Sub Matrix                                                                  blosum62.out
Sensitivity                                                                 2
K-mer size                                                                  0
K-score                                                                     2147483647
Alphabet size                                                               21
Max. sequence length                                                        65535
Max. results per query                                                      300
Offset result                                                               0
Split DB                                                                    0
Split mode                                                                  2
Split Memory Limit                                                          0
Coverage threshold                                                          0
Coverage Mode                                                               2
Compositional bias                                                          0
Diagonal Scoring                                                            1
Exact k-mer matching                                                        0
Mask Residues                                                               0
Minimum Diagonal score                                                      15
Include identical Seq. Id.                                                  false
Spaced Kmer                                                                 1
Preload mode                                                                0
Pseudo count a                                                              1
Pseudo count b                                                              1.5
Spaced k-mer pattern
Local temporary path
Threads                                                                     24
Verbosity                                                                   3
Rescore mode                                                                2
Remove hits by seq.id. and coverage                                         false
E-value threshold                                                           0.001
Seq. Id Threshold                                                           0.9
Seq. Id. Mode                                                               0
Sort results                                                                1
In substitution scoring mode, performs global alignment along the diagonal  false
Min codons in orf                                                           1
Max codons in length                                                        2147483647
Max orf gaps                                                                2147483647
Contig start mode                                                           2
Contig end mode                                                             2
Orf start mode                                                              0
Forward Frames                                                              1,2,3
Reverse Frames                                                              1,2,3
Translation Table                                                           1
Use all table starts                                                        false
Offset of numeric ids                                                       0
Add Orf Stop                                                                false
Start sensitivity                                                           4
Search steps                                                                1
Sets the MPI runner
Remove Temporary Files                                                      false

Program call:
search /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/linclust/genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_v_genes_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp//3961034198315058036 --sub-mat blosum62.out -s 2 -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 65535 --max-seqs 300 --offset-result 0 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 2 --comp-bias-corr 0 --diag-score 1 --exact-kmer-matching 0 --mask 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 24 -v 3 --rescore-mode 2 --filter-hits 0 -e 0.001 --min-seq-id 0.9 --seq-id-mode 0 --sort-results 1 --global-alignment 0 --min-length 1 --max-length 2147483647 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --add-orf-stop 0 --start-sens 4 --sens-steps 1 --remove-tmp-files 0 --alignment-mode 4

MMseqs Version:                                                             7.4e23d
Sub Matrix                                                                  blosum62.out
Add backtrace                                                               false
Alignment mode                                                              4
E-value threshold                                                           0.001
Seq. Id Threshold                                                           0.9
Seq. Id. Mode                                                               0
Alternative alignments                                                      0
Coverage threshold                                                          0
Coverage Mode                                                               2
Max. sequence length                                                        65535
Max. results per query                                                      300
Compositional bias                                                          0
Realign hit                                                                 false
Max Reject                                                                  2147483647
Max Accept                                                                  2147483647
Include identical Seq. Id.                                                  false
Preload mode                                                                0
Pseudo count a                                                              1
Pseudo count b                                                              1.5
Score bias                                                                  0
Gap open cost                                                               11
Gap extension cost                                                          1
Threads                                                                     24
Verbosity                                                                   3
Sensitivity                                                                 2
K-mer size                                                                  0
K-score                                                                     2147483647
Alphabet size                                                               21
Offset result                                                               0
Split DB                                                                    0
Split mode                                                                  2
Split Memory Limit                                                          0
Diagonal Scoring                                                            1
Exact k-mer matching                                                        0
Mask Residues                                                               0
Minimum Diagonal score                                                      15
Spaced Kmer                                                                 1
Spaced k-mer pattern
Local temporary path
Rescore mode                                                                2
Remove hits by seq.id. and coverage                                         false
Sort results                                                                1
In substitution scoring mode, performs global alignment along the diagonal  false
Mask profile                                                                1
Profile e-value threshold                                                   0.1
Use global sequence weighting                                               false
Filter MSA                                                                  1
Maximum sequence identity threshold                                         0.9
Minimum seq. id.                                                            0
Minimum score per column                                                    -20
Minimum coverage                                                            0
Select n most diverse seqs                                                  1000
Omit Consensus                                                              false
Min codons in orf                                                           1
Max codons in length                                                        2147483647
Max orf gaps                                                                2147483647
Contig start mode                                                           2
Contig end mode                                                             2
Orf start mode                                                              0
Forward Frames                                                              1,2,3
Reverse Frames                                                              1,2,3
Translation Table                                                           1
Use all table starts                                                        false
Offset of numeric ids                                                       0
Add Orf Stop                                                                false
Number search iterations                                                    1
Start sensitivity                                                           4
Search steps                                                                1
Run a seq-profile search in slice mode                                      false
Strand selection                                                            1
Disk space limit                                                            0
Sets the MPI runner
Remove Temporary Files                                                      false

Program call:
extractorfs /tmp/global2/nyoungblut/LLMGA_27929269397/map/R1-R2_db /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp//3961034198315058036/2495337974437510140/q_orfs --min-length 1 --max-length 2147483647 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 24 -v 3

MMseqs Version:         7.4e23d
Min codons in orf       1
Max codons in length    2147483647
Max orf gaps            2147483647
Contig start mode       2
Contig end mode         2
Orf start mode          0
Forward Frames          1,2,3
Reverse Frames          1,2,3
Translation Table       1
Use all table starts    false
Offset of numeric ids   0
Threads                 24
Verbosity               3

................... 1 Mio. sequences processed
........... 2 Mio. sequences processed
........................................    3 Mio. sequences processed
.....   5 Mio. sequences processed
... 4 Mio. sequences processed
........................    9 Mio. sequences processed
.....   10 Mio. sequences processed
............    8 Mio. sequences processed
..  6 Mio. sequences processed
    11 Mio. sequences processed
.............   7 Mio. sequences processed
......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Time for merging files: 0h 43m 20s 281ms
Time for merging files: 0h 40m 5s 189ms
Time for processing: 1h 28m 10s 704ms
Program call:
translatenucs /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp//3961034198315058036/2495337974437510140/q_orfs /tmp/global2/nyoungblut/LLMGA_27929269397/map/tmp//3961034198315058036/2495337974437510140/q_orfs_aa --translation-table 1 --add-orf-stop 0 -v 3 --threads 24

MMseqs Version:     7.4e23d
Translation Table   1
Add Orf Stop        false
Verbosity           3
Threads             24

................................................................................................... 1 Mio. sequences processed
................................................................................................... 2 Mio. sequences processed
................................................................................................... 3 Mio. sequences processed
................................................................................................... 4 Mio. sequences processed
................................................................................................... 5 Mio. sequences processed
................................................................................................... 6 Mio. sequences processed
................................................................................................... 7 Mio. sequences processed
................................................................................................... 8 Mio. sequences processed
................................................................................................... 9 Mio. sequences processed
................................................................................................... 10 Mio. sequences processed
................................................................................................... 11 Mio. sequences processed
................................................................................................... 12 Mio. sequences processed
................................................................................................... 13 Mio. sequences processed
................................................................................................... 14 Mio. sequences processed
................................................................................................... 15 Mio. sequences processed
................................................................................................... 16 Mio. sequences processed
................................................................................................... 17 Mio. sequences processed
................................................................................................... 18 Mio. sequences processed
................................................................................................... 19 Mio. sequences processed
................................................................................................... 20 Mio. sequences processed
................................................................................................... 21 Mio. sequences processed
................................................................................................... 22 Mio. sequences processed
................................................................................................... 23 Mio. sequences processed
................................................................................................... 24 Mio. sequences processed
................................................................................................... 25 Mio. sequences processed
................................................................................................... 26 Mio. sequences processed
................................................................................................... 27 Mio. sequences processed
................................................................................................... 28 Mio. sequences processed
................................................................................................... 29 Mio. sequences processed
................................................................................................... 30 Mio. sequences processed
................................................................................................... 31 Mio. sequences processed
................................................................................................... 32 Mio. sequences processed
................................................................................................... 33 Mio. sequences processed
................................................................................................... 34 Mio. sequences processed
................................................................................................... 35 Mio. sequences processed
................................................................................................... 36 Mio. sequences processed
................................................................................................... 37 Mio. sequences processed
................................................................................................... 38 Mio. sequences processed
................................................................................................... 39 Mio. sequences processed
................................................................................................... 40 Mio. sequences processed
................................................................................................... 41 Mio. sequences processed
................................................................................................... 42 Mio. sequences processed
................................................................................................... 43 Mio. sequences processed
................................................................................................... 44 Mio. sequences processed
................................................................................................... 45 Mio. sequences processed
................................................................................................... 46 Mio. sequences processed
................................................................................................... 47 Mio. sequences processed
................................................................................................... 48 Mio. sequences processed
................................................................................................... 49 Mio. sequences processed
................................................................................................... 50 Mio. sequences processed
................................................................................................... 51 Mio. sequences processed
................................................................................................... 52 Mio. sequences processed
................................................................................................... 53 Mio. sequences processed
................................................................................................... 54 Mio. sequences processed
................................................................................................... 55 Mio. sequences processed
................................................................................................... 56 Mio. sequences processed
................................................................................................... 57 Mio. sequences processed
................................................................................................... 58 Mio. sequences processed
................................................................................................... 59 Mio. sequences processed
................................................................................................... 60 Mio. sequences processed
................................................................................................... 61 Mio. sequences processed
................................................................................................... 62 Mio. sequences processed
................................................................................................... 63 Mio. sequences processed
................................................................................................... 64 Mio. sequences processed
................................................................................................... 65 Mio. sequences processed
................................................................................................... 66 Mio. sequences processed
................................................................................................... 67 Mio. sequences processed
................................................................................................... 68 Mio. sequences processed
................................................................................................... 69 Mio. sequences processed
................................................................................................... 70 Mio. sequences processed
................................................................................................... 71 Mio. sequences processed
................................................................................................... 72 Mio. sequences processed
................................................................................................... 73 Mio. sequences processed
................................................................................................... 74 Mio. sequences processed
................................................................................................... 75 Mio. sequences processed
................................................................................................... 76 Mio. sequences processed
................................................................................................... 77 Mio. sequences processed
................................................................................................... 78 Mio. sequences processed
................................................................................................... 79 Mio. sequences processed
................................................................................................... 80 Mio. sequences processed
................................................................................................... 81 Mio. sequences processed
................................................................................................... 82 Mio. sequences processed
................................................................................................... 83 Mio. sequences processed
................................................................................................... 84 Mio. sequences processed
................................................................................................... 85 Mio. sequences processed
................................................................................................... 86 Mio. sequences processed
................................................................................................... 87 Mio. sequences processed
................................................................................................... 88 Mio. sequences processed
................................................................................................... 89 Mio. sequences processed
................................................................................................... 90 Mio. sequences processed
................................................................................................... 91 Mio. sequences processed
........................................................................Time for merging files: 0h 43m 22s 698ms
Time for processing: 0h 44m 6s 639ms
Please provide <queryDB> <targetDB> <outDB> <tmp>
Error: Search step died

milot-mirdita commented 5 years ago

Okay, the map workflow seems to be broken currently. Until we fix the issue, please run the search workflow with the parameter --alignment-mode 4. This is equivalent to the map workflow, which is only meant to be a shortcut for it.
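
As a rough sketch of that workaround, the script below assembles the equivalent `mmseqs search` call from the options used in the original report (`--threads 24 -s 2 -c 0`) plus `--alignment-mode 4`. The helper name `build_search_cmd` and the placeholder database names are assumptions for illustration only, and the script just prints the command rather than running it.

```shell
#!/bin/sh
# Sketch of the suggested workaround: call `mmseqs search` with
# --alignment-mode 4 instead of `mmseqs map`.
# build_search_cmd and the MY_* names are hypothetical placeholders.

# Print the search command for a query DB, target DB, output DB and tmp dir.
# Assumes the paths contain no whitespace.
build_search_cmd() {
    printf 'mmseqs search %s %s %s %s --threads 24 -s 2 -c 0 --alignment-mode 4\n' \
        "$1" "$2" "$3" "$4"
}

# Show the command for the placeholder names used in the report.
build_search_cmd MY_queryDB MY_targetDB MY_outDB MY_tmp_dir

# To actually run it (needs mmseqs on PATH and existing databases):
# eval "$(build_search_cmd MY_queryDB MY_targetDB MY_outDB MY_tmp_dir)"
```

Printing the command first makes it easy to check the arguments before starting a run that may take a long time on large databases.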

milot-mirdita commented 5 years ago

The map workflow should work again in the latest git version.