soedinglab / MMseqs2-App

MMseqs2 app to run on your workstation or servers
https://search.foldseek.com
GNU General Public License v3.0

Error when searching nucleotide DB #3

Closed simone-pignotti closed 5 years ago

simone-pignotti commented 5 years ago

I am using the docker version of the webserver, and I have successfully built and searched the example uniclust DB and another custom protein DB. I have also built a DNA DB, but searching it produces an error. Both the building and the searching logs are attached, and the issue seems to be a wrong path:

mmseqs-web-worker_1     | splitsequence /opt/mmseqs-web/databases/mydb_nt.idx /opt/mmseqs-web/jobs/UE3fNh-OopaZXL7T5LF-ZwnmWa4PbEOPQiOu8w/tmp/13815044513268057608/search_tmp/9906939544243059259/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               43a10ce5f7e83f421af19b7a5cc1792e9c3d3bbd
mmseqs-web-worker_1     | Max. sequence length          10000
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Could not open data file /opt/mmseqs-web/databases/mydb_nt.idx_h!
mmseqs-web-worker_1     | Error: Split sequence died
mmseqs-web-worker_1     | Error: Search died
mmseqs-web-worker_1     | 2019/03/05 13:54:03 Execution Error: exit status 1

I believe the splitsequence command is supposed to take mydb_nt and not mydb_nt.idx as argument, therefore it should be an easy fix.
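
To illustrate the diagnosis: judging from the log, the header file name appears to be derived by appending `_h` to whatever path is passed in, so passing the index path yields `mydb_nt.idx_h`. A minimal Python sketch of that reasoning follows. The helper names are hypothetical and are not part of MMseqs2 or the web app; only the `.idx` and `_h` suffixes are taken from the log above.

```python
def header_path(db_path: str) -> str:
    """Path of the header data file that the log suggests is opened for db_path."""
    return db_path + "_h"


def strip_index_suffix(db_path: str) -> str:
    """Drop a trailing '.idx' so the base sequence database is used instead."""
    suffix = ".idx"
    return db_path[: -len(suffix)] if db_path.endswith(suffix) else db_path


index_arg = "/opt/mmseqs-web/databases/mydb_nt.idx"
# The file the log reports as missing:
print(header_path(index_arg))
# The header of the base database, which is what splitsequence presumably needs:
print(header_path(strip_index_suffix(index_arg)))
```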

I hope this is useful; let me know if you can't reproduce the issue.

Simone

Attachments: createdb.log, search.log

milot-mirdita commented 5 years ago

Thanks for trying the MMseqs2 webserver!

I've triggered rebuilding the docker images based on the latest MMseqs2 image. The issue should be fixed there. I can do more detailed testing in a couple days though.

simone-pignotti commented 5 years ago

Thank you for the quick fix, I'll test it on my instance and will let you know if the problem persists.

simone-pignotti commented 5 years ago

I tried the latest mmseqs-app-* docker images (running docker-compose pull and docker-compose up --build), and I got a new error (see full log for details).

mmseqs-web-worker_1     | Unrecognized parameter --index-type
mmseqs-web-worker_1     | Did you mean "--search-type"?
mmseqs-web-worker_1     | 2019/03/05 15:21:17 Execution Error: exit status 1

It is triggered by the createindex command, but I can't find the specific command in the log.

Attachment: dockerup.log

milot-mirdita commented 5 years ago

I will look at the problem in detail in the evening. I know what's wrong, but it's a somewhat larger fix.

milot-mirdita commented 5 years ago

Update: The docker images should work again. I split the search field in the .params file into two separate fields, index and search: one for the indexing step and one for the actual search.

There is one more minor issue that I'll resolve soon: MMseqs2 currently does not clean up after itself correctly and leaves unnecessary files behind for each job. I'll fix that as soon as I can.

simone-pignotti commented 5 years ago

I keep getting errors with the updated image:

mmseqs-web-worker_1     | 2019/03/11 12:41:25 MMseqs2 worker
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/databases/test.fasta /opt/mmseqs-web/databases/test
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Split Seq. by len             true
mmseqs-web-worker_1     | Database type                 0
mmseqs-web-worker_1     | Do not shuffle input database true
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Assuming DNA database, forcing parameter --dont-split-seq-by-len true
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 139ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 613ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 10ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 3s 889ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createindex /opt/mmseqs-web/databases/test /tmp --remove-tmp-files true --check-compatible true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   0
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     | Min codons in orf             30
mmseqs-web-worker_1     | Max codons in length          98202
mmseqs-web-worker_1     | Max orf gaps                  2147483647
mmseqs-web-worker_1     | Contig start mode             2
mmseqs-web-worker_1     | Contig end mode               2
mmseqs-web-worker_1     | Orf start mode                1
mmseqs-web-worker_1     | Forward Frames                1,2,3
mmseqs-web-worker_1     | Reverse Frames                1,2,3
mmseqs-web-worker_1     | Translation Table             1
mmseqs-web-worker_1     | Use all table starts          false
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Add Orf Stop                  false
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Strand selection              1
mmseqs-web-worker_1     | Remove Temporary Files        true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Database /opt/mmseqs-web/databases/test is a nucleotide database.
mmseqs-web-worker_1     | Please provide the parameter --search-type 2 (translated) or 3 (nucleotide)
mmseqs-web-worker_1     | 2019/03/11 12:41:29 Execution Error: exit status 1

Adding "index":"--search-type 3" to the params dictionary results in:

mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/databases/test.fasta /opt/mmseqs-web/databases/test
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Split Seq. by len             true
mmseqs-web-worker_1     | Database type                 0
mmseqs-web-worker_1     | Do not shuffle input database true
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Assuming DNA database, forcing parameter --dont-split-seq-by-len true
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 204ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 563ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 10ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 3s 906ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createindex /opt/mmseqs-web/databases/test /tmp --remove-tmp-files true --check-compatible true --search-type 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   3
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     | Min codons in orf             30
mmseqs-web-worker_1     | Max codons in length          98202
mmseqs-web-worker_1     | Max orf gaps                  2147483647
mmseqs-web-worker_1     | Contig start mode             2
mmseqs-web-worker_1     | Contig end mode               2
mmseqs-web-worker_1     | Orf start mode                1
mmseqs-web-worker_1     | Forward Frames                1,2,3
mmseqs-web-worker_1     | Reverse Frames                1,2,3
mmseqs-web-worker_1     | Translation Table             1
mmseqs-web-worker_1     | Use all table starts          false
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Add Orf Stop                  false
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Strand selection              1
mmseqs-web-worker_1     | Remove Temporary Files        true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | splitsequence /opt/mmseqs-web/databases/test /tmp/9366001766242878652/nucl_split_seq --max-seq-len 65535 --sequence-overlap 0 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 21ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 316ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 648ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | extractframes /tmp/9366001766242878652/nucl_split_seq /tmp/9366001766242878652/nucl_split_seq_rev --forward-frames 1 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:       efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Forward Frames        1
mmseqs-web-worker_1     | Reverse Frames        1,2,3
mmseqs-web-worker_1     | Threads               24
mmseqs-web-worker_1     | Compressed            0
mmseqs-web-worker_1     | Verbosity             3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 22ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 315ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 686ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | indexdb /tmp/9366001766242878652/nucl_split_seq_rev.dbtype /opt/mmseqs-web/databases/test --seed-sub-mat PAM30.out -k 0 --alph-size 21 --comp-bias-corr 1 --max-seq-len 65535 --mask 1 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 1 --search-type 3 --split 0 --split-memory-limit 0 --threads 24 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   3
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Could not open index file /tmp/9366001766242878652/nucl_split_seq_rev.dbtype.index!
mmseqs-web-worker_1     | Error: indexdb died
mmseqs-web-worker_1     | 2019/03/11 12:46:21 Execution Error: exit status 1

UPDATE: using "index":"--search-type 2" in params works fine (forget the previous edit, my bad)

simone-pignotti commented 5 years ago

I tried the latest soedinglab/mmseqs2 docker image and ran into the same issue when running:

docker run -v `pwd`:`pwd` -w `pwd` soedinglab/mmseqs2 mmseqs createindex databases/test /tmp --remove-tmp-files true --check-compatible true --search-type 3

Therefore this has nothing to do with the web server. Should I open a new issue on the main repo?

milot-mirdita commented 5 years ago

Yes, please do that. Sorry for the slow support. We have a deadline approaching soon :/

simone-pignotti commented 5 years ago

No problem, it's done. Thank you again! UPDATE: see soedinglab/MMseqs2#175

simone-pignotti commented 5 years ago

Hi,

The issue with mmseqs has been solved, and I have tested nucleotide indexing and searching successfully. Could you please re-trigger the backend build? Version 9 should be good enough, but I believe there was another commit related to bug fixes in the nucleotide search after that, so maybe latest is better (or a previous commit including at least 0e3fbac011481fd6291b92a0b48adce98fc0f007, d9a44e89721ae3348246997bf1f009671ff58a83, d1e25ae7f4e921c041022d93b69f16fc324339f9).

Thank you!

simone-pignotti commented 5 years ago

I tested the mmseqs executables from the latest mmseqs2 docker image with the web app (by copying them manually into the web app docker image using docker cp) and I can confirm that it works now!

EDIT: it works only after setting the index and search params to --search-type 3 [--strand 2] (--strand is only needed if you want to search both strands, which I believe should be the default for nucleotide searches). I think setting only index would work, as search is normally set to auto and should detect the type from the index, but I haven't tested it. This is an example of #2, and it would be nice to have a full example of a working nucleotide index in addition to the params specification.
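
For reference, a minimal Python sketch of the two parameter strings for a nucleotide database. Only the `index` and `search` field names and the flag values come from this thread; the function name and dictionary layout are assumptions for illustration, and a real .params file may need further fields.

```python
import json


def nucleotide_params(both_strands: bool = False) -> dict:
    """Build the index/search parameter strings discussed in this thread."""
    flags = "--search-type 3"
    if both_strands:
        # --strand 2 is only needed to search both strands
        flags += " --strand 2"
    # The same flags are used for the indexing step and the search step
    return {"index": flags, "search": flags}


print(json.dumps(nucleotide_params(both_strands=True), indent=2))
```

Whether setting only `index` is sufficient remains untested, as noted above, so this sketch sets both fields.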

milot-mirdita commented 5 years ago

Sorry for not rebuilding the containers sooner. Once Docker Hub finishes, you should have MMseqs2 r9 in them.

Also a caveat: the nucleotide-nucleotide search is still in development by @martin-steinegger. There is no manuscript or exhaustive benchmarks yet.

simone-pignotti commented 5 years ago

No problem, that's great. I will keep an eye on the nt-nt search development :) I guess this issue can be closed, if you agree

milot-mirdita commented 5 years ago

Okay :) we’d be happy for feedback if you run into any issues with the nt-nt search.

simone-pignotti commented 5 years ago

Sure! I haven't yet, but I must say I only use it occasionally. Thanks again.

josemduarte commented 4 years ago

I've found this thread very useful in debugging problems with nucleotide searches. Thanks @simone-pignotti and @milot-mirdita

Now I have managed to get DNA and RNA databases up and running in mmseqs-app. However, the search doesn't work, and it looks purely like an API problem: according to the server logs, mmseqs itself ran without complaint.

This is my DNA and RNA databases config:

{
  "databases": [
    {
      "name": "PDB DNA sequence (seqres)",
      "version": "2020-02-18",
      "path": "pdb_dna_sequence",
      "default": false,
      "order": 1,
      "index": "--search-type 3",
      "search": "--max-seqs 2000 --search-type 3"
    },
    {
      "name": "PDB RNA sequence (seqres)",
      "version": "2020-02-18",
      "path": "pdb_rna_sequence",
      "default": false,
      "order": 2,
      "index": "--search-type 3",
      "search": "--max-seqs 2000 --search-type 3"
    }
  ]
}

And what I get from the API ticket endpoint (e.g. api/ticket/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg) is:

{
  "id": "LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg",
  "status": "COMPLETE"
}

So all looks OK there. However, from the result endpoint (e.g. api/result/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg/0) I get this (with a 400 HTTP status code):

record on line 3: wrong number of fields

Any idea what could be wrong?
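
For anyone wanting to reproduce the two requests above outside the browser, a small Python sketch that captures both the status code and the raw body. The base URL is a placeholder assumption and the helper names are hypothetical; only the `api/ticket/<id>` and `api/result/<id>/0` paths are taken from this comment.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple

BASE_URL = "http://localhost/"  # assumption: replace with your own deployment


def ticket_url(base: str, ticket_id: str) -> str:
    return f"{base.rstrip('/')}/api/ticket/{ticket_id}"


def result_url(base: str, ticket_id: str, entry: int = 0) -> str:
    return f"{base.rstrip('/')}/api/result/{ticket_id}/{entry}"


def fetch(url: str) -> Tuple[int, str]:
    """Return (HTTP status, body text), also for error responses such as 400."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the body
        return err.code, err.read().decode("utf-8", errors="replace")


def is_complete(ticket_body: str) -> bool:
    """True if the ticket JSON reports status COMPLETE."""
    return json.loads(ticket_body).get("status") == "COMPLETE"


# Example (requires a running server):
#   status, body = fetch(result_url(BASE_URL, "LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg"))
#   print(status, body)
```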

milot-mirdita commented 4 years ago

Please execute the following command from within the docker-compose directory and upload the output.

head jobs/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg/alis_*
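
As a complement to the `head` output, a short Python sketch that reports lines whose field count differs from the first line, which is the kind of mismatch a "wrong number of fields" message points to. It assumes the alignment files are tab-separated text; that is an assumption, not something confirmed in this thread, and the function name is hypothetical.

```python
import io
from typing import Iterable, List, Tuple


def inconsistent_lines(lines: Iterable[str], sep: str = "\t") -> List[Tuple[int, int]]:
    """Return (line number, field count) for lines whose count differs from line 1."""
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if expected is None:
            expected = count  # the first line sets the expected field count
        elif count != expected:
            bad.append((number, count))
    return bad


# Made-up sample data: the third line is missing one field
sample = io.StringIO("q1\tt1\t0.9\nq1\tt2\t0.8\nq1\tt3\n")
print(inconsistent_lines(sample))  # prints [(3, 2)]
```

To check a real file, pass an open file handle instead of the sample, for example `with open(path) as fh: print(inconsistent_lines(fh))`.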

Can you please answer in a new issue so we don't spam simone's email with notifications?