soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Indexing of RNA sequences suddenly treats them as protein sequences #611

Open JonStargaryen opened 2 years ago

JonStargaryen commented 2 years ago

Expected Behavior

Current Behavior

Steps to Reproduce (for bugs)

  1. Install v5-2e4efbf of MMseqs2-App
    cd /opt/mmseqs
    git clone https://github.com/soedinglab/MMseqs2-App.git
    cd MMseqs2-App
    git checkout v5-2e4efbf
  2. Create /opt/mmseqs/MMseqs2-App/docker-compose/databases/pdb_rna_sequence.fasta with some test RNA sequences:
    >4QEI_2
    GCGCCGCUGGUGUAGUGGUAUCAUGCAAGAUUCCCAUUCUUGCGACCCGGGUUCGAUUCCCGGGCGGCG
    >6Q8V_1
    GGCGAAGAACCGGGGAGCC
    >5DO4_3
    GGGAAXAAAGXUGAAGUXXUUAXXX
    >1VQL_1
    UUGGCUACUAUGCCAGCUGGUGGAUUGCUCGGCUCAGGCGCUGAUGAAGGACGUGCCAAGCUGCGAUAAGCCAUGGGGAGCCGCACGGAGGCGAAGAACCAUGGAUUUCCGAAUGAGAAUCUCUCUAACAAUUGCUUCGCGCAAUGAGGAACCCCGAGAACUGAAACAUCUCAGUAUCGGGAGGAACAGAAAACGCAAUGUGAUGUCGUUAGUAACCGCGAGUGAACGCGAUACAGCCCAAACCGAAGCCCUCACGGGCAAUGUGGUGUCAGGGCUACCUCUCAUCAGCCGACCGUCUCGACGAAGUCUCUUGGAACAGAGCGUGAUACAGGGUGACAACCCCGUACUCGAGACCAGUACGACGUGCGGUAGUGCCAGAGUAGCGGGGGUUGGAUAUCCCUCGCGAAUAACGCAGGCAUCGACUGCGAAGGCUAAACACAACCUGAGACCGAUAGUGAACAAGUAGUGUGAACGAACGCUGCAAAGUACCCUCAGAAGGGAGGCGAAAUAGAGCAUGAAAUCAGUUGGCGAUCGAGCGACAGGGCAUACAAGGUCCCUCGACGAAUGACCGACGCGCGAGCGUCCAGUAAGACUCACGGGAAGCCGAUGUUCUGUCGUACGUUUUGAAAAACGAGCCAGGGAGUGUGUCUGCAUGGCAAGUCUAACCGGAGUAUCCGGGGAGGCACAGGGAAACCGACAUGGCCGCAGGGCUUUGCCCGAGGGCCGCCGUCUUCAAGGGCGGGGAGCCAUGUGGACACGACCCGAAUCCGGACGAUCUACGCAUGGACAAGAUGAAGCGUGCCGAAAGGCACGUGGAAGUCUGUUAGAGUUGGUGUCCUACAAUACCCUCUCGUGAUCUAUGUGUAGGGGUGAAAGGCCCAUCGAGUCCGGCAACAGCUGGUUCCAAUCGAAACAUGUCGAAGCAUGACCUCCGCCGAGGUAGUCUGUGAGGUAGAGCGACCGAUUGGUGUGUCCGCCUCCGAGAGGAGUCGGCACACCUGUCAAACUCCAAACUUACAGACGCCGUUUGACGCGGGGAUUCCGGUGCGCGGGGUAAGCCUGUGUACCAGGAGGGGAACAACCCAGAGAUAGGUUAAGGUCCCCAAGUGUGGAUUAAGUGUAAUCCUCUGAAGGUGGUCUCGAGCCCUAGACAGCCGGGAGGUGAGCUUAGAAGCAGCUACCCUCUAAGAAAAGCGUAACAGCUUACCGGCCGAGGUUUGAGGCGCCCAAAAUGAUCGGGACUCAAAUCCACCACCGAGACCUGUCCGUACCACUCAUACUGGUAAUCGAGUAGAUUGGCGCUCUAAUUGGAUGGAAGUAGGGGUGAAAACUCCUAUGGACCGAUUAGUGACGAAAAUCCUGGCCAUAGUAGCAGCGAUAGUCGGGUGAGAACCCCGACGGCCUAAUGGAUAAGGGUUCCUCAGCACUGCUGAUCAGCUGAGGGUUAGCCGGUCCUAAGUCAUACCGCAACUCGACUAUGACGAAAUGGGAAACGGGUUAAUAUUCCCGUGCCACUAUGCAGUGAAAGUUGACGCCCUGGGGUCGAUCACGCUGGGCAUUCGCCCAGUCGAACCGUCCAACUCCGUGGAAGCCGUAAUGGCAGGAAGCGGACGAACGGCGGCAUAGGGAAACGUGAUUCAACCUGGGGCCCAUGAAAAGACGAGCAUAGUGUCCGUACCGAGAACCGACACAGGUGUCCAUGGCGGCGAAAGCCAAGGCCUGUCGGGAGCAACCAACGUUAGGGAAUUCGGCAAGUUAGUCCCGUACCUUCGGAAGAAGGGAUGCCUGCUCCGGAACGGAGCAGGUCGCAGUGACUCGGAAGCUCGGACUGUCUAGUAACAACAUAGGUGACCGCAAAUCCGCAAGGACUCGUACGGUCACUGAAUCCUGCCCAGUGCAGGUAUCUGAACACCUCGUACAAGAGGACGAAGGACCUGUCAACGGCGGGGGUAACUAUGACCCUCUUAAGGUAGCGUAGUACCUUGCCGCAUCAGUA
GCGGCUUGCAUGAAUGGAUUAACCAGAGCUUCACUGUCCCAACGUUGGGCCCGGUGAACUGUACAUUCCAGUGCGGAGUCUGGAGACACCCAGGGGGAAGCGAAGACCCUAUGGAGCUUUACUGCAGGCUGUCGCUGAGACGUGGUCGCCGAUGUGCAGCAUAGGUAGGAGACACUACACAGGUACCCGCGCUAGCGGGCCACCGAGUCAACAGUGAAAUACUACCCGUCGGUGACUGCGACUCUCACUCCGGGAGGAGGACACCGAUAGCCGGGCAGUUUGACUGGGGCGGUACGCGCUCGAAAAGAUAUCGAGCGCGCCCUAUGGCUAUCUCAGCCGGGACAGAGACCCGGCGAAGAGUGCAAGAGCAAAAGAUAGCUUGACAGUGUUCUUCCCAACGAGGAACGCUGACGCGAAAGCGUGGUCUAGCGAACCAAUUAGCCUGCUUGAUGCGGGCAAUUGAUGACAGAAAAGCUACCCUAGGGAUAACAGAGUCGUCACUCGCAAGAGCACAUAUCGACCGAGUGGCUUGCUACCUCGAUGUCGGUUCCCUCCAUCCUGCCCGUGCAGAAGCGGGCAAGGGUGAGGUUGUUCGCCUAUUAAAGGAGGUCGUGAGCUGGGUUUAGACCGUCGUGAGACAGGUCGGCUGCUAUCUACUGGGUGUGUAAUGGUGUCUGACAAGAACGACCGUAUAGUACGAGAGGAACUACGGUUGGUGGCCACUGGUGUACCGGUUGUUCGAGAGAGCACGUGCCGGGUAGCCACGCCACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU
    >1VQL_2
    UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC
    >3ADD_2
    GGCCGCCGCCACCGGGGUGGUCCCCGGGCCGGACUUCAGAUCCGGCGCGCCCCGAGUGGGGCGCGGGGUUCAAUUCCCCGCGGCGGCCGCCA
    >5U33_2
    GGUCUAGAGGACAGAAUUUUUCAACGGGUGUGCCAAUGGCCACUUUCCAGGUGGCAAAGCCCGUUGAGCUUCUCAAAUCUGAGAAGUGGCACCAGAACCGGAGGACAAAGUC
    >6LQU_1
    GUCGACGUACUUCAUAGGAUCAUUUCUAUAGGAAUCGUCACUCUUUGACUCUUCAAAAGAGCCACUGAAUCCAACUUGGUUGAUGAGUCCCAUAACCUUUGUACCCCAGAGUGAGAAACCGAAAUUGAAUCUAAAUUAGCUUGGUCCGCAAUCCUUAGCGGUUCGGCCAUCUAUAAUUUUGAAUAAAAAUUUUGCUUUGCCGUUGCAUUUGUAGUUUUUUCCUUUGGAAGUAAUUACAAUAUUUUAUGGCGCGAUGAUCUUGACCCAUCCUAUGUACUUCUUUUUUGAAGGGAUAGGGCUCUAUGGGUGGGUACAAAUGGCAGUCUGACAAGU
    >6LQU_2
    AUGCGAAAGCAGUUGAAGACAAGUUCGAAAAGAGUUUGGAAACGAAUUCGAGUAGGCUUGUCGUUCGUUAUGUUUUUGUAAAUGGCCUCGUCAAACGGUGGAGAGAGUCGCUAGGUGAUCGUCAGAUCUGCCUAGUCUCUAUACAGCGUGUUUAAUUGACAUGGGUUGAUGCGUAUUGAGAGAUACAAUUUGGGAAGAAAUUCCCAGAGUGUGUUUCUUUUGCGUUUAACCUGAACAGUCUCAUCGUGGGCAUCUUGCGAUUCCAUUGGUGAGCAGCGAAGGAUUUGGUGGAUUACUAGCUAAUAGCAAUCUAUUUCAAAGAAUUCAAACUUGGGGGAAUGCCUUGUUGAAUAGCCGGUCGCAAGACUGUGAUUCUUCAAGUGUAACCUCCUCUCAAAUCAGCGAUAUCAAACGUACCAUUCCGUGAAACACCGGGGUAUCUGUUUGGUGGAACCUGAUUAGAGGAAACUCAAAGAGUGCUAUGGUAUGGUGACGGAGUGCGCUGGUCAAGAGUGUAAAAGCUUUUUGAACAGAGAGCAUUUCCGGCAGCAGAGAGACCUGAAAAAGCAAUUUUUCUGGAAUUUCAGCUGUUUCCAAACUCAAUAAGUAUCUUCUAGCAAGAGGGAAUAGGUGGGAAAAAAAAAAAGAGAUUUCGGUUUCUUUCUUUUUUACUGCUUGUUGCUUCUUCUUUUAAGAUAGU
    >6LQU_3
    AAGAUAGUUAUCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAAGAUUAAGCCAUGCAUGUCUAAGUAUAAGCAAUUUAUACAGUGAAACUGCGAAUGGCUCAUUAAAUCAGUUAUCGUUUAUUUGAUAGUUCCUUUACUACAUGGUAUAACUGUGGUAAUUCUAGAGCUAAUACAUGCUUAAAAUCUCGACCCUUUGGAAGAGAUGUAUUUAUUAGAUAAAAAAUCAAUGUCUUCGGACUCUUUGAUGAUUCAUAAUAACUUUUCGAAUCGCAUGGCCUUGUGCUGGCGAUGGUUCAUUCAAAUUUCUGCCCUAUCAACUUUCGAUGGUAGGAUAGUGGCCUACCAUGGUUUCAACGGGUAACGGGGAAUAAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCAAGGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUAAUUCAGGGAGGUAGUGACAAUAAAUAACGAUACAGGGCCCAUUCGGGUCUUGUAAUUGGAAUGAGUACAAUGUAAAUACCUUAACGAGGAACAAUUGGAGGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGAACUUUGGGCCCGGUUGGCCGGUCCGAUUUUUUCGUGUACUGGAUUUCCAACGGGGCCUUUCCUUCUGGCUAACCUUGAGUCCUUGUGGCUCUUGGCGAACCAGGACUUUUACUUUGAAAAAAUUAGAGUGUUCAAAGCAGGCGUAUUGCUCGAAUAUAUUAGCAUGGAAUAAUAGAAUAGGACGUUUGGUUCUAUUUUGUUGGUUUCUAGGACCAUCGUAAUGAUUAAUAGGGACGGUCGGGGGCAUCAGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUGAAGACUAACUACUGCGAAAGCAUUUGCCAAGGACGUUUUCAUUAAUCAAGAACGAAAGUUAGGGGAUCGAAGAUGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCGGGUGGUGUUUUUUUAAUGACCCACUCGGCACCUUACGAGAAAUCAAAGUCUUUGGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACCACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACACAAUAAGGAUUGACAGAUUGAGAGCUCUUUCUUGAUUUUGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGCGAUAACGAACGAGACCUUAACCUACUAAAUAGUGGUGCUAGCAUUUGCUGGUUAUCCACUUCUUAGAGGGACUAUCGGUUUCAAGCCGAUGGAAGUUUGAGGCAAUAACAGGUCUGUGAUGCCCUUAGACGUUCUGGGCCGCACGCGCGCUACACUGACGGAGCCAGCGAGUCUAACCUUGGCCGAGAGGUCUUGGUAAUCUUGUGAAACUCCGUCGUGCUGGGGAUAGAGCAUUGUAAUUAUUGCUCUUCAACGAGGAAUUCCUAGUAAGCGCAAGUCAUCAGCUUGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUAGUACCGAUUGAAUGGCUUAGUGAGGCCUCAGGAUCUGCUUAGAGAAGGGGGCAACUCCAUCUCAGAGCGGAGAAUUUGGACAAACUUGGUCAUUUAGAGGAACUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA
    >2G1G_1
    ACGGGCUCAUAACCCGU
    >6VZJ_8
    AUGUUUUUUUUUUUUUUUGAUUUGGUGAGAGG
  3. Create /opt/mmseqs/MMseqs2-App/docker-compose/databases/pdb_rna_sequence.params:
    {
      "display": {
        "name": "PDB RNA sequence (seqres)",
        "version": "2022-09-22",
        "default": false,
        "order": 2,
        "index": "--search-type 3",
        "search": "--max-seqs 2000 --search-type 3 --strand 1"
      }
    }
  4. Call docker-compose up --no-color

MMseqs Output (for bugs)

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Creating network "docker-compose_default" with the default driver
Creating volume "docker-compose_redis-volume" with default driver
Pulling mmseqs-web-redis (redis:alpine)...
alpine: Pulling from library/redis
Digest: sha256:dc1b954f5a1db78e31b8870966294d2f93fa8a7fba5c1337a1ce4ec55f311bc3
Status: Downloaded newer image for redis:alpine
Pulling mmseqs-web-api (soedinglab/mmseqs2-app-backend:version-5)...
version-5: Pulling from soedinglab/mmseqs2-app-backend
Digest: sha256:cb3937c294b3e666af4f4a424af0b7c4adc4f5469e2f309651b9a852731d786d
Status: Downloaded newer image for soedinglab/mmseqs2-app-backend:version-5
Pulling mmseqs-web-webserver (soedinglab/mmseqs2-app-frontend:version-5)...
version-5: Pulling from soedinglab/mmseqs2-app-frontend
Digest: sha256:30690f07999e0d1d7eccfe63c40df2b11d246590f094014e16987ed5d74d06d9
Status: Downloaded newer image for soedinglab/mmseqs2-app-frontend:version-5
Creating docker-compose_mmseqs-web-redis_1 ... 
Creating docker-compose_mmseqs-web-redis_1 ... done
Creating docker-compose_mmseqs-web-api_1   ... 
Creating docker-compose_mmseqs-web-api_1   ... done
Creating docker-compose_mmseqs-web-webserver_1 ... 
Creating docker-compose_mmseqs-web-worker_1    ... 
Creating docker-compose_mmseqs-web-worker_1    ... done
Creating docker-compose_mmseqs-web-webserver_1 ... done
Attaching to docker-compose_mmseqs-web-redis_1, docker-compose_mmseqs-web-api_1, docker-compose_mmseqs-web-worker_1, docker-compose_mmseqs-web-webserver_1
mmseqs-web-api_1        | 2022/09/22 16:09:10 MMseqs2 Webserver
mmseqs-web-redis_1      | 1:C 22 Sep 2022 16:09:10.002 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
mmseqs-web-redis_1      | 1:C 22 Sep 2022 16:09:10.002 # Redis version=7.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
mmseqs-web-redis_1      | 1:C 22 Sep 2022 16:09:10.002 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
mmseqs-web-redis_1      | 1:M 22 Sep 2022 16:09:10.002 * monotonic clock: POSIX clock_gettime
mmseqs-web-redis_1      | 1:M 22 Sep 2022 16:09:10.003 * Running mode=standalone, port=6379.
mmseqs-web-redis_1      | 1:M 22 Sep 2022 16:09:10.003 # Server initialized
mmseqs-web-redis_1      | 1:M 22 Sep 2022 16:09:10.003 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
mmseqs-web-redis_1      | 1:M 22 Sep 2022 16:09:10.003 * Ready to accept connections
mmseqs-web-worker_1     | 2022/09/22 16:09:10 MMseqs2 worker
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/databases/pdb_rna_sequence.fasta /opt/mmseqs-web/databases/pdb_rna_sequence 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | MMseqs Version:           e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Converting sequences
mmseqs-web-worker_1     | [=
mmseqs-web-worker_1     | Time for merging to pdb_rna_sequence_h: 0h 0m 0s 12ms
mmseqs-web-worker_1     | Time for merging to pdb_rna_sequence: 0h 0m 0s 9ms
mmseqs-web-worker_1     | Database type: Aminoacid
mmseqs-web-worker_1     | Time for merging to pdb_rna_sequence.lookup: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 60ms
mmseqs-web-worker_1     | indexdb /opt/mmseqs-web/databases/pdb_rna_sequence /opt/mmseqs-web/databases/pdb_rna_sequence --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --alph-size 21 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 1 --search-type 3 --split 0 --split-memory-limit 0 -v 3 --threads 2 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Estimated memory consumption: 1023M
mmseqs-web-worker_1     | Write VERSION (0)
mmseqs-web-worker_1     | Write META (1)
mmseqs-web-worker_1     | Write SCOREMATRIX3MER (4)
mmseqs-web-worker_1     | Write SCOREMATRIX2MER (3)
mmseqs-web-worker_1     | Write SCOREMATRIXNAME (2)
mmseqs-web-worker_1     | Write SPACEDPATTERN (23)
mmseqs-web-worker_1     | Write DBR1INDEX (5)
mmseqs-web-worker_1     | Write DBR1DATA (6)
mmseqs-web-worker_1     | Write HDR1INDEX (18)
mmseqs-web-worker_1     | Write HDR1DATA (19)
mmseqs-web-worker_1     | Write GENERATOR (22)
mmseqs-web-worker_1     | Index table: counting k-mers
mmseqs-web-worker_1     | [=================================================================] 10.79K 0s 918ms
mmseqs-web-worker_1     | Index table: Masked residues: 4422171
mmseqs-web-worker_1     | Index table: fill
mmseqs-web-worker_1     | [=================================================================] 10.79K 0s 61ms
mmseqs-web-worker_1     | Index statistics
mmseqs-web-worker_1     | Entries:          40961
mmseqs-web-worker_1     | DB size:          488 MB
mmseqs-web-worker_1     | Avg k-mer size:   0.000640
mmseqs-web-worker_1     | Top 10 k-mers
mmseqs-web-worker_1     |     AAGAGA    872
mmseqs-web-worker_1     |     CGGACA    532
mmseqs-web-worker_1     |     AAGACG    524
mmseqs-web-worker_1     |     AGAGCC    460
mmseqs-web-worker_1     |     AAGCGC    453
mmseqs-web-worker_1     |     AAAGGG    440
mmseqs-web-worker_1     |     CGAGGG    403
mmseqs-web-worker_1     |     AGCCAG    386
mmseqs-web-worker_1     |     GCGCAG    385
mmseqs-web-worker_1     |     AACAGG    323
mmseqs-web-worker_1     | Write ENTRIES (9)
mmseqs-web-worker_1     | Write ENTRIESOFFSETS (10)
mmseqs-web-worker_1     | Write SEQINDEXDATASIZE (15)
mmseqs-web-worker_1     | Write SEQINDEXSEQOFFSET (16)
mmseqs-web-worker_1     | Write SEQINDEXDATA (14)
mmseqs-web-worker_1     | Write ENTRIESNUM (12)
mmseqs-web-worker_1     | Write SEQCOUNT (13)
mmseqs-web-worker_1     | Time for merging to pdb_rna_sequence.idx: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 3s 688ms
mmseqs-web-worker_1     | touchdb /opt/mmseqs-web/databases/pdb_rna_sequence 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | MMseqs Version:   e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 47ms
mmseqs-web-worker_1     | 2022/09/22 16:09:14 Process finished gracefully without error

For comparison, logs looked like this when the RNA indexing was working:

mmseqs-web-worker_1     | indexdb /tmp/16248551593497607411/nucl_split_seq /opt/mmseqs-web/databases/pdb_rna_sequence --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 15 --alph-size 21 --comp-bias-corr 1 --max-seq-len 10000 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 1 --search-type 3 --split 0 --split-memory-limit 0 -v 3 --threads 2 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Estimated memory consumption: 8G
mmseqs-web-worker_1     | Write VERSION (0)
mmseqs-web-worker_1     | Write META (1)
mmseqs-web-worker_1     | Write SCOREMATRIX3MER (4)
mmseqs-web-worker_1     | Write SCOREMATRIX2MER (3)
mmseqs-web-worker_1     | Write SCOREMATRIXNAME (2)
mmseqs-web-worker_1     | Write SPACEDPATTERN (23)
mmseqs-web-worker_1     | Write DBR1INDEX (5)
mmseqs-web-worker_1     | Write DBR1DATA (6)
mmseqs-web-worker_1     | Write DBR2INDEX (7)
mmseqs-web-worker_1     | Write DBR2DATA (8)
mmseqs-web-worker_1     | Write HDR1INDEX (18)
mmseqs-web-worker_1     | Write HDR1DATA (19)
mmseqs-web-worker_1     | Write HDR2INDEX (20)
mmseqs-web-worker_1     | Write HDR2DATA (21)
mmseqs-web-worker_1     | Write GENERATOR (22)
mmseqs-web-worker_1     | Index table: counting k-mers
mmseqs-web-worker_1     | [=================================================================] 10.75K 1s 426ms
mmseqs-web-worker_1     | Index table: Masked residues: 29698
mmseqs-web-worker_1     | Index table: fill
mmseqs-web-worker_1     | [=================================================================] 10.75K 0s 783ms
mmseqs-web-worker_1     | Index statistics
mmseqs-web-worker_1     | Entries:          6377423
mmseqs-web-worker_1     | DB size:          8228 MB
mmseqs-web-worker_1     | Avg k-mer size:   0.005939
mmseqs-web-worker_1     | Top 10 k-mers
mmseqs-web-worker_1     |     GAGTGGTTAGGCGGA   1170
mmseqs-web-worker_1     |     TGAAACGGCCCCCAC   886
mmseqs-web-worker_1     |     CCCCCAAGGGACAGT   883
mmseqs-web-worker_1     |     GGGTAGTCCGCAGGC   882
mmseqs-web-worker_1     |     TTGGTAAGCCACGGC   882
mmseqs-web-worker_1     |     GGGGCAGCGTGATTT   882
mmseqs-web-worker_1     |     TGGTAAGTCCAGACG   882
mmseqs-web-worker_1     |     AACGATTAATCGGAG   865
mmseqs-web-worker_1     |     ATTAGGGGCCAAACG   855
mmseqs-web-worker_1     |     TTCAGCAAGCGACTT   854
mmseqs-web-worker_1     | Write ENTRIES (9)
mmseqs-web-worker_1     | Write ENTRIESOFFSETS (10)
mmseqs-web-worker_1     | Write SEQINDEXDATASIZE (15)
mmseqs-web-worker_1     | Write SEQINDEXSEQOFFSET (16)
mmseqs-web-worker_1     | Write SEQINDEXDATA (14)
mmseqs-web-worker_1     | Write ENTRIESNUM (12)
mmseqs-web-worker_1     | Write SEQCOUNT (13)
mmseqs-web-worker_1     | Time for merging to pdb_rna_sequence.idx: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 29s 698ms
mmseqs-web-worker_1     | Remove temporary files
mmseqs-web-worker_1     | rmdb /tmp/16248551593497607411/nucl_split_seq 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | touchdb /opt/mmseqs-web/databases/pdb_rna_sequence 

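Please note the differences from the failing run: the first argument (a temporary nucl_split_seq database instead of the database itself), -k 15 (instead of -k 0), and --max-seq-len 10000 (instead of 65535).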
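
To compare two logged commands without reading them flag by flag, a small script can do the diff. This is only a sketch: the function names are made up for illustration, the two sample lines are shortened copies of the indexdb calls from the logs, and the parser assumes every flag is followed by exactly one value (which holds for the lines shown here).

```python
import shlex


def parse_logged_command(line):
    """Parse one logged command line into (module, positionals, options)."""
    # Drop the docker-compose prefix ("mmseqs-web-worker_1     | ") if present.
    command = line.split("|", 1)[-1]
    tokens = shlex.split(command)
    module, rest = tokens[0], tokens[1:]
    positionals, options = [], {}
    i = 0
    while i < len(rest):
        # Assumption: every flag in these log lines is followed by one value.
        if rest[i].startswith("-") and i + 1 < len(rest):
            options[rest[i]] = rest[i + 1]
            i += 2
        else:
            positionals.append(rest[i])
            i += 1
    return module, positionals, options


def diff_commands(first, second):
    """Return the positionals of both commands and the options that differ."""
    _, pos_a, opt_a = parse_logged_command(first)
    _, pos_b, opt_b = parse_logged_command(second)
    changed = {
        flag: (opt_a.get(flag), opt_b.get(flag))
        for flag in sorted(set(opt_a) | set(opt_b))
        if opt_a.get(flag) != opt_b.get(flag)
    }
    return (pos_a, pos_b), changed


# Shortened copies of the two indexdb lines from the logs.
broken = (
    "mmseqs-web-worker_1     | indexdb /opt/mmseqs-web/databases/pdb_rna_sequence "
    "/opt/mmseqs-web/databases/pdb_rna_sequence -k 0 --max-seq-len 65535 --search-type 3"
)
working = (
    "mmseqs-web-worker_1     | indexdb /tmp/16248551593497607411/nucl_split_seq "
    "/opt/mmseqs-web/databases/pdb_rna_sequence -k 15 --max-seq-len 10000 --search-type 3"
)

positionals, changed = diff_commands(broken, working)
for flag, (old, new) in changed.items():
    print(f"{flag}: {old} -> {new}")
```

Pasting the full log lines instead of the shortened ones should work the same way, as long as the one-value-per-flag assumption holds.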


When I run any RNA query, e.g.:

>unnamed
UUUUUUUUUUUUUUUUUUUUUUUUU

The result is this:

mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:04:37 +0000] "POST /ticket HTTP/1.0" 200 67
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:04:37 +0000] "POST /tickets HTTP/1.0" 200 69
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:04:37 +0000] "GET /ticket/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g HTTP/1.0" 200 67
mmseqs-web-worker_1     | Tmp /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp folder does not exist or is not a directory.
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/job.fasta /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/query --dbtype 0 --shuffle 0 --createdb-mode 0 --id-offset 0 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Converting sequences
mmseqs-web-worker_1     | [
mmseqs-web-worker_1     | Time for merging to query_h: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for merging to query: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Database type: Nucleotide
mmseqs-web-worker_1     | Time for merging to query.lookup: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Tmp /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/search_tmp folder does not exist or is not a directory.
mmseqs-web-worker_1     | extractorfs /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/query /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/search_tmp/10674843733463968173/q_orfs_aa --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 1 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 2 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | [=================================================================] 1 0s 5ms
mmseqs-web-worker_1     | Time for merging to q_orfs_aa_h: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for merging to q_orfs_aa: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 6ms
mmseqs-web-worker_1     | prefilter /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/search_tmp/10674843733463968173/q_orfs_aa /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/jobs/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g/tmp/5203697948948418300/search_tmp/10674843733463968173/search/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 65535 --max-seqs 2000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 2 --pca 1 --pcb 1.5 --threads 2 --compressed 0 -v 3 -s 5.7 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  VTML80.out
mmseqs-web-worker_1     | Query database size: 0 type: Aminoacid
mmseqs-web-worker_1     | split was set to 1 but the db to split has only 0 sequences. Please run with default paramerters
mmseqs-web-worker_1     | Error: Prefilter died
mmseqs-web-worker_1     | Error: Search step died
mmseqs-web-worker_1     | Error: Search died
mmseqs-web-worker_1     | 2022/09/22 23:04:37 Execution Error: exit status 1
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:04:38 +0000] "GET /ticket/2_V_MA3apsJrv4zAPrfZqMF1rM88nxIsLoPD9g HTTP/1.0" 200 65

This is the healthy state:

mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:17 +0000] "POST /ticket HTTP/1.0" 200 67
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:17 +0000] "POST /tickets HTTP/1.0" 200 137
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:17 +0000] "GET /ticket/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A HTTP/1.0" 200 67
mmseqs-web-worker_1     | Tmp /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp folder does not exist or is not a directory.
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/job.fasta /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/query --dbtype 0 --shuffle 0 --createdb-mode 0 --id-offset 0 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Converting sequences
mmseqs-web-worker_1     | [
mmseqs-web-worker_1     | Time for merging to query_h: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for merging to query: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Database type: Nucleotide
mmseqs-web-worker_1     | Time for merging to query.lookup: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Tmp /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp folder does not exist or is not a directory.
mmseqs-web-worker_1     | splitsequence /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/query /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --create-lookup 0 --threads 2 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | prefilter /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/query_seqs_split /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 15 --k-score 2147483647 --alph-size 21 --max-seq-len 10000 --max-seqs 2000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 2 --pca 1 --pcb 1.5 --threads 2 --compressed 0 -v 3 -s 5.7 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Query database size: 1 type: Nucleotide
mmseqs-web-worker_1     | Estimated memory consumption: 8G
mmseqs-web-worker_1     | Target database size: 10754 type: Nucleotide
mmseqs-web-worker_1     | Process prefiltering step 1 of 1
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | k-mer similarity threshold: 0
mmseqs-web-worker_1     | Starting prefiltering scores calculation (step 1 of 1)
mmseqs-web-worker_1     | Query db start 1 to 1
mmseqs-web-worker_1     | Target db start 1 to 10754
mmseqs-web-worker_1     | [=================================================================] 1 0s 4ms
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | 0.120000 k-mers per position
mmseqs-web-worker_1     | 0 DB matches per sequence
mmseqs-web-worker_1     | 0 overflows
mmseqs-web-worker_1     | 0 queries produce too much hits (truncated result)
mmseqs-web-worker_1     | 0 sequences passed prefiltering per query sequence
mmseqs-web-worker_1     | 0 median result list length
mmseqs-web-worker_1     | 1 sequences with 0 size result lists
mmseqs-web-worker_1     | Time for merging to pref_0: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 5ms
mmseqs-web-worker_1     | align /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/query_seqs_split /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/pref_0 /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 1 --alignment-mode 3 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 10000 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca 1 --pcb 1.5 --score-bias 0 --gap-open 5 --gap-extend 2 --zdrop 40 --threads 2 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Compute score, coverage and sequence identity
mmseqs-web-worker_1     | Query database size: 1 type: Nucleotide
mmseqs-web-worker_1     | Target database size: 10754 type: Nucleotide
mmseqs-web-worker_1     | Calculation of alignments
mmseqs-web-worker_1     | [=================================================================] 1 0s 2ms
mmseqs-web-worker_1     | Time for merging to aln: 0h 0m 0s 0ms
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | 0 alignments calculated.
mmseqs-web-worker_1     | 0 sequence pairs passed the thresholds (-nan of overall calculated).
mmseqs-web-worker_1     | 0.000000 hits per query sequence.
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 37ms
mmseqs-web-worker_1     | Remove temporary files
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/pref_0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/aln_0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/input_0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/search/aln_merge -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | offsetalignment /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/query /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/query_seqs_split /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/aln /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 2 --compressed 0 --db-load-mode 2 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Computing ORF lookup
mmseqs-web-worker_1     | Computing contig offsets
mmseqs-web-worker_1     | Computing contig lookup
mmseqs-web-worker_1     | Time for contig lookup: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Writing results to: /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/result
mmseqs-web-worker_1     | [=================================================================] 1 0s 0ms
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for merging to result: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 1ms
mmseqs-web-worker_1     | Remove temporary files
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/q_orfs 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/q_orfs_aa 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/t_orfs 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/search_tmp/7042327585447941378/t_orfs_aa 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | convertalis /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/query /opt/mmseqs-web/databases/pdb_rna_sequence.idx /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/result /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/alis_pdb_rna_sequence --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen,tlen,qaln,taln --translation-table 1 --gap-open 11 --gap-extend 1 --db-output 1 --db-load-mode 2 --search-type 3 --threads 2 --compressed 0 -v 3 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | Index version: 16
mmseqs-web-worker_1     | Generated by:  e1a1c1226ef22ac3d0da8e8f71adb8fd2388a249
mmseqs-web-worker_1     | ScoreMatrix:  nucleotide.out
mmseqs-web-worker_1     | [=================================================================] 1 0s 0ms
mmseqs-web-worker_1     | Time for merging to alis_pdb_rna_sequence: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 33ms
mmseqs-web-worker_1     | Removing temporary files
mmseqs-web-worker_1     | rmdb /opt/mmseqs-web/jobs/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/tmp/4977427893312161071/result 
mmseqs-web-worker_1     | 
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | 2022/09/22 23:07:17 Process finished gracefully without error
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:18 +0000] "GET /ticket/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A HTTP/1.0" 200 68
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:18 +0000] "GET /ticket/type/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A HTTP/1.0" 200 18
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:18 +0000] "POST /tickets HTTP/1.0" 200 138
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:18 +0000] "GET /result/queries/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/7/0 HTTP/1.0" 200 63
mmseqs-web-api_1        | 172.17.0.4 - - [22/Sep/2022:23:07:18 +0000] "GET /result/qRNpiMtH372WZ-C_jMFQJKscz01WKcG1y9ej0A/0 HTTP/1.0" 200 126

Context

On a weekly basis, some new sequences are added to the FASTA file. This week, RNA searching stopped working and seems to report errors for all queries. We didn't make any version changes and have been using mmseqs via MMseqs2-App v5 for a long time. Last week's data now shows the same behavior (but was fine a week ago).

Indexing for protein and DNA sequences still works as expected; DNA k-mers have size 15. I find it peculiar that the top k-mers for the RNA sequences no longer contain any U (or T), as they did previously.

Your Environment


Thank you in advance!

JonStargaryen commented 2 years ago

It's a new week and we serve new (RNA) data. Now it's working again for our production data.

However, the example above is still broken. Is there some auto-detect step that determines the database type from the provided sequences? The only thing that changed is the sequence file.

milot-mirdita commented 2 years ago

I think this sequence broke the nucl-prot detection heuristic:

>5DO4_3
GGGAAXAAAGXUGAAGUXXUUAXXX

MMseqs2 assumes that the unknown residue for nucleotides is N and not X. The heuristic for distinguishing nucleotide from protein input is as follows:

if all of these are nucleotide sequences, declare the database to be a nucleotide database.

You can disable the heuristic and force the database to be a nucleotide db by passing --dbtype 2 to easy-search (in this case, the search entry in the .params file).
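
To illustrate why a sequence like 5DO4_3 can tip such a check, here is a minimal sketch of a detection rule of this kind. It is not the MMseqs2 implementation: the sample size, the 0.9 threshold and all function names are assumptions chosen for illustration. The only parts taken from this thread are that N (not X) counts as the unknown nucleotide and that every sampled sequence must look like a nucleotide sequence.

```python
# Letters accepted as nucleotides; N is the unknown nucleotide, X is not included.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def nucleotide_fraction(sequence):
    """Fraction of residues that are nucleotide letters."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(ch in NUCLEOTIDE_LETTERS for ch in seq) / len(seq)


def looks_nucleotide(sequence, min_fraction=0.9):
    # min_fraction is a hypothetical threshold, not a value confirmed in this thread.
    return nucleotide_fraction(sequence) >= min_fraction


def guess_dbtype(sequences, sample_size=10, min_fraction=0.9):
    """Declare 'Nucleotide' only if every sampled sequence looks like one."""
    # sample_size is likewise a hypothetical parameter.
    sample = sequences[:sample_size]
    if sample and all(looks_nucleotide(s, min_fraction) for s in sample):
        return "Nucleotide"
    return "Aminoacid"


# Three short entries from the test FASTA file in this issue.
entries = [
    "GGCGAAGAACCGGGGAGCC",        # 6Q8V_1
    "GGGAAXAAAGXUGAAGUXXUUAXXX",  # 5DO4_3, 7 of 25 residues are X
    "ACGGGCUCAUAACCCGU",          # 2G1G_1
]
print(guess_dbtype(entries))
print(guess_dbtype([s.replace("X", "N") for s in entries]))
```

Under these assumed parameters, 5DO4_3 alone is enough to make the whole sample fall back to amino acid, and replacing X with N in the input flips the result back to nucleotide.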

JonStargaryen commented 2 years ago

Thanks for the pointer to use --dbtype 2. I've tried it (along with combinations of other options and software versions), but no luck. The behavior doesn't change: the database is still indexed as amino acid sequences and the search still fails.

For now, I can try to sort the sequence file in a fashion that prevents problematic sequences from appearing at the top.
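
A rough sketch of that reordering, assuming "problematic" means any sequence containing a letter other than A, C, G, T, U or N. The function names are made up for illustration. Whether moving such records to the end is sufficient depends on how many sequences the detection step samples, which is not stated in this thread.

```python
def read_fasta(text):
    """Parse FASTA text into (header, sequence) pairs; sequences may span lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_clean(sequence, allowed=frozenset("ACGTUN")):
    """True if the sequence uses only the allowed nucleotide letters."""
    return all(ch in allowed for ch in sequence.upper())


def clean_first(records):
    """Stable reorder: clean records first, original order kept within each group."""
    return sorted(records, key=lambda rec: not is_clean(rec[1]))


def write_fasta(records):
    """Serialize records back to FASTA text, one line per sequence."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


fasta = (
    ">6Q8V_1\nGGCGAAGAACCGGGGAGCC\n"
    ">5DO4_3\nGGGAAXAAAGXUGAAGUXXUUAXXX\n"
    ">2G1G_1\nACGGGCUCAUAACCCGU\n"
)
reordered = clean_first(read_fasta(fasta))
print(write_fasta(reordered))
```

An alternative with the same parser would be to rewrite X as N in nucleotide records instead of moving them, at the cost of altering the deposited sequences.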

josemduarte commented 2 years ago

I think the issue here is that the detection of NA/prot goes wrong at indexing time. So what is also needed is the equivalent of --dbtype 2, but for indexing.
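
Outside the app, one way to get that effect might be to build the database by hand with the type pinned at createdb time and then index it. The sketch below only assembles the command lines (and runs them if asked). It assumes an `mmseqs` binary on the PATH; the helper names are hypothetical. The `--dbtype` option is the one visible in the logged createdb call above (there as `--dbtype 0`), and 2 is the nucleotide value suggested earlier in this thread. Whether MMseqs2-App would pick up a database prepared this way without re-running its own createdb step is not confirmed here.

```python
import subprocess


def createdb_command(fasta, db, dbtype=2):
    """Command line for createdb with the database type pinned (2 = nucleotide)."""
    return ["mmseqs", "createdb", fasta, db, "--dbtype", str(dbtype)]


def createindex_command(db, tmp_dir, search_type=3):
    """Command line for createindex with a nucleotide search type."""
    return ["mmseqs", "createindex", db, tmp_dir, "--search-type", str(search_type)]


def build_nucleotide_db(fasta, db, tmp_dir, run=False):
    """Return the two commands; execute them only when run=True."""
    commands = [createdb_command(fasta, db), createindex_command(db, tmp_dir)]
    if run:
        for cmd in commands:
            # check=True raises CalledProcessError if mmseqs exits with an error.
            subprocess.run(cmd, check=True)
    return commands


for cmd in build_nucleotide_db("pdb_rna_sequence.fasta", "pdb_rna_sequence", "tmp"):
    print(" ".join(cmd))
```

After running the commands, the createdb output should report "Database type: Nucleotide" if the override took effect.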