soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Excessively long easy-taxonomy against NR #515

Open mgabriell1 opened 2 years ago

mgabriell1 commented 2 years ago

Hi, I am trying to get the taxonomy of several contigs in a multi-FASTA file, but I'm having issues with the easy-taxonomy command: it has not completed the assignment of about 804K contigs on 16 threads in 24h, using NR as the reference database. Due to the limits of the machine that I'm using (the partition with an external connection allows only a single core and a rather short time limit), I set up the database using a mix of the databases command and the other commands shown in the user guide. I changed the number of threads between steps because, for example, createdb seemed to work only with the same number of threads with which databases was initially run. Is this something to be expected or have I done something wrong during the database setup?

Thanks in advance for your help and, also, for making this tool!

These are the commands that I've used:

mmseqs databases NR refDB/NR tmp --threads 1 -v 3 --force-reuse 1
mmseqs createdb tmp/11117391383852458210/nr.gz refDB/NR --compressed 0 -v 3
mmseqs createtaxdb refDB/NR tmp
mmseqs createindex refDB/NR tmp --split-memory-limit 100G
mmseqs easy-taxonomy contigs.fasta refDB/NR alnRes tmp --split-memory-limit 100G --threads 16
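
One way to gauge the expected runtime before committing all 804K contigs is to time the same easy-taxonomy call on a small subset and extrapolate. The following is a minimal sketch, assuming a plain multi-FASTA input; the helper name `fasta_head`, the demo file names, and the made-up sequences are hypothetical and not part of MMseqs2:

```shell
# Hypothetical helper (not an MMseqs2 command): print the first N records
# of a multi-FASTA file to stdout.
#   $1 = input FASTA file, $2 = number of records to keep
fasta_head() {
    # Count header lines; stop as soon as record N+1 begins.
    awk -v n="$2" '/^>/ { seen++ } seen > n { exit } { print }' "$1"
}

# Tiny made-up input so the sketch runs on its own.
printf '>c1\nACGT\nACGT\n>c2\nGGCC\n>c3\nTTAA\n' > demo_contigs.fasta

# Keep the first two records and count the headers in the subset.
fasta_head demo_contigs.fasta 2 > demo_subset.fasta
grep -c '^>' demo_subset.fasta   # prints 2
```

The subset file could then take the place of contigs.fasta in the easy-taxonomy command above, run under `time` with a separate output prefix. The resulting figure is only a rough guide, since database loading costs do not scale with the number of queries.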

This is the output of createdb:

createdb tmp/11117391383852458210/nr.gz refDB/NR --compressed 0 -v 3

MMseqs Version:         13.45111
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[===================================================================================================    1 Mio. sequences processed
[... identical progress lines for 2 to 250 Mio. sequences omitted ...]
=================================================================================================== 251 Mio. sequences processed
=================================================================================================== 252 Mio. sequences processed
=================================================================================================== 253 Mio. sequences processed
=================================================================================================== 254 Mio. sequences processed
=================================================================================================== 255 Mio. sequences processed
=================================================================================================== 256 Mio. sequences processed
=================================================================================================== 257 Mio. sequences processed
=================================================================================================== 258 Mio. sequences processed
=================================================================================================== 259 Mio. sequences processed
=================================================================================================== 260 Mio. sequences processed
=================================================================================================== 261 Mio. sequences processed
=================================================================================================== 262 Mio. sequences processed
=================================================================================================== 263 Mio. sequences processed
=================================================================================================== 264 Mio. sequences processed
=================================================================================================== 265 Mio. sequences processed
=================================================================================================== 266 Mio. sequences processed
=================================================================================================== 267 Mio. sequences processed
=================================================================================================== 268 Mio. sequences processed
=================================================================================================== 269 Mio. sequences processed
=================================================================================================== 270 Mio. sequences processed
=================================================================================================== 271 Mio. sequences processed
=================================================================================================== 272 Mio. sequences processed
=================================================================================================== 273 Mio. sequences processed
=================================================================================================== 274 Mio. sequences processed
=================================================================================================== 275 Mio. sequences processed
=================================================================================================== 276 Mio. sequences processed
=================================================================================================== 277 Mio. sequences processed
=================================================================================================== 278 Mio. sequences processed
=================================================================================================== 279 Mio. sequences processed
=================================================================================================== 280 Mio. sequences processed
=================================================================================================== 281 Mio. sequences processed
=================================================================================================== 282 Mio. sequences processed
=================================================================================================== 283 Mio. sequences processed
=================================================================================================== 284 Mio. sequences processed
=================================================================================================== 285 Mio. sequences processed
=================================================================================================== 286 Mio. sequences processed
=================================================================================================== 287 Mio. sequences processed
=================================================================================================== 288 Mio. sequences processed
=================================================================================================== 289 Mio. sequences processed
=================================================================================================== 290 Mio. sequences processed
=================================================================================================== 291 Mio. sequences processed
=================================================================================================== 292 Mio. sequences processed
=================================================================================================== 293 Mio. sequences processed
=================================================================================================== 294 Mio. sequences processed
=================================================================================================== 295 Mio. sequences processed
=================================================================================================== 296 Mio. sequences processed
=================================================================================================== 297 Mio. sequences processed
=================================================================================================== 298 Mio. sequences processed
=================================================================================================== 299 Mio. sequences processed
=================================================================================================== 300 Mio. sequences processed
=================================================================================================== 301 Mio. sequences processed
=================================================================================================== 302 Mio. sequences processed
=================================================================================================== 303 Mio. sequences processed
=================================================================================================== 304 Mio. sequences processed
=================================================================================================== 305 Mio. sequences processed
=================================================================================================== 306 Mio. sequences processed
=================================================================================================== 307 Mio. sequences processed
=================================================================================================== 308 Mio. sequences processed
=================================================================================================== 309 Mio. sequences processed
=================================================================================================== 310 Mio. sequences processed
=================================================================================================== 311 Mio. sequences processed
=================================================================================================== 312 Mio. sequences processed
=================================================================================================== 313 Mio. sequences processed
=================================================================================================== 314 Mio. sequences processed
=================================================================================================== 315 Mio. sequences processed
=================================================================================================== 316 Mio. sequences processed
=================================================================================================== 317 Mio. sequences processed
=================================================================================================== 318 Mio. sequences processed
=================================================================================================== 319 Mio. sequences processed
=================================================================================================== 320 Mio. sequences processed
=================================================================================================== 321 Mio. sequences processed
=================================================================================================== 322 Mio. sequences processed
=================================================================================================== 323 Mio. sequences processed
=================================================================================================== 324 Mio. sequences processed
=================================================================================================== 325 Mio. sequences processed
=================================================================================================== 326 Mio. sequences processed
=================================================================================================== 327 Mio. sequences processed
=================================================================================================== 328 Mio. sequences processed
=================================================================================================== 329 Mio. sequences processed
=================================================================================================== 330 Mio. sequences processed
=================================================================================================== 331 Mio. sequences processed
=================================================================================================== 332 Mio. sequences processed
=================================================================================================== 333 Mio. sequences processed
=================================================================================================== 334 Mio. sequences processed
=================================================================================================== 335 Mio. sequences processed
=================================================================================================== 336 Mio. sequences processed
=================================================================================================== 337 Mio. sequences processed
=================================================================================================== 338 Mio. sequences processed
=================================================================================================== 339 Mio. sequences processed
=================================================================================================== 340 Mio. sequences processed
=================================================================================================== 341 Mio. sequences processed
=================================================================================================== 342 Mio. sequences processed
=================================================================================================== 343 Mio. sequences processed
=================================================================================================== 344 Mio. sequences processed
=================================================================================================== 345 Mio. sequences processed
=================================================================================================== 346 Mio. sequences processed
=================================================================================================== 347 Mio. sequences processed
=================================================================================================== 348 Mio. sequences processed
=================================================================================================== 349 Mio. sequences processed
=================================================================================================== 350 Mio. sequences processed
=================================================================================================== 351 Mio. sequences processed
=================================================================================================== 352 Mio. sequences processed
=================================================================================================== 353 Mio. sequences processed
=================================================================================================== 354 Mio. sequences processed
=================================================================================================== 355 Mio. sequences processed
=================================================================================================== 356 Mio. sequences processed
=================================================================================================== 357 Mio. sequences processed
=================================================================================================== 358 Mio. sequences processed
=================================================================================================== 359 Mio. sequences processed
=================================================================================================== 360 Mio. sequences processed
=================================================================================================== 361 Mio. sequences processed
=================================================================================================== 362 Mio. sequences processed
=================================================================================================== 363 Mio. sequences processed
=================================================================================================== 364 Mio. sequences processed
=================================================================================================== 365 Mio. sequences processed
=================================================================================================== 366 Mio. sequences processed
=================================================================================================== 367 Mio. sequences processed
=================================================================================================== 368 Mio. sequences processed
=================================================================================================== 369 Mio. sequences processed
=================================================================================================== 370 Mio. sequences processed
=================================================================================================== 371 Mio. sequences processed
=================================================================================================== 372 Mio. sequences processed
=================================================================================================== 373 Mio. sequences processed
=================================================================================================== 374 Mio. sequences processed
=================================================================================================== 375 Mio. sequences processed
=================================================================================================== 376 Mio. sequences processed
=================================================================================================== 377 Mio. sequences processed
=================================================================================================== 378 Mio. sequences processed
=================================================================================================== 379 Mio. sequences processed
=================================================================================================== 380 Mio. sequences processed
=================================================================================================== 381 Mio. sequences processed
=================================================================================================== 382 Mio. sequences processed
=================================================================================================== 383 Mio. sequences processed
=================================================================================================== 384 Mio. sequences processed
=================================================================================================== 385 Mio. sequences processed
=================================================================================================== 386 Mio. sequences processed
=================================================================================================== 387 Mio. sequences processed
=================================================================================================== 388 Mio. sequences processed
=================================================================================================== 389 Mio. sequences processed
=================================================================================================== 390 Mio. sequences processed
=================================================================================================== 391 Mio. sequences processed
=================================================================================================== 392 Mio. sequences processed
=================================================================================================== 393 Mio. sequences processed
=================================================================================================== 394 Mio. sequences processed
=================================================================================================== 395 Mio. sequences processed
=================================================================================================== 396 Mio. sequences processed
=================================================================================================== 397 Mio. sequences processed
=================================================================================================== 398 Mio. sequences processed
=================================================================================================== 399 Mio. sequences processed
=================================================================================================== 400 Mio. sequences processed
=================================================================================================== 401 Mio. sequences processed
=================================================================================================== 402 Mio. sequences processed
=================================================================================================== 403 Mio. sequences processed
=================================================================================================== 404 Mio. sequences processed
=================================================================================================== 405 Mio. sequences processed
=================================================================================================== 406 Mio. sequences processed
=================================================================================================== 407 Mio. sequences processed
=================================================================================================== 408 Mio. sequences processed
=================================================================================================== 409 Mio. sequences processed
=================================================================================================== 410 Mio. sequences processed
=================================================================================================== 411 Mio. sequences processed
=================================================================================================== 412 Mio. sequences processed
=================================================================================================== 413 Mio. sequences processed
=================================================================================================== 414 Mio. sequences processed
=================================================================================================== 415 Mio. sequences processed
=================================================================================================== 416 Mio. sequences processed
=================================================================================================== 417 Mio. sequences processed
=================================================================================================== 418 Mio. sequences processed
=================================================================================================== 419 Mio. sequences processed
=================================================================================================== 420 Mio. sequences processed
=================================================================================================== 421 Mio. sequences processed
=================================================================================================== 422 Mio. sequences processed
=================================================================================================== 423 Mio. sequences processed
=================================================================================================== 424 Mio. sequences processed
=================================================================================================== 425 Mio. sequences processed
=================================================================================================== 426 Mio. sequences processed
=================================================================================================== 427 Mio. sequences processed
=================================================================================================== 428 Mio. sequences processed
=================================================================================================== 429 Mio. sequences processed
=================================================================================================== 430 Mio. sequences processed
=================================================================================================== 431 Mio. sequences processed
=================================================================================================== 432 Mio. sequences processed
=================================================================================================== 433 Mio. sequences processed
=================================================================================================== 434 Mio. sequences processed
=================================================================================================== 435 Mio. sequences processed
=================================================================================================== 436 Mio. sequences processed
=================================================================================================== 437 Mio. sequences processed
=================================================================================================== 438 Mio. sequences processed
=================================================================================================== 439 Mio. sequences processed
=================================================================================================== 440 Mio. sequences processed
=================================================================================================== 441 Mio. sequences processed
=================================================================================================== 442 Mio. sequences processed
=================================================================================================== 443 Mio. sequences processed
=================================================================================================== 444 Mio. sequences processed
============================================================
Time for merging to NR_h: 0h 3m 55s 886ms
Time for merging to NR: 0h 7m 40s 283ms
Database type: Aminoacid
Time for processing: 1h 17m 9s 618ms

This is the output for createindex:

createindex refDB/NR tmp --split-memory-limit 100G 

MMseqs Version:             13.45111
Seed substitution matrix    nucl:nucleotide.out,aa:VTML80.out
k-mer length                0
Alphabet size               nucl:5,aa:21
Compositional bias          1
Max sequence length         65535
Max results per query       300
Mask residues               1
Mask lower case residues    0
Spaced k-mers               1
Spaced k-mer pattern        
Sensitivity                 7.5
k-score                     0
Check compatible            0
Search type                 0
Split database              0
Split memory limit          100G
Verbosity                   3
Threads                     48
Min codons in orf           30
Max codons in length        32734
Max orf gaps                2147483647
Contig start mode           2
Contig end mode             2
Orf start mode              1
Forward frames              1,2,3
Reverse frames              1,2,3
Translation table           1
Translate orf               0
Use all table starts        false
Offset of numeric ids       0
Create lookup               0
Compressed                  0
Add orf stop                false
Overlap between sequences   0
Sequence split mode         1
Header split mode           0
Strand selection            1
Remove temporary files      false

indexdb refDB/NR refDB/NR --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --alph-size nucl:5,aa:21 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 100G -v 3 --threads 48 

Target split mode. Searching through 41 splits
Estimated memory consumption: 79G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 920ms
Index table: Masked residues: 61238522
Index table: fill
[=================================================================] 10.84M 1m 25s 193ms
Index statistics
Entries:          3850121923
DB size:          31796 MB
Avg k-mer size:   3.007908
Top 10 k-mers
    SGQQRIA 33175
    FLLLLLA 30439
    ATQAYAV 30261
    LAYGSGV 30200
    CYGPSYQ 30190
    SVAYNPS 30179
    ACNSPVY 30160
    GSLGSSV 30151
    HALLFPS 30146
    ISEQEGT 30145
Write ENTRIES (9)
Write ENTRIESOFFSETS (10)
Write SEQINDEXDATASIZE (15)
Write SEQINDEXSEQOFFSET (16)
Write SEQINDEXDATA (14)
Write ENTRIESNUM (12)
Write SEQCOUNT (13)
Index table: counting k-mers
[=================================================================] 10.85M 1m 3s 858ms
Index table: Masked residues: 61454634
Index table: fill
[=================================================================] 10.85M 1m 22s 65ms
Index statistics
Entries:          3849611059
DB size:          31793 MB
Avg k-mer size:   3.007509
Top 10 k-mers
    SGQQRIA 33182
    FLLLLLA 29650
    ATQAYAV 29520
    GLGTVAK 29423
    KLKLNKS 29407
    LAYGSGV 29406
    GSLGSSV 29390
    MLYKVMT 29388
    ACNSPVY 29374
    NEQILVS 29366
Write ENTRIES (1009)
Write ENTRIESOFFSETS (1010)
Write SEQINDEXDATASIZE (1015)
Write SEQINDEXSEQOFFSET (1016)
Write SEQINDEXDATA (1014)
Write ENTRIESNUM (1012)
Write SEQCOUNT (1013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 9s 665ms
Index table: Masked residues: 61188721
Index table: fill
[=================================================================] 10.84M 1m 30s 911ms
Index statistics
Entries:          3850232186
DB size:          31796 MB
Avg k-mer size:   3.007994
Top 10 k-mers
    SGQQRIA 33408
    FLLLLLA 30301
    ATQAYAV 30153
    AVNDSVL 30055
    DNALQAS 30055
    LAYGSGV 30055
    SVAYNPS 30029
    GSLGSSV 30023
    ISEQEGT 30012
    ACNSPVY 30011
Write ENTRIES (2009)
Write ENTRIESOFFSETS (2010)
Write SEQINDEXDATASIZE (2015)
Write SEQINDEXSEQOFFSET (2016)
Write SEQINDEXDATA (2014)
Write ENTRIESNUM (2012)
Write SEQCOUNT (2013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 3s 736ms
Index table: Masked residues: 61279535
Index table: fill
[=================================================================] 10.84M 1m 21s 843ms
Index statistics
Entries:          3850105067
DB size:          31796 MB
Avg k-mer size:   3.007895
Top 10 k-mers
    SGQQRIA 32981
    FLLLLLA 30126
    ATQAYAV 29941
    GSLGSSV 29847
    EKVLLLL 29841
    KLKLNKS 29837
    DNALQAS 29818
    HALLFPS 29817
    SVAYNPS 29814
    MLYKVMT 29808
Write ENTRIES (3009)
Write ENTRIESOFFSETS (3010)
Write SEQINDEXDATASIZE (3015)
Write SEQINDEXSEQOFFSET (3016)
Write SEQINDEXDATA (3014)
Write ENTRIESNUM (3012)
Write SEQCOUNT (3013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 3s 501ms
Index table: Masked residues: 61136706
Index table: fill
[=================================================================] 10.84M 1m 21s 674ms
Index statistics
Entries:          3850166774
DB size:          31796 MB
Avg k-mer size:   3.007943
Top 10 k-mers
    SGQQRIA 33368
    FLLLLLA 30128
    ATQAYAV 29916
    VLCNGSG 29834
    LAYGSGV 29833
    SVAYNPS 29819
    GSLGSSV 29814
    FSLCYSP 29805
    ILSISKQ 29801
    TELKAKV 29800
Write ENTRIES (4009)
Write ENTRIESOFFSETS (4010)
Write SEQINDEXDATASIZE (4015)
Write SEQINDEXSEQOFFSET (4016)
Write SEQINDEXDATA (4014)
Write ENTRIESNUM (4012)
Write SEQCOUNT (4013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 3s 676ms
Index table: Masked residues: 61264052
Index table: fill
[=================================================================] 10.85M 1m 22s 163ms
Index statistics
Entries:          3850288340
DB size:          31797 MB
Avg k-mer size:   3.008038
Top 10 k-mers
    SGQQRIA 33315
    FLLLLLA 29996
    ATQAYAV 29786
    LAYGSGV 29736
    AVNDSVL 29728
    GSLGSSV 29722
    KLKLNKS 29704
    SVAYNPS 29704
    ACNSPVY 29692
    GQFVLYN 29673
Write ENTRIES (5009)
Write ENTRIESOFFSETS (5010)
Write SEQINDEXDATASIZE (5015)
Write SEQINDEXSEQOFFSET (5016)
Write SEQINDEXDATA (5014)
Write ENTRIESNUM (5012)
Write SEQCOUNT (5013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 230ms
Index table: Masked residues: 61371917
Index table: fill
[=================================================================] 10.84M 1m 21s 243ms
Index statistics
Entries:          3850040390
DB size:          31795 MB
Avg k-mer size:   3.007844
Top 10 k-mers
    SGQQRIA 33009
    FLLLLLA 30239
    ATQAYAV 30076
    LAYGSGV 29994
    GSLGSSV 29988
    SVAYNPS 29975
    MVVCGTL 29966
    FSLCYSP 29963
    KLKLNKS 29958
    HALLFPS 29956
Write ENTRIES (6009)
Write ENTRIESOFFSETS (6010)
Write SEQINDEXDATASIZE (6015)
Write SEQINDEXSEQOFFSET (6016)
Write SEQINDEXDATA (6014)
Write ENTRIESNUM (6012)
Write SEQCOUNT (6013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 3s 405ms
Index table: Masked residues: 61034741
Index table: fill
[=================================================================] 10.85M 1m 21s 828ms
Index statistics
Entries:          3850317055
DB size:          31797 MB
Avg k-mer size:   3.008060
Top 10 k-mers
    SGQQRIA 32887
    FLLLLLA 30184
    ATQAYAV 29964
    LAYGSGV 29853
    GSLGSSV 29847
    KLKLNKS 29837
    HALLFPS 29834
    SVAYNPS 29827
    ACNSPVY 29817
    FLPLAAY 29796
Write ENTRIES (7009)
Write ENTRIESOFFSETS (7010)
Write SEQINDEXDATASIZE (7015)
Write SEQINDEXSEQOFFSET (7016)
Write SEQINDEXDATA (7014)
Write ENTRIESNUM (7012)
Write SEQCOUNT (7013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 797ms
Index table: Masked residues: 61311938
Index table: fill
[=================================================================] 10.84M 1m 21s 46ms
Index statistics
Entries:          3850086594
DB size:          31795 MB
Avg k-mer size:   3.007880
Top 10 k-mers
    SGQQRIA 33346
    FLLLLLA 30182
    ATQAYAV 30024
    KLKLNKS 29930
    AVNDSVL 29924
    LAYGSGV 29921
    MLYKVMT 29906
    GSLGSSV 29905
    ACNSPVY 29878
    LTNVETP 29872
Write ENTRIES (8009)
Write ENTRIESOFFSETS (8010)
Write SEQINDEXDATASIZE (8015)
Write SEQINDEXSEQOFFSET (8016)
Write SEQINDEXDATA (8014)
Write ENTRIESNUM (8012)
Write SEQCOUNT (8013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 3s 400ms
Index table: Masked residues: 61287007
Index table: fill
[=================================================================] 10.84M 1m 21s 849ms
Index statistics
Entries:          3850445130
DB size:          31798 MB
Avg k-mer size:   3.008160
Top 10 k-mers
    SGQQRIA 33244
    FLLLLLA 30250
    ATQAYAV 30105
    GLGTVAK 30034
    KLKLNKS 30017
    LAYGSGV 30007
    GSLGSSV 29989
    ACNSPVY 29970
    HALLFPS 29959
    ISEQEGT 29956
Write ENTRIES (9009)
Write ENTRIESOFFSETS (9010)
Write SEQINDEXDATASIZE (9015)
Write SEQINDEXSEQOFFSET (9016)
Write SEQINDEXDATA (9014)
Write ENTRIESNUM (9012)
Write SEQCOUNT (9013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 9s 678ms
Index table: Masked residues: 61466528
Index table: fill
[=================================================================] 10.85M 1m 30s 622ms
Index statistics
Entries:          3849908410
DB size:          31794 MB
Avg k-mer size:   3.007741
Top 10 k-mers
    SGQQRIA 33047
    FLLLLLA 30087
    ATQAYAV 29938
    KLKLNKS 29845
    LAYGSGV 29839
    SVAYNPS 29821
    GSLGSSV 29801
    ACNSPVY 29799
    KHFCLLP 29784
    VVLVLLR 29783
Write ENTRIES (10009)
Write ENTRIESOFFSETS (10010)
Write SEQINDEXDATASIZE (10015)
Write SEQINDEXSEQOFFSET (10016)
Write SEQINDEXDATA (10014)
Write ENTRIESNUM (10012)
Write SEQCOUNT (10013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 3s 921ms
Index table: Masked residues: 61076649
Index table: fill
[=================================================================] 10.84M 1m 21s 691ms
Index statistics
Entries:          3850338479
DB size:          31797 MB
Avg k-mer size:   3.008077
Top 10 k-mers
    SGQQRIA 32957
    FLLLLLA 30300
    ATQAYAV 30150
    VLCNGSG 30032
    LAYGSGV 30032
    AVNDSVL 30028
    CYGPSYQ 30023
    TELKAKV 30017
    SVAYNPS 30014
    GSLGSSV 30004
Write ENTRIES (11009)
Write ENTRIESOFFSETS (11010)
Write SEQINDEXDATASIZE (11015)
Write SEQINDEXSEQOFFSET (11016)
Write SEQINDEXDATA (11014)
Write ENTRIESNUM (11012)
Write SEQCOUNT (11013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 9s 855ms
Index table: Masked residues: 61187843
Index table: fill
[=================================================================] 10.85M 1m 30s 773ms
Index statistics
Entries:          3850149201
DB size:          31796 MB
Avg k-mer size:   3.007929
Top 10 k-mers
    SGQQRIA 33023
    FLLLLLA 30135
    ATQAYAV 29963
    LAYGSGV 29880
    SVAYNPS 29853
    GSLGSSV 29842
    HALLFPS 29838
    ACNSPVY 29836
    KLKLNKS 29820
    ISEQEGT 29805
Write ENTRIES (12009)
Write ENTRIESOFFSETS (12010)
Write SEQINDEXDATASIZE (12015)
Write SEQINDEXSEQOFFSET (12016)
Write SEQINDEXDATA (12014)
Write ENTRIESNUM (12012)
Write SEQCOUNT (12013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 5s 214ms
Index table: Masked residues: 61302946
Index table: fill
[=================================================================] 10.84M 1m 22s 633ms
Index statistics
Entries:          3850002684
DB size:          31795 MB
Avg k-mer size:   3.007815
Top 10 k-mers
    SGQQRIA 33277
    FLLLLLA 30092
    ATQAYAV 29927
    MVVCGTL 29836
    KLKLNKS 29833
    LAYGSGV 29827
    GSLGSSV 29825
    ILSISKQ 29800
    LKTNVKN 29795
    ACNSPVY 29795
Write ENTRIES (13009)
Write ENTRIESOFFSETS (13010)
Write SEQINDEXDATASIZE (13015)
Write SEQINDEXSEQOFFSET (13016)
Write SEQINDEXDATA (13014)
Write ENTRIESNUM (13012)
Write SEQCOUNT (13013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 6s 563ms
Index table: Masked residues: 61272135
Index table: fill
[=================================================================] 10.85M 1m 21s 448ms
Index statistics
Entries:          3850117980
DB size:          31796 MB
Avg k-mer size:   3.007905
Top 10 k-mers
    SGQQRIA 33363
    FLLLLLA 29998
    ATQAYAV 29857
    AVNDSVL 29755
    LAYGSGV 29740
    GSLGSSV 29722
    MVVCGTL 29711
    MLYKVMT 29710
    HALLFPS 29694
    ACNSPVY 29694
Write ENTRIES (14009)
Write ENTRIESOFFSETS (14010)
Write SEQINDEXDATASIZE (14015)
Write SEQINDEXSEQOFFSET (14016)
Write SEQINDEXDATA (14014)
Write ENTRIESNUM (14012)
Write SEQCOUNT (14013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 11s 175ms
Index table: Masked residues: 61180635
Index table: fill
[=================================================================] 10.84M 1m 31s 883ms
Index statistics
Entries:          3850138116
DB size:          31796 MB
Avg k-mer size:   3.007920
Top 10 k-mers
    SGQQRIA 33160
    FLLLLLA 30415
    ATQAYAV 30219
    LAYGSGV 30142
    SVAYNPS 30130
    GSLGSSV 30128
    ACNSPVY 30105
    MLYKVMT 30094
    FLPLAAY 30091
    KLKLNKS 30076
Write ENTRIES (15009)
Write ENTRIESOFFSETS (15010)
Write SEQINDEXDATASIZE (15015)
Write SEQINDEXSEQOFFSET (15016)
Write SEQINDEXDATA (15014)
Write ENTRIESNUM (15012)
Write SEQCOUNT (15013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 5s 779ms
Index table: Masked residues: 61262358
Index table: fill
[=================================================================] 10.84M 1m 22s 983ms
Index statistics
Entries:          3849957767
DB size:          31795 MB
Avg k-mer size:   3.007780
Top 10 k-mers
    SGQQRIA 33057
    FLLLLLA 30065
    ATQAYAV 29891
    LAYGSGV 29796
    VLCNGSG 29781
    KLKLNKS 29780
    SVAYNPS 29774
    ACNSPVY 29763
    GSLGSSV 29756
    MLYKVMT 29752
Write ENTRIES (16009)
Write ENTRIESOFFSETS (16010)
Write SEQINDEXDATASIZE (16015)
Write SEQINDEXSEQOFFSET (16016)
Write SEQINDEXDATA (16014)
Write ENTRIESNUM (16012)
Write SEQCOUNT (16013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 149ms
Index table: Masked residues: 61004416
Index table: fill
[=================================================================] 10.84M 1m 21s 354ms
Index statistics
Entries:          3850452900
DB size:          31798 MB
Avg k-mer size:   3.008166
Top 10 k-mers
    SGQQRIA 33588
    FLLLLLA 30144
    ATQAYAV 29993
    LAYGSGV 29895
    MVVCGTL 29874
    AVNDSVL 29868
    CYGPSYQ 29867
    GSLGSSV 29864
    ACNSPVY 29854
    ISEQEGT 29838
Write ENTRIES (17009)
Write ENTRIESOFFSETS (17010)
Write SEQINDEXDATASIZE (17015)
Write SEQINDEXSEQOFFSET (17016)
Write SEQINDEXDATA (17014)
Write ENTRIESNUM (17012)
Write SEQCOUNT (17013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 9s 890ms
Index table: Masked residues: 61440134
Index table: fill
[=================================================================] 10.85M 1m 31s 477ms
Index statistics
Entries:          3849779316
DB size:          31794 MB
Avg k-mer size:   3.007640
Top 10 k-mers
    SGQQRIA 33287
    FLLLLLA 29845
    ATQAYAV 29665
    LAYGSGV 29575
    KLKLNKS 29567
    GSLGSSV 29566
    FSLCYSP 29555
    SVAYNPS 29551
    MLYKVMT 29550
    ACNSPVY 29542
Write ENTRIES (18009)
Write ENTRIESOFFSETS (18010)
Write SEQINDEXDATASIZE (18015)
Write SEQINDEXSEQOFFSET (18016)
Write SEQINDEXDATA (18014)
Write ENTRIESNUM (18012)
Write SEQCOUNT (18013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 12s 514ms
Index table: Masked residues: 61281590
Index table: fill
[=================================================================] 10.84M 1m 31s 295ms
Index statistics
Entries:          3850348785
DB size:          31797 MB
Avg k-mer size:   3.008085
Top 10 k-mers
    SGQQRIA 33176
    FLLLLLA 30272
    ATQAYAV 30107
    AVNDSVL 29995
    KLKLNKS 29989
    LAYGSGV 29986
    MVVCGTL 29961
    GSLGSSV 29957
    ACNSPVY 29952
    MLYKVMT 29936
Write ENTRIES (19009)
Write ENTRIESOFFSETS (19010)
Write SEQINDEXDATASIZE (19015)
Write SEQINDEXSEQOFFSET (19016)
Write SEQINDEXDATA (19014)
Write ENTRIESNUM (19012)
Write SEQCOUNT (19013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 5s 347ms
Index table: Masked residues: 61054807
Index table: fill
[=================================================================] 10.84M 1m 21s 327ms
Index statistics
Entries:          3850437386
DB size:          31798 MB
Avg k-mer size:   3.008154
Top 10 k-mers
    SGQQRIA 33395
    FLLLLLA 30061
    ATQAYAV 29933
    LAYGSGV 29830
    KLKLNKS 29820
    SVAYNPS 29801
    ACNSPVY 29795
    MLYKVMT 29785
    GSLGSSV 29781
    GQFVLYN 29758
Write ENTRIES (20009)
Write ENTRIESOFFSETS (20010)
Write SEQINDEXDATASIZE (20015)
Write SEQINDEXSEQOFFSET (20016)
Write SEQINDEXDATA (20014)
Write ENTRIESNUM (20012)
Write SEQCOUNT (20013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 10s 948ms
Index table: Masked residues: 61358532
Index table: fill
[=================================================================] 10.85M 1m 29s 524ms
Index statistics
Entries:          3849836671
DB size:          31794 MB
Avg k-mer size:   3.007685
Top 10 k-mers
    SGQQRIA 33178
    FLLLLLA 29948
    ATQAYAV 29740
    LAYGSGV 29648
    AVNDSVL 29635
    CYGPSYQ 29631
    SVAYNPS 29630
    GSLGSSV 29623
    ACNSPVY 29604
    FLPLAAY 29581
Write ENTRIES (21009)
Write ENTRIESOFFSETS (21010)
Write SEQINDEXDATASIZE (21015)
Write SEQINDEXSEQOFFSET (21016)
Write SEQINDEXDATA (21014)
Write ENTRIESNUM (21012)
Write SEQCOUNT (21013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 6s 273ms
Index table: Masked residues: 61202841
Index table: fill
[=================================================================] 10.84M 1m 19s 228ms
Index statistics
Entries:          3850254812
DB size:          31796 MB
Avg k-mer size:   3.008012
Top 10 k-mers
    SGQQRIA 33182
    FLLLLLA 30118
    ATQAYAV 29943
    VLCNGSG 29851
    LAYGSGV 29851
    SVAYNPS 29837
    GSLGSSV 29834
    HALLFPS 29812
    ACNSPVY 29806
    ISEQEGT 29802
Write ENTRIES (22009)
Write ENTRIESOFFSETS (22010)
Write SEQINDEXDATASIZE (22015)
Write SEQINDEXSEQOFFSET (22016)
Write SEQINDEXDATA (22014)
Write ENTRIESNUM (22012)
Write SEQCOUNT (22013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 11s 694ms
Index table: Masked residues: 61145173
Index table: fill
[=================================================================] 10.85M 1m 27s 632ms
Index statistics
Entries:          3850176462
DB size:          31796 MB
Avg k-mer size:   3.007950
Top 10 k-mers
    SGQQRIA 33446
    FLLLLLA 30080
    ATQAYAV 29847
    GSLGSSV 29771
    AVNDSVL 29749
    CYGPSYQ 29749
    SVAYNPS 29744
    HALLFPS 29718
    ACNSPVY 29716
    KHFCLLP 29702
Write ENTRIES (23009)
Write ENTRIESOFFSETS (23010)
Write SEQINDEXDATASIZE (23015)
Write SEQINDEXSEQOFFSET (23016)
Write SEQINDEXDATA (23014)
Write ENTRIESNUM (23012)
Write SEQCOUNT (23013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 8s 743ms
Index table: Masked residues: 61136999
Index table: fill
[=================================================================] 10.85M 1m 17s 938ms
Index statistics
Entries:          3850256482
DB size:          31796 MB
Avg k-mer size:   3.008013
Top 10 k-mers
    SGQQRIA 33137
    FLLLLLA 29781
    ATQAYAV 29580
    LAYGSGV 29521
    CYGPSYQ 29506
    SVAYNPS 29500
    FSLCYSP 29491
    GSLGSSV 29490
    ACNSPVY 29486
    ILSISKQ 29461
Write ENTRIES (24009)
Write ENTRIESOFFSETS (24010)
Write SEQINDEXDATASIZE (24015)
Write SEQINDEXSEQOFFSET (24016)
Write SEQINDEXDATA (24014)
Write ENTRIESNUM (24012)
Write SEQCOUNT (24013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 7s 705ms
Index table: Masked residues: 61196311
Index table: fill
[=================================================================] 10.85M 1m 18s 198ms
Index statistics
Entries:          3850220763
DB size:          31796 MB
Avg k-mer size:   3.007985
Top 10 k-mers
    SGQQRIA 33140
    FLLLLLA 29995
    ATQAYAV 29827
    LAYGSGV 29771
    MVVCGTL 29759
    CYGPSYQ 29753
    KLKLNKS 29751
    SVAYNPS 29748
    ACNSPVY 29735
    MLYKVMT 29712
Write ENTRIES (25009)
Write ENTRIESOFFSETS (25010)
Write SEQINDEXDATASIZE (25015)
Write SEQINDEXSEQOFFSET (25016)
Write SEQINDEXDATA (25014)
Write ENTRIESNUM (25012)
Write SEQCOUNT (25013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 11s 929ms
Index table: Masked residues: 61047096
Index table: fill
[=================================================================] 10.85M 1m 27s 703ms
Index statistics
Entries:          3850450523
DB size:          31798 MB
Avg k-mer size:   3.008164
Top 10 k-mers
    SGQQRIA 33254
    FLLLLLA 30111
    ATQAYAV 29941
    LAYGSGV 29869
    CYGPSYQ 29850
    SVAYNPS 29847
    GSLGSSV 29830
    ACNSPVY 29828
    KLKLNKS 29823
    HALLFPS 29811
Write ENTRIES (26009)
Write ENTRIESOFFSETS (26010)
Write SEQINDEXDATASIZE (26015)
Write SEQINDEXSEQOFFSET (26016)
Write SEQINDEXDATA (26014)
Write ENTRIESNUM (26012)
Write SEQCOUNT (26013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 6s 57ms
Index table: Masked residues: 61463986
Index table: fill
[=================================================================] 10.84M 1m 17s 662ms
Index statistics
Entries:          3849969010
DB size:          31795 MB
Avg k-mer size:   3.007788
Top 10 k-mers
    SGQQRIA 33231
    FLLLLLA 30254
    ATQAYAV 30083
    MVVCGTL 29995
    LAYGSGV 29994
    KLKLNKS 29983
    GSLGSSV 29978
    ILSISKQ 29956
    TELKAKV 29954
    ACNSPVY 29953
Write ENTRIES (27009)
Write ENTRIESOFFSETS (27010)
Write SEQINDEXDATASIZE (27015)
Write SEQINDEXSEQOFFSET (27016)
Write SEQINDEXDATA (27014)
Write ENTRIESNUM (27012)
Write SEQCOUNT (27013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 6s 421ms
Index table: Masked residues: 61447173
Index table: fill
[=================================================================] 10.84M 1m 17s 628ms
Index statistics
Entries:          3850043049
DB size:          31795 MB
Avg k-mer size:   3.007846
Top 10 k-mers
    SGQQRIA 33530
    FLLLLLA 29878
    ATQAYAV 29693
    VLCNGSG 29651
    LAYGSGV 29644
    CYGPSYQ 29636
    GSLGSSV 29614
    ACNSPVY 29613
    KLKLNKS 29597
    MVVCGTL 29592
Write ENTRIES (28009)
Write ENTRIESOFFSETS (28010)
Write SEQINDEXDATASIZE (28015)
Write SEQINDEXSEQOFFSET (28016)
Write SEQINDEXDATA (28014)
Write ENTRIESNUM (28012)
Write SEQCOUNT (28013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 6s 265ms
Index table: Masked residues: 61304785
Index table: fill
[=================================================================] 10.84M 1m 17s 421ms
Index statistics
Entries:          3849995941
DB size:          31795 MB
Avg k-mer size:   3.007809
Top 10 k-mers
    SGQQRIA 33071
    FLLLLLA 30126
    ATQAYAV 29984
    LAYGSGV 29870
    GLGTVAK 29855
    VVLVLLR 29854
    DNALQAS 29854
    SVAYNPS 29854
    GSLGSSV 29851
    ACNSPVY 29835
Write ENTRIES (29009)
Write ENTRIESOFFSETS (29010)
Write SEQINDEXDATASIZE (29015)
Write SEQINDEXSEQOFFSET (29016)
Write SEQINDEXDATA (29014)
Write ENTRIESNUM (29012)
Write SEQCOUNT (29013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 542ms
Index table: Masked residues: 61389881
Index table: fill
[=================================================================] 10.84M 1m 17s 448ms
Index statistics
Entries:          3849817877
DB size:          31794 MB
Avg k-mer size:   3.007670
Top 10 k-mers
    SGQQRIA 33369
    FLLLLLA 30183
    ATQAYAV 30042
    VLCNGSG 29941
    MVVCGTL 29937
    LAYGSGV 29936
    GSLGSSV 29920
    ACNSPVY 29901
    TELKAKV 29890
    TLGWLVV 29887
Write ENTRIES (30009)
Write ENTRIESOFFSETS (30010)
Write SEQINDEXDATASIZE (30015)
Write SEQINDEXSEQOFFSET (30016)
Write SEQINDEXDATA (30014)
Write ENTRIESNUM (30012)
Write SEQCOUNT (30013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 5s 69ms
Index table: Masked residues: 61266593
Index table: fill
[=================================================================] 10.85M 1m 17s 362ms
Index statistics
Entries:          3849915772
DB size:          31795 MB
Avg k-mer size:   3.007747
Top 10 k-mers
    SGQQRIA 33329
    FLLLLLA 30187
    ATQAYAV 30023
    GLGTVAK 29954
    LAYGSGV 29930
    CYGPSYQ 29910
    HALLFPS 29907
    SVAYNPS 29904
    GSLGSSV 29900
    ACNSPVY 29883
Write ENTRIES (31009)
Write ENTRIESOFFSETS (31010)
Write SEQINDEXDATASIZE (31015)
Write SEQINDEXSEQOFFSET (31016)
Write SEQINDEXDATA (31014)
Write ENTRIESNUM (31012)
Write SEQCOUNT (31013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 4s 938ms
Index table: Masked residues: 61324289
Index table: fill
[=================================================================] 10.85M 1m 17s 470ms
Index statistics
Entries:          3850149373
DB size:          31796 MB
Avg k-mer size:   3.007929
Top 10 k-mers
    SGQQRIA 32987
    FLLLLLA 29953
    ATQAYAV 29771
    LAYGSGV 29658
    GSLGSSV 29657
    KHHFLFL 29638
    EKVLLLL 29637
    CYGPSYQ 29636
    HALLFPS 29633
    SVAYNPS 29626
Write ENTRIES (32009)
Write ENTRIESOFFSETS (32010)
Write SEQINDEXDATASIZE (32015)
Write SEQINDEXSEQOFFSET (32016)
Write SEQINDEXDATA (32014)
Write ENTRIESNUM (32012)
Write SEQCOUNT (32013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 4s 962ms
Index table: Masked residues: 61229032
Index table: fill
[=================================================================] 10.85M 1m 17s 193ms
Index statistics
Entries:          3850133104
DB size:          31796 MB
Avg k-mer size:   3.007916
Top 10 k-mers
    SGQQRIA 33206
    FLLLLLA 29981
    ATQAYAV 29773
    VLCNGSG 29656
    KLKLNKS 29654
    LAYGSGV 29650
    AVNDSVL 29630
    GSLGSSV 29622
    DNALQAS 29621
    ACNSPVY 29612
Write ENTRIES (33009)
Write ENTRIESOFFSETS (33010)
Write SEQINDEXDATASIZE (33015)
Write SEQINDEXSEQOFFSET (33016)
Write SEQINDEXDATA (33014)
Write ENTRIESNUM (33012)
Write SEQCOUNT (33013)
Index table: counting k-mers
[=================================================================] 10.85M 1m 4s 860ms
Index table: Masked residues: 61307069
Index table: fill
[=================================================================] 10.85M 1m 17s 150ms
Index statistics
Entries:          3849878129
DB size:          31794 MB
Avg k-mer size:   3.007717
Top 10 k-mers
    SGQQRIA 32843
    FLLLLLA 30212
    ATQAYAV 30033
    VLCNGSG 29957
    KLKLNKS 29939
    LAYGSGV 29937
    ILSISKQ 29921
    ISEQEGT 29919
    GSLGSSV 29913
    ACNSPVY 29909
Write ENTRIES (34009)
Write ENTRIESOFFSETS (34010)
Write SEQINDEXDATASIZE (34015)
Write SEQINDEXSEQOFFSET (34016)
Write SEQINDEXDATA (34014)
Write ENTRIESNUM (34012)
Write SEQCOUNT (34013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 5s 319ms
Index table: Masked residues: 61203280
Index table: fill
[=================================================================] 10.84M 1m 17s 218ms
Index statistics
Entries:          3850317290
DB size:          31797 MB
Avg k-mer size:   3.008060
Top 10 k-mers
    SGQQRIA 33293
    FLLLLLA 30047
    ATQAYAV 29922
    KLKLNKS 29793
    LAYGSGV 29790
    HALLFPS 29770
    MVVCGTL 29767
    SVAYNPS 29766
    MLYKVMT 29766
    GSLGSSV 29766
Write ENTRIES (35009)
Write ENTRIESOFFSETS (35010)
Write SEQINDEXDATASIZE (35015)
Write SEQINDEXSEQOFFSET (35016)
Write SEQINDEXDATA (35014)
Write ENTRIESNUM (35012)
Write SEQCOUNT (35013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 360ms
Index table: Masked residues: 61352470
Index table: fill
[=================================================================] 10.84M 1m 17s 49ms
Index statistics
Entries:          3849997806
DB size:          31795 MB
Avg k-mer size:   3.007811
Top 10 k-mers
    SGQQRIA 33159
    FLLLLLA 30256
    ATQAYAV 30113
    LAYGSGV 30002
    GSLGSSV 29975
    SVAYNPS 29966
    ACNSPVY 29962
    KHFCLLP 29940
    KLKLNKS 29934
    MLYKVMT 29933
Write ENTRIES (36009)
Write ENTRIESOFFSETS (36010)
Write SEQINDEXDATASIZE (36015)
Write SEQINDEXSEQOFFSET (36016)
Write SEQINDEXDATA (36014)
Write ENTRIESNUM (36012)
Write SEQCOUNT (36013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 494ms
Index table: Masked residues: 61207851
Index table: fill
[=================================================================] 10.84M 1m 17s 227ms
Index statistics
Entries:          3850216299
DB size:          31796 MB
Avg k-mer size:   3.007981
Top 10 k-mers
    SGQQRIA 33099
    FLLLLLA 29994
    ATQAYAV 29804
    VLCNGSG 29727
    LAYGSGV 29718
    CYGPSYQ 29709
    KLKLNKS 29704
    GSLGSSV 29701
    SVAYNPS 29697
    ISEQEGT 29678
Write ENTRIES (37009)
Write ENTRIESOFFSETS (37010)
Write SEQINDEXDATASIZE (37015)
Write SEQINDEXSEQOFFSET (37016)
Write SEQINDEXDATA (37014)
Write ENTRIESNUM (37012)
Write SEQCOUNT (37013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 4s 815ms
Index table: Masked residues: 61187236
Index table: fill
[=================================================================] 10.84M 1m 16s 934ms
Index statistics
Entries:          3850225941
DB size:          31796 MB
Avg k-mer size:   3.007989
Top 10 k-mers
    SGQQRIA 33128
    FLLLLLA 30170
    ATQAYAV 29962
    VLCNGSG 29895
    LAYGSGV 29894
    KLKLNKS 29870
    GSLGSSV 29870
    TELKAKV 29857
    ACNSPVY 29843
    NEQILVS 29829
Write ENTRIES (38009)
Write ENTRIESOFFSETS (38010)
Write SEQINDEXDATASIZE (38015)
Write SEQINDEXSEQOFFSET (38016)
Write SEQINDEXDATA (38014)
Write ENTRIESNUM (38012)
Write SEQCOUNT (38013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 5s 594ms
Index table: Masked residues: 61224305
Index table: fill
[=================================================================] 10.84M 1m 16s 989ms
Index statistics
Entries:          3850265437
DB size:          31797 MB
Avg k-mer size:   3.008020
Top 10 k-mers
    SGQQRIA 32988
    FLLLLLA 30232
    ATQAYAV 30073
    LAYGSGV 29988
    CYGPSYQ 29965
    SVAYNPS 29965
    ACNSPVY 29944
    HALLFPS 29941
    GSLGSSV 29937
    MLYKVMT 29929
Write ENTRIES (39009)
Write ENTRIESOFFSETS (39010)
Write SEQINDEXDATASIZE (39015)
Write SEQINDEXSEQOFFSET (39016)
Write SEQINDEXDATA (39014)
Write ENTRIESNUM (39012)
Write SEQCOUNT (39013)
Index table: counting k-mers
[=================================================================] 10.84M 1m 7s 44ms
Index table: Masked residues: 61246094
Index table: fill
[=================================================================] 10.84M 1m 17s 250ms
Index statistics
Entries:          3850118943
DB size:          31796 MB
Avg k-mer size:   3.007905
Top 10 k-mers
    SGQQRIA 33367
    FLLLLLA 30314
    ATQAYAV 30098
    GLGTVAK 30043
    LAYGSGV 29998
    GSLGSSV 29988
    SVAYNPS 29980
    HALLFPS 29973
    TELKAKV 29968
    ACNSPVY 29959
Write ENTRIES (40009)
Write ENTRIESOFFSETS (40010)
Write SEQINDEXDATASIZE (40015)
Write SEQINDEXSEQOFFSET (40016)
Write SEQINDEXDATA (40014)
Write ENTRIESNUM (40012)
Write SEQCOUNT (40013)
Time for merging to NR.idx: 0h 0m 0s 603ms
Time for processing: 2h 25m 32s 642ms
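
As a side note, the progress-bar timings in a log like the one above can be totalled per phase to see where the indexing time goes. The following is only a minimal sketch and not part of MMseqs2: it assumes the log format shown above (an `Index table: ...` line followed by a progress bar ending in a duration), and the names `progress_seconds` and `total_by_phase` are hypothetical helpers made up for illustration.

```python
import re

# Matches progress lines such as "[=====] 10.84M 1m 4s 920ms" and
# captures the trailing duration (hours and minutes are optional).
PROGRESS = re.compile(
    r"^\[=+\]\s+\S+\s+(?:(\d+)h\s+)?(?:(\d+)m\s+)?(\d+)s\s+(\d+)ms\s*$"
)


def progress_seconds(line):
    """Return the duration of a progress-bar line in seconds, or None."""
    match = PROGRESS.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def total_by_phase(lines):
    """Sum progress-bar durations, keyed by the preceding 'Index table' phase."""
    totals = {}
    phase = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Index table: counting k-mers"):
            phase = "counting k-mers"
        elif line.startswith("Index table: fill"):
            phase = "fill"
        else:
            secs = progress_seconds(line)
            if secs is not None and phase is not None:
                totals[phase] = totals.get(phase, 0.0) + secs
                phase = None  # each phase header owns one progress bar
    return totals


# Lines copied from the first split of the log above.
sample = [
    "Index table: counting k-mers",
    "[=================================================================] 10.84M 1m 4s 920ms",
    "Index table: Masked residues: 61238522",
    "Index table: fill",
    "[=================================================================] 10.84M 1m 25s 193ms",
]
for name, secs in total_by_phase(sample).items():
    print(f"{name}: {secs:.3f} s")
```

To run it over a whole log, pass the lines of the saved log file to `total_by_phase` instead of `sample`. Time not covered by the two progress bars (for example, writing each split to disk) is not counted by this sketch.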

Unfortunately, I don't have the output of createtaxdb, as I ran it in interactive mode (it took less than 10 minutes).

This is the output from the easy-taxonomy command:


easy-taxonomy contigs.fasta refDB/NR alnRes tmp --split-memory-limit 100G --threads 16 

MMseqs Version:                         13.45111
ORF filter                              0
ORF filter e-value                      100
ORF filter sensitivity                  2
LCA mode                                3
Majority threshold                      0.5
Vote mode                               1
LCA ranks                               
Column with taxonomic lineage           0
Compressed                              0
Threads                                 16
Verbosity                               3
Taxon blacklist                         12908:unclassified sequences,28384:other sequences
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Add backtrace                           false
Alignment mode                          0
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0
Coverage mode                           0
Max sequence length                     65535
Compositional bias                      1
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Gap open cost                           nucl:5,aa:11
Gap extension cost                      nucl:2,aa:1
Zdrop                                   40
Seed substitution matrix                nucl:nucleotide.out,aa:VTML80.out
Sensitivity                             4
k-mer length                            0
k-score                                 2147483647
Alphabet size                           nucl:5,aa:21
Max results per query                   300
Split database                          0
Split mode                              2
Split memory limit                      100G
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask lower case residues                0
Minimum diagonal score                  15
Spaced k-mers                           1
Spaced k-mer pattern                    
Local temporary path                    
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Mask profile                            1
Profile E-value threshold               0.001
Global sequence weighting               false
Allow deletions                         false
Filter MSA                              1
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Add orf stop                            false
Overlap between sequences               0
Sequence split mode                     1
Header split mode                       0
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Search iterations                       1
Start sensitivity                       4
Search steps                            1
Exhaustive search mode                  false
Filter results during exhaustive search 0
Strand selection                        1
LCA search mode                         false
Disk space limit                        0
MPI runner                              
Force restart with latest tmp           false
Remove temporary files                  true
Report mode                             0
Alignment format                        0
Format alignment output                 query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output                         false
First sequence as representative        false
Target column                           1
Add full header                         false
Sequence source                         0
Database type                           0
Shuffle input database                  true
Createdb mode                           1
Write lookup file                       0

createdb /contigs.fasta tmp/18031188072042168038/query --dbtype 0 --shuffle 1 --createdb-mode 1 --write-lookup 0 --id-offset 0 --compressed 0 -v 3 

Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
[Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to query_h: 0h 0m 0s 2ms
Time for merging to query: 0h 0m 0s 1ms
[=================================================================================
Time for merging to query_h: 0h 0m 0s 2ms
Time for merging to query: 0h 0m 0s 2ms
Database type: Nucleotide
Time for processing: 0h 0m 8s 216ms
Create directory tmp/18031188072042168038/taxonomy_tmp
taxonomy tmp/18031188072042168038/query refDB/NR tmp/18031188072042168038/result tmp/18031188072042168038/taxonomy_tmp --tax-output-mode 2 --threads 16 --split-memory-limit 100G --remove-tmp-files 1 

extractorfs tmp/18031188072042168038/query tmp/18031188072042168038/taxonomy_tmp/2085806724977121770/orfs_aa --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 1 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 16 --compressed 0 -v 3 

[=================================================================] 810.40K 31s 522ms
Time for merging to orfs_aa_h: 0h 0m 16s 759ms
Time for merging to orfs_aa: 0h 0m 22s 22ms
Time for processing: 0h 1m 23s 421ms
prefilter tmp/18031188072042168038/taxonomy_tmp/2085806724977121770/orfs_aa refDB/NR.idx tmp/18031188072042168038/taxonomy_tmp/2085806724977121770/orfs_pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -s 2 -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 1 --split 0 --split-mode 2 --split-memory-limit 100G -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 0 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 3 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 16 --compressed 0 -v 3 

Index version: 16
Generated by:  13.45111
ScoreMatrix:  VTML80.out
Query database size: 47918555 type: Aminoacid
Target split mode. Searching through 41 splits
Estimated memory consumption: 64G
Target database size: 444603205 type: Aminoacid
Process prefiltering step 1 of 41

k-mer similarity threshold: 163
Starting prefiltering scores calculation (step 1 of 41)
Query db start 1 to 47918555
Target db start 1 to 10838348

mgabriell1 commented 2 years ago

I have also noticed that the NR database and its index files (NR.idx...) occupy around 1.9 TB of disk space. Is that normal?
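
For anyone wanting to check the same thing on their own setup, here is a minimal sketch of how the per-file footprint could be listed. It assumes all database and index files share a common path prefix (refDB/NR in the commands above) and that GNU coreutils `du` and `sort` are available. The function name `db_disk_usage` is made up for illustration and is not part of MMseqs2.

```shell
# Hypothetical helper (not an MMseqs2 command): list the on-disk size of
# every file whose path starts with the given database prefix.
db_disk_usage() {
    # du -c appends a grand total line; -h prints human-readable sizes.
    # sort -h orders by those sizes, so the total ends up on the last line.
    du -ch "$1"* | sort -h
}
```

Calling `db_disk_usage refDB/NR` would then show which of the NR and NR.idx files account for most of the space, with the combined total at the bottom.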