soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

.dbtype already exists error when clustering using profiles #844

Open schmittel opened 4 months ago

schmittel commented 4 months ago

Hi,

I'm having difficulty clustering using profiles when following the instructions in the wiki. Specifically, I'm referring to this section:

# extract consensus sequences from profiles
mmseqs profile2consensus profileDB1 profileDB1_consensus
# search with profiles against consensus sequences of seqDB1
mmseqs search profileDB1 profileDB1_consensus resultDB2 tmp --add-self-matches -a # Add your cluster criteria here
# cluster the results 
mmseqs clust profileDB1 resultDB2 profileDB1_clu

I can run mmseqs search without issue, but when I run mmseqs clust I get the following error:

Create directory /final/db_cluster/low_1/Genus02938/Genus02938_DB
cluster /final/db_profile/low_1/Genus02938/Genus02938_DB /final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB /final/db_cluster/low_1/Genus02938/Genus02938_DB

MMseqs Version:                         15.6f452
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             4
k-mer length                            0
Target search mode                      0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max sequence length                     65535
Max results per query                   20
Split database                          0
Split mode                              2
Split memory limit                      0
Coverage threshold                      0.8
Coverage mode                           0
Compositional bias                      1
Compositional bias                      1
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Minimum diagonal score                  15
Selected taxa
Include identical seq. id.              false
Spaced k-mers                           1
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads                                 144
Compressed                              0
Verbosity                               3
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Max reject                              2147483647
Max accept                              2147483647
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Cluster mode                            0
Max connected component depth           1000
Similarity type                         2
Weight file name
Cluster Weight threshold                0.9
Single step clustering                  false
Cascaded clustering steps               3
Cluster reassign                        false
Remove temporary files                  false
Force restart with latest tmp           false
MPI runner
k-mers per sequence                     21
Scale k-mers per sequence               aa:0.000,nucl:0.200
Adjust k-mer length                     false
Shift hash                              67
Include only extendable                 false
Skip repeating k-mers                   false

Set cluster sensitivity to -s 6.000000
Set cluster mode SET COVER
Set cluster iterations to 3
/final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB.dbtype exists already!

Yes, /final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB.dbtype already exists: it was created by mmseqs search. I'm not sure why mmseqs clust would care about that. Do you have any ideas? I can't figure this out. Many thanks!

schmittel commented 4 months ago

I just learned that mmseqs cluster and mmseqs clust are different commands, and switching to the right one solved the issue. Apologies for the confusion.
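
For anyone who hits the same message: the log above echoes `cluster ...` and creates a directory at the third path, which is consistent with the `cluster` workflow having been invoked rather than the `clust` module. As far as I understand the usage strings, `mmseqs cluster` expects `<sequenceDB> <clusterDB> <tmpDir>`, while `mmseqs clust` expects `<sequenceDB> <resultDB> <clusterDB>`, so with `cluster` the existing search result ends up in the output position. Below is a minimal sketch of a guard around the intended call. `run_clust` is a hypothetical helper name of my own, not part of MMseqs2, and the `.dbtype` checks are an assumption based on the file named in the error message.

```shell
# Hypothetical wrapper (not part of MMseqs2): only call `mmseqs clust`
# when the arguments look like <sequenceDB> <resultDB> <clusterDB>.
run_clust() {
    seq_db=$1
    result_db=$2
    out_db=$3

    # The result DB is an input to `clust`, so it must already exist.
    if [ ! -e "${result_db}.dbtype" ]; then
        echo "result DB not found: ${result_db}" >&2
        return 1
    fi

    # The cluster DB is an output, so it must not exist yet.
    if [ -e "${out_db}.dbtype" ]; then
        echo "output DB exists already: ${out_db}" >&2
        return 1
    fi

    # Call the `clust` module, not the `cluster` workflow.
    mmseqs clust "$seq_db" "$result_db" "$out_db"
}
```

With the names from the wiki snippet, this would be called as `run_clust profileDB1 resultDB2 profileDB1_clu`.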