soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

Another search against Pfam not working (database and search-type) #195

Closed gaboentropy closed 5 years ago

gaboentropy commented 5 years ago

Expected Behavior

It was working before. Same database and protein sequences

Current Behavior

:(

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

mmseqs easy-search genome_digest.faa.bz2 mmseqsDB/Pfam resultFile TMP

The Pfam database was built following the instructions in the wiki. After the search failed, I rebuilt the database in case the format had changed. The error is still the same.

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.

MMseqs Version:                         70d09a2ea505d8b22d80286ae13dd4d9826ea303
Substitution matrix                     blosum62.out
Add backtrace                           false
Alignment mode                          3
E-value threshold                       0.001
Seq. id. threshold                      0
Min. alignment length                   0
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0
Coverage mode                           0
Max sequence length                     65535
Compositional bias                      1
Realign hits                            false
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Score bias                              0
Gap open cost                           11
Gap extension cost                      1
Threads                                 8
Compressed                              0
Verbosity                               3
Seed substitution matrix                VTML80.out
Sensitivity                             5.7
K-mer size                              0
K-score                                 2147483647
Alphabet size                           21
Max results per query                   300
Previous max results per query
Split database                          0
Split mode                              2
Split memory limit                      0
Diagonal scoring                        1
Exact k-mer matching                    0
Mask residues                           1
Mask lower case residues                0
Minimum diagonal score                  15
Spaced k-mers                           1
Spaced k-mer pattern
Local temporary path
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Global diagonal rescoring               false
Mask profile                            1
Profile e-value threshold               0.001
Use global sequence weighting           false
Filter MSA                              1
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Omit consensus                          false
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Use all table starts                    false
Offset of numeric ids                   0
Add orf stop                            false
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Number search iterations                1
Start sensitivity                       4
Search steps                            1
Run a seq-profile search in slice mode  false
Strand selection                        1
Disk space limit                        0
MPI runner
Force restart with latest tmp           false
Remove temporary files                  true
Alignment format                        0
Format alignment output                 query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output                         false
Overlap                                 0
Split seq. by length                    true
Database type                           0
Do not shuffle input database           true
Greedy best hits                        false

Invalid input database and --search-type combination
queryDbType: Aminoacid
targetDbType: Profile
targetSrcDbType: Profile
searchMode: 0
Error: Search died

Context

Providing context helps us come up with a solution and improve our documentation for the future.

I should stop updating mmseqs

Your Environment

Include as many relevant details about the environment you experienced the bug in.

gaboentropy commented 5 years ago

I downloaded the latest official release (MMseqs2 Release 8-fac81) and everything works with that one.

martin-steinegger commented 5 years ago

Thank you for reporting this, and sorry for the inconvenience. I introduced this bug. I reworked how MMseqs2 infers the search type, because more and more modes have been added and we had some bugs. We currently have over 32 search possibilities, but our test suite does not cover all of them. I added a test for your case (protein/index profile search).
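
For readers who want a feel for the coverage problem described here, the following is a minimal Python sketch, not MMseqs2's actual code. The `DbType` values, the `SEARCH_MODES` table, and the functions `infer_search_mode` and `untested_cases` are all hypothetical names made up for illustration. The sketch infers a mode label from the query and target database types, then lists every combination (including indexed targets) that has no test yet.

```python
from enum import Enum
from itertools import product


class DbType(Enum):
    # Labels mirror the error message in this thread; the set is illustrative.
    AMINO_ACID = "Aminoacid"
    NUCLEOTIDE = "Nucleotide"
    PROFILE = "Profile"


# Hypothetical lookup table: (query type, target type) -> search mode label.
# These entries are assumptions for the sketch, not MMseqs2's real rules.
SEARCH_MODES = {
    (DbType.AMINO_ACID, DbType.AMINO_ACID): "protein-protein",
    (DbType.AMINO_ACID, DbType.PROFILE): "protein-profile",
    (DbType.PROFILE, DbType.AMINO_ACID): "profile-protein",
    (DbType.NUCLEOTIDE, DbType.NUCLEOTIDE): "nucleotide-nucleotide",
}


def infer_search_mode(query, target, target_indexed=False):
    """Return the mode label for a combination, or raise if unsupported.

    In this sketch an index on the target does not change the mode, but it
    still counts as a separate case that needs its own test.
    """
    try:
        return SEARCH_MODES[(query, target)]
    except KeyError:
        raise ValueError(
            "Invalid input database combination "
            f"queryDbType: {query.value} targetDbType: {target.value}"
        ) from None


def untested_cases(tested):
    """List every (query, target, indexed) case that has no test yet."""
    all_cases = product(DbType, DbType, (False, True))
    return [case for case in all_cases if case not in tested]


if __name__ == "__main__":
    # Pretend only the plain protein-protein search has a test so far.
    tested = {(DbType.AMINO_ACID, DbType.AMINO_ACID, False)}
    for query, target, indexed in untested_cases(tested):
        print(query.value, target.value, "indexed" if indexed else "plain")
```

Even with three database types and one boolean flag the case count grows quickly, which is how a combination such as a protein query against an indexed profile target can slip through without a test.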

gaboentropy commented 5 years ago

Thanks.