Open mewu3 opened 1 year ago
I am not sure if <()
will work. Please try zcat file | mmseqs easy-search stdin …
instead.
Also mmseqs can read gzip files directly, you shouldn’t need either piping or <()
zcat the gzipped fq file before execute mmseqs easy-search change nothing, the error remains the same
easy-search B5177-2-N919D_T1_interleaved.fq pfama_20230914/pfam-a-full B5177-2-N919D_T1.tsv tmp1 --threads 12 --compressed 1
MMseqs Version: 13.45111
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 12
Compressed 1
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 5.7
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb B5177-2-N919D_T1_interleaved.fq tmp1/15089590555632328148/query --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 1 -v 3
Converting sequences
[10000] 0s 75ms
Time for merging to query_h: 0h 0m 0s 1ms
Time for merging to query: 0h 0m 0s 1ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 87ms
Create directory tmp1/15089590555632328148/search_tmp
search tmp1/15089590555632328148/query pfama_20230914/pfam-a-full tmp1/15089590555632328148/result tmp1/15089590555632328148/search_tmp --alignment-mode 3 --threads 12 --compressed 1 -s 5.7 --remove-tmp-files 1
extractorfs tmp1/15089590555632328148/query tmp1/15089590555632328148/search_tmp/16020913209305565279/q_orfs_aa --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 1 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 12 --compressed 1 -v 3
[=================================================================] 100.00% 10.00K 0s 19ms
Time for merging to q_orfs_aa_h: 0h 0m 0s 0ms
Time for merging to q_orfs_aa: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 27ms
prefilter tmp1/15089590555632328148/search_tmp/16020913209305565279/q_orfs_aa pfama_20230914/pfam-a-full tmp1/15089590555632328148/search_tmp/16020913209305565279/search/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -s 5.7 -k 5 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 12 --compressed 1 -v 3
Query database size: 0 type: Aminoacid
Estimated memory consumption: 557M
Target database size: 19482 type: Profile
Index table k-mer threshold: 90 at k-mer size 5
Index table: counting k-mers
[=================================================================] 100.00% 19.48K 4s 492ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 100.00% 19.48K 11s 88ms
Index statistics
Entries: 454766207
DB size: 2633 MB
Avg k-mer size: 111.350382
Top 10 k-mers
RRRRR 1414
QQQQQ 1135
RRRRQ 985
SPPPP 965
QRRRR 962
PPPPS 953
RQRRR 940
RRRQR 927
RHRRR 914
RKRRR 902
Time for index table init: 0h 0m 16s 838ms
Time for processing: 0h 0m 17s 228ms
swapresults tmp1/15089590555632328148/search_tmp/16020913209305565279/q_orfs_aa pfama_20230914/pfam-a-full tmp1/15089590555632328148/search_tmp/16020913209305565279/search/pref tmp1/15089590555632328148/search_tmp/16020913209305565279/search/pref_swapped --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 0.001 --split-memory-limit 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 12 --compressed 1 --db-load-mode 0 -v 3
Input tmp1/15089590555632328148/search_tmp/16020913209305565279/search/pref does not exist
Error: Swapresults pref died
Error: Search step died
Error: Search died
What's the length of the reads? We demand a minimum length for extracted ORFs of 30 AA, you can adjust this with the --min-length
. Currently its extracting 0 ORFs and then crashing afterwards, since there are no query sequences.
these are 50bp nuc sequence, and yes you are right, setting the --min-length
to 10 i get the results, thanks a lot
and i got another question, i'm confused with the search-type
option of easy-search, i don't know the correspondence with the number. please tell if the following guessing is correct.
0: auto
1: amino acid, BLASTP
2: translated, BLASTX
3: nucleotide, -> BLASTN
4: translated nucleotide alignment -> TBLASTX
This is a translated search against a protein (profile) db, so this corresponds to blastx/2. (kind of I don’t think there is a actually translated profile search mode in blast). This is closer to something like a translated hmmer search.
Expected Behavior
nuc reads would align to aa DB
Current Behavior
Input tmp1/15265602719800494645/search_tmp/18256837363344165166/search/pref does not exist
Steps to Reproduce (for bugs)
B5177-2-N919D_T1_interleaved.fq.gz DB_TARGET_PATH_NAME is the pfam DB downloaded with mmseqs
MMseqs Output (for bugs)
No output
Context
i try another fasta file, things went well. But with the uploaded file as input, i get the error said
Input tmp1/15265602719800494645/search_tmp/18256837363344165166/search/pref does not exist
, could not figure why.Your Environment
mmseqs version: 13.45111 conda installed Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-153-generic x86_64)