Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
MMseqs Output (for bugs)
Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
Expected Behavior
I ran the script below (only the MMseqs2 part is shown) and got a "Prefilter died" error. What should I do?
#!/bin/bash
## specify allocation - we want normal since we don't want to use the whole node for nothing
#SBATCH -A grp-org-sc
#SBATCH -q normal
## specify number of nodes
#SBATCH -N 2
## specify number of procs/CPUS
#SBATCH -c 8
## specify runtime
#SBATCH -t 72:00:00
## specify job name
#SBATCH -J seqdetect
##Memory per cpu
#SBATCH --mem-per-cpu=512G
export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
[Initial part of the script for pre-processing abbreviated here]
### MMseqs2
conda activate /groups/science/homes/username/.micromamba/envs/mmseqs2
export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
mkdir mmseqs_target_seq/
mkdir mmseqs_target_seq/${sample}
mkdir phrog_output/
cp previousstep_output/${sample}/${sample}_summary/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa
mmseqs createdb mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq
### MMseqs2/Phrogs
mmseqs search phrogs_mmseqs_db/phrogs_profile_db \
    mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
    mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
    mmseqs_target_seq/${sample}/tmp -s 7
mmseqs createtsv phrogs_mmseqs_db/phrogs_profile_db \
    mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
    mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
    mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv --full-header
cp mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv mmseqs_target_seq
echo "file: mmseqs_target_seq/${sample}_targetofinterest_proteins_mmseqs.tsv"
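In the log, `createtsv` and `cp` still run after `search` has failed, which adds follow-up errors ("No datafile could be found", "cp: cannot stat") on top of the real one. A minimal sketch, assuming bash, of a wrapper that reports the failing step and lets the script stop there. The name `run_step` is hypothetical and not part of MMseqs2 or SLURM:

```shell
#!/bin/bash
# Hypothetical helper (not part of MMseqs2): run one pipeline step and
# report its exit status on stderr if it fails.
run_step() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "step failed (exit ${status}): $*" >&2
    fi
    return "$status"
}

# Intended use inside the job script (commented out so the sketch runs anywhere):
# run_step mmseqs search "$query_db" "$target_db" "$result_db" "$tmp_dir" -s 7 || exit 1
# run_step mmseqs createtsv "$query_db" "$target_db" "$result_db" "$result_tsv" --full-header || exit 1
```

The variables `query_db`, `target_db`, `result_db`, `tmp_dir`, and `result_tsv` stand in for the paths used in the script above. With this pattern the job would end at the `search` failure instead of continuing to `createtsv`.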
Current Behavior
[Previous output omitted here]
Create directory mmseqs_target_seq/[bacteria_of_interest]/tmp
search phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/tmp -s 7
MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 64
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
prefilter phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 64 --compressed 0 -v 3 -s 7.0
Query database size: 38880 type: Profile
Estimated memory consumption: 488M
Target database size: 125 type: Aminoacid
Index table k-mer threshold: 0 at k-mer size 6
Index table: counting k-mers
[=================================================================] 125 0s 5ms
Index table: Masked residues: 124
Index table: fill
[=================================================================] 125 0s 6ms
Index statistics
Entries: 25103
DB size: 488 MB
Avg k-mer size: 0.000392
Top 10 k-mers
ALGLAA 2
TTGTAA 2
AAARKA 2
KASRKA 2
TEEALA 2
EDLLRA 2
INGNED 2
ASARED 2
GKHHRD 2
AELKAE 2
Time for index table init: 0h 0m 0s 511ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 91
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 38880
Target db start 1 to 125
[=mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/blastp.sh: line 99: 1649148 Killed $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMPPATH/pref$STEP" $PREFILTER_PAR -s "$SENS"
Error: Prefilter died
createtsv phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv --full-header
MMseqs Version: 14.7e284
First sequence as representative false
Target column 1
Add full header true
Sequence source 0
Database output false
Threads 64
Compressed 0
Verbosity 3
No datafile could be found for mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs!
cp: cannot stat 'mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv': No such file or directory
file: mmseqs_target_seq/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv
sample: [bacteria_of_interest]
[bacteria_of_interest]
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4226926.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
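Two details in this output may be worth checking: MMseqs2 reports `Threads 64` although the job requests `-c 8`, and slurmstepd reports an oom-kill, so the prefilter was stopped by the cgroup memory limit rather than by MMseqs2 itself. A minimal sketch, assuming SLURM exports `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_CPU` (in MB) inside the job, of deriving `--threads` and `--split-memory-limit` values from the allocation. The function name, the fallback values, and the 80% headroom factor are assumptions, and whether this resolves the error here is untested:

```shell
#!/bin/bash
# Hypothetical helper: build mmseqs resource flags from the SLURM allocation.
# Fallbacks (1 thread, 1024 MB per CPU) are assumptions for runs outside SLURM.
mmseqs_resource_flags() {
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    local mem_per_cpu_mb="${SLURM_MEM_PER_CPU:-1024}"
    local total_mb=$(( threads * mem_per_cpu_mb ))
    # Keep roughly 20% of the allocation free (assumed headroom, integer math).
    local limit_mb=$(( total_mb * 8 / 10 ))
    echo "--threads ${threads} --split-memory-limit ${limit_mb}M"
}

# Intended use in the job script (commented out so the sketch runs anywhere);
# the unquoted substitution is deliberate so the flags split into words:
# mmseqs search "$query_db" "$target_db" "$result_db" "$tmp_dir" -s 7 $(mmseqs_resource_flags)
```

Both `--threads` and `--split-memory-limit` appear in the prefilter command line in the log above (with values `64` and `0`). After the job ends, `sacct -j 4226926 --format=JobID,ReqMem,MaxRSS,State` should show how much memory the step actually used compared with what was granted.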