soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Error: Prefilter died #826

Open goldenmole1 opened 6 months ago

goldenmole1 commented 6 months ago

Expected Behavior

I ran the script below, which includes an MMseqs2 step, and got a "Prefilter died" error. What should I do?

#!/bin/bash

## specify allocation - we want normal since we don't want to use the whole node for nothing
#SBATCH -A grp-org-sc
#SBATCH -q normal
## specify number of nodes
#SBATCH -N 2
## specify number of procs/CPUS
#SBATCH -c 8
## specify runtime
#SBATCH -t 72:00:00
## specify job name
#SBATCH -J seqdetect
##Memory per cpu
#SBATCH --mem-per-cpu=512G

export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
[Initial part of the script for pre-processing abbreviated here]
### MMseqs2

conda activate /groups/science/homes/username/.micromamba/envs/mmseqs2

export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
mkdir mmseqs_target_seq/
mkdir mmseqs_target_seq/${sample}
mkdir phrog_output/
cp previousstep_output/${sample}/${sample}_summary/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa
mmseqs createdb mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq

### MMseqs2/Phrogs
mmseqs search phrogs_mmseqs_db/phrogs_profile_db \
mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
mmseqs_target_seq/${sample}/tmp -s 7

mmseqs createtsv phrogs_mmseqs_db/phrogs_profile_db \
mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv --full-header

cp mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv mmseqs_target_seq
echo "file: mmseqs_target_seq/${sample}_targetofinterest_proteins_mmseqs.tsv"

Current Behavior

[Previous output omitted here]
Create directory mmseqs_target_seq/[bacteria_of_interest]/tmp
search phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/tmp -s 7

MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 64
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

prefilter phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 64 --compressed 0 -v 3 -s 7.0

Query database size: 38880 type: Profile
Estimated memory consumption: 488M
Target database size: 125 type: Aminoacid
Index table k-mer threshold: 0 at k-mer size 6
Index table: counting k-mers
[=================================================================] 125 0s 5ms
Index table: Masked residues: 124
Index table: fill
[=================================================================] 125 0s 6ms
Index statistics
Entries: 25103
DB size: 488 MB
Avg k-mer size: 0.000392
Top 10 k-mers
ALGLAA 2
TTGTAA 2
AAARKA 2
KASRKA 2
TEEALA 2
EDLLRA 2
INGNED 2
ASARED 2
GKHHRD 2
AELKAE 2
Time for index table init: 0h 0m 0s 511ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 91
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 38880
Target db start 1 to 125
[=mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/blastp.sh: line 99: 1649148 Killed $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMPPATH/pref$STEP" $PREFILTER_PAR -s "$SENS"
Error: Prefilter died
createtsv phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv --full-header

MMseqs Version: 14.7e284
First sequence as representative false
Target column 1
Add full header true
Sequence source 0
Database output false
Threads 64
Compressed 0
Verbosity 3

No datafile could be found for mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs!
cp: cannot stat 'mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv': No such file or directory
file: mmseqs_target_seq/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv
sample: [bacteria_of_interest] [bacteria_of_interest]
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4226926.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

milot-mirdita commented 6 months ago

Please try to reverse the search direction (sequences vs profiles, not profiles vs sequences).

It looks like the small number of queries is causing some weird issue.
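
For illustration, reversing the direction with the paths from the script above might look roughly like the sketch below. It only swaps the first two database arguments so that the sequence database is the query and the PHROGs profile database is the target. The `_rev` result name, the `tmp_rev` directory, the `example_sample` default, and the `run` dry-run helper are hypothetical names introduced here and are not part of the original script. By default the sketch only prints the commands; setting `RUN=1` would execute them, which assumes `mmseqs` is on the `PATH` and the databases exist.

```shell
#!/bin/bash
# Sketch only: same commands as the original script, with query and target swapped.
# "sample" is set earlier in the original script; the default here is a placeholder.
sample="${sample:-example_sample}"

seq_db="mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq"
profile_db="phrogs_mmseqs_db/phrogs_profile_db"
# Hypothetical names: a separate result DB and a fresh, empty tmp directory.
result_db="mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs_rev"
tmp_dir="mmseqs_target_seq/${sample}/tmp_rev"

# Hypothetical helper: print the command unless RUN=1 is set, then execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Sequences are now the query and the PHROGs profiles are the target.
run mmseqs search "$seq_db" "$profile_db" "$result_db" "$tmp_dir" -s 7

# createtsv gets the databases in the same query/target order as the search.
run mmseqs createtsv "$seq_db" "$profile_db" "$result_db" "${result_db}.tsv" --full-header
```

Note that with the direction reversed, the first two columns of the resulting TSV would also swap roles (protein first, PHROG profile second), so any downstream parsing that assumes the old order would need adjusting.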