Closed ruthalee closed 4 months ago
"Killed" sounds like the operating system's out-of-memory (OOM) killer terminated the process for using too much RAM.
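One way to check this guess, as a minimal sketch: a process ended by the OOM killer receives SIGKILL, which a shell reports as exit status 137 (128 + 9), and the kernel normally logs the kill. The helper names `describe_status` and `oom_lines` below are made up for illustration and are not part of foldseek. Reading the kernel log may need elevated privileges on a shared cluster.

```shell
# Translate a shell exit status into a short description.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 mean "terminated by signal (status - 128)".
        echo "terminated by SIG$(kill -l "$((status - 128))")"
    else
        echo "exited with status $status"
    fi
}

# Keep only lines on stdin that look like kernel OOM-killer messages.
oom_lines() {
    grep -i -E 'out of memory|oom-kill|killed process' || true
}

# Example usage:
#   describe_status 137      # prints: terminated by SIGKILL
#   dmesg | oom_lines        # or: journalctl -k | oom_lines
```

Note that the foldseek wrapper script catches the failed step and exits with its own status, so the 137 applies to the killed child process, not necessarily to the top-level `foldseek` command.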
What's odd is that the input set is really small (985) and that the alignment is usually not the step that causes issues.
Are there any absurdly long proteins in the set?
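A quick way to answer that question is to count C-alpha atoms per file, which approximates the number of residues. This is a rough sketch that assumes single-model, single-chain files with a `.pdb` extension (typical for AlphaFold output); multi-model files or alternate locations would be overcounted. The function name `pdb_lengths` is hypothetical.

```shell
# Print "<CA count><TAB><file>" for every .pdb file in a directory, longest first.
pdb_lengths() {
    dir=$1
    for f in "$dir"/*.pdb; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        # PDB is fixed-column: record name in columns 1-6, atom name in 13-16.
        n=$(awk 'substr($0,1,4)=="ATOM" && substr($0,13,4)==" CA " {n++} END {print n+0}' "$f")
        printf '%d\t%s\n' "$n" "$f"
    done | sort -k1,1nr
}

# Example usage: show the ten longest structures in the input directory.
#   pdb_lengths Bins_Geo_Shew_ranked_0_pdbs | head
```

Entries far longer than the rest of the set are the first candidates to inspect or to cluster separately.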
Thank you! That was the problem. I did have a couple of proteins ~2300 aa long. Fortunately I am on an HPC cluster and can just request more RAM. I appreciate your help!
@ruthalee is this resolved?
@martin-steinegger yes, thank you
Hello, I have been using foldseek to cluster PDBs folded with AlphaFold and it had been working perfectly. Now I am getting an error when I try to cluster. Here is the command and its output; I have left out all of the [=====> progress bars for brevity.
(foldseek) me@ahl03 [~/packages/fold_seek] % foldseek easy-cluster Bins_Geo_Shew_ranked_0_pdbs Bins_Geo_Shew tmp -c 0.8 --cov-mode 0
Create directory tmp
easy-cluster Bins_Geo_Shew_ranked_0_pdbs Bins_Geo_Shew tmp -c 0.8 --cov-mode 0
MMseqs Version: 8.ef4e960
Substitution matrix aa:3di.out,nucl:3di.out
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max sequence length 65535
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0.8
Coverage mode 0
Compositional bias 1
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Preload mode 0
Spaced k-mer pattern
Local temporary path
Threads 256
Compressed 0
Verbosity 3
TMscore threshold 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Add backtrace false
Alignment mode 0
Alignment mode 0
E-value threshold 10
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max reject 2147483647
Max accept 2147483647
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
TMalign hit order 0
TMalign fast 1
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Weight file name
Cluster Weight threshold 0.9
Single step clustering false
Cascaded clustering steps 3
Cluster reassign false
Remove temporary files true
Force restart with latest tmp false
MPI runner
k-mers per sequence 21
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Shift hash 67
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Tar Inclusion Regex .
Tar Exclusion Regex ^$
File Inclusion Regex .
File Exclusion Regex ^$
createdb Bins_Geo_Shew_ranked_0_pdbs tmp/15597438964095582814/input --chain-name-mode 0 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --tar-include '.' --tar-exclude '^$' --file-include '.' --file-exclude '^$' --threads 256 -v 3
Output file: tmp/15597438964095582814/input
Time for merging to input_ss: 0h 0m 1s 158ms
Time for merging to input_h: 0h 0m 1s 209ms
Time for merging to input_ca: 0h 0m 1s 264ms
Time for merging to input: 0h 0m 1s 51ms
Ignore 0 out of 985.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 42s 56ms
Create directory tmp/15597438964095582814/clu_tmp
cluster tmp/15597438964095582814/input tmp/15597438964095582814/clu tmp/15597438964095582814/clu_tmp -c 0.8 --cov-mode 0 --remove-tmp-files 1
Set cluster sensitivity to -s 8.000000
Set cluster mode SET COVER
Set cluster iterations to 3
kmermatcher tmp/15597438964095582814/input_ss tmp/15597438964095582814/clu_tmp/11654376807347694794/pref --sub-mat 'aa:3di.out,nucl:3di.out' --alph-size aa:21,nucl:5 --min-seq-id 0 --kmer-per-seq 300 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 1 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 256 --compressed 0 -v 3 --cluster-weight-threshold 0.9
kmermatcher tmp/15597438964095582814/input_ss tmp/15597438964095582814/clu_tmp/11654376807347694794/pref --sub-mat 'aa:3di.out,nucl:3di.out' --alph-size aa:21,nucl:5 --min-seq-id 0 --kmer-per-seq 300 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 1 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 256 --compressed 0 -v 3 --cluster-weight-threshold 0.9
Database size: 985 type: Aminoacid
Reduced amino acid alphabet: (A F) (C V) (D B) (E Z) (G H) (I M T) (K W) (L J) (N R S) (P) (Q) (Y) (X)
Generate k-mers list for 1 split
Sort kmer 0h 0m 14s 954ms
Sort by rep. sequence 0h 0m 1s 18ms
Time for fill: 0h 0m 0s 2ms
Time for merging to pref: 0h 0m 0s 5ms
Time for processing: 0h 0m 26s 949ms
structurerescorediagonal tmp/15597438964095582814/input tmp/15597438964095582814/input tmp/15597438964095582814/clu_tmp/11654376807347694794/pref tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_rescore1 --tmscore-threshold 0 --lddt-threshold 0 --alignment-type 2 --sub-mat 'aa:3di.out,nucl:3di.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.01 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 256 --compressed 0 -v 3
Time for merging to pref_rescore1: 0h 0m 0s 140ms
Time for processing: 0h 0m 10s 736ms
clust tmp/15597438964095582814/input tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_rescore1 tmp/15597438964095582814/clu_tmp/11654376807347694794/pre_clust --cluster-mode 0 --max-iterations 1000 --similarity-type 2 --threads 256 --compressed 0 -v 3 --cluster-weight-threshold 0.9
Clustering mode: Set Cover
Sort entries
Find missing connections
Found 3532 new connections.
Reconstruct initial order
Add missing connections
Time for read in: 0h 0m 17s 5ms
Total time: 0h 0m 23s 840ms
Size of the sequence database: 985
Size of the alignment database: 985
Number of clusters: 442
Writing results 0h 0m 0s 0ms
Time for merging to pre_clust: 0h 0m 0s 970ms
Time for processing: 0h 0m 24s 861ms
createsubdb tmp/15597438964095582814/clu_tmp/11654376807347694794/order_redundancy tmp/15597438964095582814/clu_tmp/11654376807347694794/pref tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_filter1 -v 3 --subdb-mode 1
Time for merging to pref_filter1: 0h 0m 0s 6ms
Time for processing: 0h 0m 0s 123ms
filterdb tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_filter1 tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_filter2 --filter-file tmp/15597438964095582814/clu_tmp/11654376807347694794/order_redundancy --threads 256 --compressed 0 -v 3
Filtering using file(s)
Time for merging to pref_filter2: 0h 0m 0s 148ms
Time for processing: 0h 0m 8s 119ms
structurealign tmp/15597438964095582814/input tmp/15597438964095582814/input tmp/15597438964095582814/clu_tmp/11654376807347694794/pref_filter2 tmp/15597438964095582814/clu_tmp/11654376807347694794/aln.linclust --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 0 --alignment-type 2 --sub-mat 'aa:3di.out,nucl:3di.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.01 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 256 --compressed 0 -v 3
tmp/15597438964095582814/clu_tmp/11654376807347694794/clustering.sh: line 123: 2986178 Killed $RUNNER "$MMSEQS" $ALIGNMENT_ALGO "${INPUT}${ALN_EXTENSION}" "${INPUT}${ALN_EXTENSION}" "${TMP_PATH}/pref_filter2" "${TMP_PATH}/aln.linclust" ${ALIGNMENT_PAR}
Error: Alignment step died
Error: Search died
I looked up the clustering.sh line:
# 4. Clustering using greedy set cover.
if notExists "${TMP_PATH}/clust.linclust.dbtype"; then
# shellcheck disable=SC2086,SC2153
fi
if notExists "${TMP_PATH}/clu_redundancy.dbtype"; then
# shellcheck disable=SC2086
fi
fi <----- line 123
I installed foldseek with mamba into its own environment on a Linux x64 system. After this problem I ran foldseek on a PDB set I had run before, and it did not work either. I uninstalled and reinstalled foldseek on the off chance something weird had happened in my environment, but it did not fix the problem. Any idea what is happening? Thanks so much!