Closed mickeykawai closed 1 year ago
Now that I understand what `$MMSEQS` in `easypredict.sh` means, after reading other issue threads, I found that `extractorfs` is called with `--threads 192`, i.e. `--threads <n_proc_of_the_server>`.
$ export MMSEQS=<topdir_metaeuk>/build/bin/metaeuk
$ ls
contigs contigs_h.dbtype contigs.lookup targets targets_h.dbtype targets.source
contigs.dbtype contigs_h.index contigs.source targets.dbtype targets_h.index
contigs_h contigs.index easypredict.sh targets_h targets.index
$ sh easypredict.sh contigs targets predictionsFasta tmp
tmp directory tmp not found!
Create directory tmp/tmp_predict
predictexons contigs targets tmp/MetaEuk_calls tmp/tmp_predict
MMseqs Version: e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 100
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 192
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 15
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
maximal combined evalue of an optimal set 0.001
minimal length ratio between combined optimal set and target 0.5
Maximal intron length 10000
Minimal intron length 15
Minimal exon length aa 11
Maximal overlap of exons 10
Maximal number of exon sets 1
Gap open penalty -1
Gap extend penalty -1
Reverse AA Fragments 0
extractorfs <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/12404560893930422762/nucl_6f --min-length 15 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 192 --compressed 0 -v 3
libgomp: Thread creation failed: Resource temporarily unavailable
Error: extractorfs step died
Error: predictexons step died
$
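To confirm from a captured log which thread count a sub-step received, the `--threads` value can be pulled out of the echoed command line. The following is a minimal sketch: the `log` string is a shortened, hypothetical stand-in for the `extractorfs` line above, and in practice it would come from the saved MetaEuk output.

```shell
# Shortened, hypothetical stand-in for the echoed extractorfs command line.
log='extractorfs contigs nucl_6f --min-length 15 --threads 192 --compressed 0 -v 3'

# Extract the number following --threads (empty if the flag is absent).
threads=$(printf '%s\n' "$log" | sed -n 's/.*--threads \([0-9][0-9]*\).*/\1/p')
echo "extractorfs was invoked with --threads $threads"
```

If the reported value equals the core count of the server even though a smaller number was requested, the option was probably not forwarded to that step.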
"$MMSEQS" predictexons
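in `easypredict.sh` should be called with `--threads <n_thread>`, but the option is not set there, so `predictexons` falls back to its default thread count (192 in the log below).
- When run the way `easypredict.sh` runs it, without `--threads 1`, it fails with the `libgomp` error.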
$ "$MMSEQS" predictexons contigs targets tmp/MetaEuk_calls tmp/tmp_predict
predictexons contigs targets tmp/MetaEuk_calls tmp/tmp_predict
MMseqs Version: e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 100
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 192
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 15
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
maximal combined evalue of an optimal set 0.001
minimal length ratio between combined optimal set and target 0.5
Maximal intron length 10000
Minimal intron length 15
Minimal exon length aa 11
Maximal overlap of exons 10
Maximal number of exon sets 1
Gap open penalty -1
Gap extend penalty -1
Reverse AA Fragments 0
extractorfs
libgomp: Thread creation failed: Resource temporarily unavailable
Error: extractorfs step died
$
- With `--threads 1`, it runs.
$ "$MMSEQS" predictexons --threads 1 contigs targets tmp/MetaEuk_calls tmp/tmp_predict
predictexons --threads 1 contigs targets tmp/MetaEuk_calls tmp/tmp_predict
MMseqs Version: e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 100
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 1
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 15
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
maximal combined evalue of an optimal set 0.001
minimal length ratio between combined optimal set and target 0.5
Maximal intron length 10000
Minimal intron length 15
Minimal exon length aa 11
Maximal overlap of exons 10
Maximal number of exon sets 1
Gap open penalty -1
Gap extend penalty -1
Reverse AA Fragments 0
extractorfs
[=================================================================] 100.00% 30.05K 1s 409ms
Time for merging to nucl_6f_h: 0h 0m 0s 207ms
Time for merging to nucl_6f: 0h 0m 0s 207ms
Time for processing: 0h 0m 2s 807ms
translatenucs
[=================================================================] 100.00% 908.41K 0s 731ms
Time for merging to aa_6f: 0h 0m 0s 210ms
Time for processing: 0h 0m 1s 327ms
Create directory
prefilter
Query database size: 908411 type: Aminoacid
Estimated memory consumption: 1G
Target database size: 302957 type: Aminoacid
Index table k-mer threshold: 127 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 302.96K 40s 94ms
Index table: Masked residues: 2249117
Index table: fill
[=================================================================] 100.00% 302.96K 52s 604ms
Index statistics
Entries: 145543146
DB size: 1321 MB
Avg k-mer size: 2.274112
Top 10 k-mers
GSDTLW 1615
ASDTLW 1346
SGATSL 1194
FTGTNN 1191
GGQRRS 1182
TSSEYV 1171
CTALSY 1157
EQIRAT 1133
LREGLY 1129
FEDPAM 1111
Time for index table init: 0h 1m 34s 520ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 127
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 908411
Target db start 1 to 302957
[=================================================================] 100.00% 908.41K 2m 45s 733ms
47.822211 k-mers per position
2686 DB matches per sequence
0 overflows
13 sequences passed prefiltering per query sequence
5 median result list length
162157 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 246ms
Time for processing: 0h 4m 23s 584ms
align
Can you try to set the MMSEQS_NUM_THREADS env variable? That should fix your issue as a workaround (hopefully). We'll have to investigate where we are spawning too many threads.
Sorry, I missed that you already identified where it goes wrong. We'll take a look. In the meantime the env variable I mentioned should fix the issue.
Thank you very much for the prompt direction! Fixed for both `metaeuk easy-predict --threads 1 ...` and when called from `busco --cpu 4 ...`.
$ export MMSEQS_NUM_THREADS=4
$ export OPENBLAS_NUM_THREADS=4 # for busco
$ busco -m transcriptome -i trinity_out_dir.Trinity.fasta --cpu 4 -o busco_eukaryota_odb10 -l eukaryota_odb10
...
2023-06-20 22:00:57 INFO: BUSCO analysis done. Total running time: 853 seconds
2023-06-20 22:00:57 INFO: Results written in <topdir>/busco_eukaryota_odb10
2023-06-20 22:00:57 INFO: For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
2023-06-20 22:00:57 INFO: Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
$
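The workaround above hard-codes the value 4. As a minimal sketch, assuming one wants to derive the value rather than type it, a small helper can cap a requested thread count by the number of cores and by a per-user budget. `cap_threads` is a hypothetical helper and not part of MetaEuk or BUSCO. Only the `MMSEQS_NUM_THREADS` variable comes from this thread.

```shell
# Hypothetical helper, not part of MetaEuk or BUSCO.
# Prints the smallest of: requested threads, available cores, and a
# per-user budget, never going below 1.
cap_threads() {
    requested=$1
    cores=$2
    budget=$3
    n=$requested
    if [ "$n" -gt "$cores" ]; then n=$cores; fi
    if [ "$n" -gt "$budget" ]; then n=$budget; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Example with the numbers from this thread: 192 cores, a budget of 4.
MMSEQS_NUM_THREADS=$(cap_threads 192 192 4)
export MMSEQS_NUM_THREADS
echo "MMSEQS_NUM_THREADS=$MMSEQS_NUM_THREADS"
```

On Linux the second argument could be supplied by `$(nproc)`. The budget has to be chosen by hand to stay below whatever process limit the admin has set for the account.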
Expected Behavior
The `metaeuk easy-predict --threads 1 ...` command should not return an error.
Current Behavior
The same `metaeuk easy-predict --threads 1 ...` command returns the following error.
I hit the error on an account that has a Linux `nproc` limit, but the command succeeded on another account without an `nproc` limit. The `busco` command I tried is `busco -m transcriptome -i trinity_out_dir.Trinity.fasta --cpu 1 -o busco -l eukaryota_odb10`.
In the `<busco_outname>/logs` dir, the error message appears in `metaeuk_run1_err.log`, which says: `libgomp: Thread creation failed: Resource temporarily unavailable`.
`export OPENBLAS_NUM_THREADS=1` (to restrict the number of threads of `busco`) and `export OMP_NUM_THREADS=1` (to restrict the number of threads of `metaeuk`) did not solve the problem.
MetaEuk fails at the very beginning, maybe at some point during preparation of amino-acid sequences from nucleotide sequences. Not only the `metaeuk` run from `busco` but also a separately installed `metaeuk`, either precompiled or compiled from source, fails with the same message.
`ulimit -s 999999` did not solve the error either.
MetaEuk fails at the step in `EasyPredict.cpp` where it calls the `easypredict.sh` it just created in `easypredict`. I suspect the line `cmd.addVariable("THREAD_COMP_PAR", par.createParameterString(par.threadsandcompression).c_str());` might overwrite the given `OMP_NUM_THREADS`, but I couldn't investigate further as I'm not familiar with C++. As `metaeuk easy-predict -h` says `--threads INT Number of CPU-cores used (all by default) [<n_core_of_the_server>]`, I guess that somewhere `metaeuk easy-predict` forgets the given `--threads 1` and assumes it can use all CPUs.
The log of `metaeuk easy-predict` is as follows.
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
The `nproc` limit is set by the admin of the server, whose nproc is 192.
MetaEuk Output (for bugs)
Please make sure to also post the complete output of MetaEuk. You can use gist.github.com for large output.
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Please refer to https://gitlab.com/ezlab/busco/-/issues/675#note_1438785434
Your Environment
Include as many relevant details about the environment you experienced the bug in.
Both statically compiled and self-compiled (with gcc/8.3.0).