soedinglab / metaeuk

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics
GNU General Public License v3.0

`export OMP_NUM_THREADS=1; metaeuk easy-predict --threads 1` still fails with `libgomp: Thread creation failed: Resource temporarily unavailable` for users where linux's `nproc` limit is set #80

Closed mickeykawai closed 1 year ago

mickeykawai commented 1 year ago

Expected Behavior

The metaeuk easy-predict --threads 1 ... command below should complete without an error.

$ metaeuk easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5 -v 3 --mpi-runner 0
easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5
$ ls busco/logs/
busco.log          hmmsearch_out.log     metaeuk_run1_out.log  metaeuk_run2_out.log
hmmsearch_err.log  metaeuk_run1_err.log  metaeuk_run2_err.log

Current Behavior

The metaeuk easy-predict --threads 1 ... command above fails with the following error.

...

Converting sequences
[29998] 0s 216ms
Time for merging to contigs_h: 0h 0m 0s 131ms
Time for merging to contigs: 0h 0m 0s 265ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 961ms
<topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets exists and will be overwritten
createdb <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets --dbtype 1 --compressed 0 -v 3 

Converting sequences
[302900] 0s 756ms
Time for merging to targets_h: 0h 0m 0s 258ms
Time for merging to targets: 0h 0m 1s 303ms
Database type: Aminoacid

libgomp: Thread creation failed: Resource temporarily unavailable
Error: targets createdb died
$ 

I hit the error on an account that has a Linux nproc limit set, but the same run succeeded on another account without an nproc limit. The BUSCO command I used was busco -m transcriptome -i trinity_out_dir.Trinity.fasta --cpu 1 -o busco -l eukaryota_odb10.

In the <busco_outname>/logs directory, metaeuk_run1_err.log contains the error message: libgomp: Thread creation failed: Resource temporarily unavailable.

$ ls
busco.log  metaeuk_run1_err.log  metaeuk_run1_out.log

Setting export OPENBLAS_NUM_THREADS=1 (to restrict the thread count of BUSCO) and export OMP_NUM_THREADS=1 (to restrict the thread count of MetaEuk) did not solve the problem.

MetaEuk fails at the very beginning, possibly while preparing amino-acid sequences from the nucleotide sequences. It fails with the same message whether it is run from BUSCO or from a separately installed MetaEuk, either precompiled or compiled from source.

ulimit -s 999999 did not solve the error either.

MetaEuk fails in EasyPredict.cpp at the step where it calls the easypredict.sh it just created in easypredict. I suspect that the line cmd.addVariable("THREAD_COMP_PAR", par.createParameterString(par.threadsandcompression).c_str()); overrides the given OMP_NUM_THREADS, but I could not investigate further because I am not familiar with C++. Since metaeuk easy-predict -h says --threads INT Number of CPU-cores used (all by default) [<n_core_of_the_server>], my guess is that somewhere metaeuk easy-predict drops the given --threads 1 and assumes it can use all CPUs.
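For anyone diagnosing the same message: on Linux the nproc limit (ulimit -u) counts threads as well as processes, and thread creation fails with "Resource temporarily unavailable" once the limit is reached. ulimit -s only changes the stack size, so raising it does not help here. Below is a minimal sketch, assuming bash, of a rough check of whether a default, all-cores thread count would fit under the limit. The function name threads_fit is hypothetical and not part of MetaEuk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MetaEuk): succeed only if `requested` more
# threads fit under the per-user limit, given how many threads already run.
threads_fit() {
    local limit="$1" in_use="$2" requested="$3"
    # "unlimited" means no nproc limit is configured for this account.
    [ "$limit" = "unlimited" ] && return 0
    [ $((in_use + requested)) -le "$limit" ]
}

limit="$(ulimit -u)"    # soft limit on processes + threads for this user
cores="$(nproc)"        # core count; easy-predict -h lists all cores as the --threads default
# Rough count of threads this user already has running (one line per thread).
in_use="$(ps -L -u "$(id -un)" -o lwp= | wc -l)"

if threads_fit "$limit" "$in_use" "$cores"; then
    echo "limit=$limit in_use=$in_use: $cores threads should fit"
else
    echo "limit=$limit in_use=$in_use: starting $cores threads would exceed the limit"
fi
```

If the second message is printed, any sub-step that starts one thread per core can be expected to fail on that account, regardless of OMP_NUM_THREADS.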

The log of metaeuk easy-predict is as follows.

$ metaeuk easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5 -v 3 --mpi-runner 0
easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5 -v 3 --mpi-runner 0 

MMseqs Version:                                                 e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix                                             aa:blosum62.out,nucl:nucleotide.out
Add backtrace                                                   false
Alignment mode                                                  2
Alignment mode                                                  0
Allow wrapped scoring                                           false
E-value threshold                                               100
Seq. id. threshold                                              0
Min alignment length                                            0
Seq. id. mode                                                   0
Alternative alignments                                          0
Coverage threshold                                              0
Coverage mode                                                   0
Max sequence length                                             160000
Compositional bias                                              1
Compositional bias                                              1
Max reject                                                      2147483647
Max accept                                                      2147483647
Include identical seq. id.                                      false
Preload mode                                                    0
Pseudo count a                                                  substitution:1.100,context:1.400
Pseudo count b                                                  substitution:4.100,context:5.800
Score bias                                                      0
Realign hits                                                    false
Realign score bias                                              -0.2
Realign max seqs                                                2147483647
Correlation score weight                                        0
Gap open cost                                                   aa:11,nucl:5
Gap extension cost                                              aa:1,nucl:2
Zdrop                                                           40
Threads                                                         1
Compressed                                                      0
Verbosity                                                       3
Seed substitution matrix                                        aa:VTML80.out,nucl:nucleotide.out
Sensitivity                                                     4.5
k-mer length                                                    0
k-score                                                         seq:2147483647,prof:2147483647
Alphabet size                                                   aa:21,nucl:5
Max results per query                                           300
Split database                                                  0
Split mode                                                      2
Split memory limit                                              0
Diagonal scoring                                                true
Exact k-mer matching                                            0
Mask residues                                                   1
Mask residues probability                                       0.9
Mask lower case residues                                        0
Minimum diagonal score                                          15
Selected taxa                                                   
Spaced k-mers                                                   1
Spaced k-mer pattern                                            
Local temporary path                                            
Rescore mode                                                    0
Remove hits by seq. id. and coverage                            false
Sort results                                                    0
Mask profile                                                    1
Profile E-value threshold                                       0.001
Global sequence weighting                                       false
Allow deletions                                                 false
Filter MSA                                                      1
Use filter only at N seqs                                       0
Maximum seq. id. threshold                                      0.9
Minimum seq. id.                                                0.0
Minimum score per column                                        -20
Minimum coverage                                                0
Select N most diverse seqs                                      1000
Pseudo count mode                                               0
Min codons in orf                                               15
Max codons in length                                            32734
Max orf gaps                                                    2147483647
Contig start mode                                               2
Contig end mode                                                 2
Orf start mode                                                  1
Forward frames                                                  1,2,3
Reverse frames                                                  1,2,3
Translation table                                               1
Translate orf                                                   0
Use all table starts                                            false
Offset of numeric ids                                           0
Create lookup                                                   0
Add orf stop                                                    false
Overlap between sequences                                       0
Sequence split mode                                             1
Header split mode                                               0
Chain overlapping alignments                                    0
Merge query                                                     1
Search type                                                     0
Start sensitivity                                               4
Search steps                                                    1
Exhaustive search mode                                          false
Filter results during exhaustive search                         0
Strand selection                                                1
LCA search mode                                                 false
Disk space limit                                                0
MPI runner                                                      0
Force restart with latest tmp                                   false
Remove temporary files                                          false
maximal combined evalue of an optimal set                       0.001
minimal length ratio between combined optimal set and target    0.5
Maximal intron length                                           130000
Minimal intron length                                           5
Minimal exon length aa                                          15
Maximal overlap of exons                                        15
Maximal number of exon sets                                     1
Gap open penalty                                                -1
Gap extend penalty                                              -1
Reverse AA Fragments                                            0
allow same-strand overlaps                                      1
translate codons to AAs                                         0
write target key instead of accession                           0
write fragment contig coords                                    0

easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5 -v 3 --mpi-runner 0 

<topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs exists and will be overwritten
createdb <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs --dbtype 2 --compressed 0 -v 3 

Converting sequences
[29998] 0s 216ms
Time for merging to contigs_h: 0h 0m 0s 131ms
Time for merging to contigs: 0h 0m 0s 265ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 961ms
<topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets exists and will be overwritten
createdb <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets --dbtype 1 --compressed 0 -v 3 

Converting sequences
[302900] 0s 756ms
Time for merging to targets_h: 0h 0m 0s 258ms
Time for merging to targets: 0h 0m 1s 303ms
Database type: Aminoacid

libgomp: Thread creation failed: Resource temporarily unavailable
Error: targets createdb died
$ 

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

metaeuk easy-predict --threads 1 <topdir>/trinity_out_dir.Trinity.fasta <topdir>/busco_downloads/lineages/eukaryota_odb10/refseq_db.faa <topdir>/busco/run_eukaryota_odb10/metaeuk_output/initial_results/trinity_out_dir.Trinity.fasta <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 15 --max-overlap 15 --min-intron 5 --overlap 1 -s 4.5

MetaEuk Output (for bugs)

Please make sure to also post the complete output of MetaEuk. You can use gist.github.com for large output.

$ ls
contigs         contigs_h.dbtype  contigs.lookup  targets         targets_h.dbtype  targets.source
contigs.dbtype  contigs_h.index   contigs.source  targets.dbtype  targets_h.index
contigs_h       contigs.index     easypredict.sh  targets_h       targets.index

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Please refer to https://gitlab.com/ezlab/busco/-/issues/675#note_1438785434

Your Environment

Include as many relevant details about the environment you experienced the bug in.

metaeuk/6.src/build/bin/metaeuk | grep Version
metaeuk Version: e8ef9c146b871a86415c3b74d1db1a4e24026158

Both the statically compiled binary and a self-compiled build (with gcc/8.3.0).

cmake/3.15.4
gcc/8.3.0
$ cat /proc/cpuinfo | grep sse4_1 | head -n 2
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
$ grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
$ uname -a | grep x86_64
Linux <name_server> 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
mickeykawai commented 1 year ago

Now that I understand what $MMSEQS in easypredict.sh means (from reading other issue threads), I found that extractorfs is called with --threads 192, i.e. --threads <n_proc_of_the_server>.

$ export MMSEQS=<topdir_metaeuk>/build/bin/metaeuk
$ ls
contigs         contigs_h.dbtype  contigs.lookup  targets         targets_h.dbtype  targets.source
contigs.dbtype  contigs_h.index   contigs.source  targets.dbtype  targets_h.index
contigs_h       contigs.index     easypredict.sh  targets_h       targets.index
$ sh easypredict.sh contigs targets predictionsFasta tmp
tmp directory tmp not found!
Create directory tmp/tmp_predict
predictexons contigs targets tmp/MetaEuk_calls tmp/tmp_predict 

MMseqs Version:                                                 e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix                                             aa:blosum62.out,nucl:nucleotide.out
Add backtrace                                                   false
Alignment mode                                                  2
Alignment mode                                                  0
Allow wrapped scoring                                           false
E-value threshold                                               100
Seq. id. threshold                                              0
Min alignment length                                            0
Seq. id. mode                                                   0
Alternative alignments                                          0
Coverage threshold                                              0
Coverage mode                                                   0
Max sequence length                                             65535
Compositional bias                                              1
Compositional bias                                              1
Max reject                                                      2147483647
Max accept                                                      2147483647
Include identical seq. id.                                      false
Preload mode                                                    0
Pseudo count a                                                  substitution:1.100,context:1.400
Pseudo count b                                                  substitution:4.100,context:5.800
Score bias                                                      0
Realign hits                                                    false
Realign score bias                                              -0.2
Realign max seqs                                                2147483647
Correlation score weight                                        0
Gap open cost                                                   aa:11,nucl:5
Gap extension cost                                              aa:1,nucl:2
Zdrop                                                           40
Threads                                                         192
Compressed                                                      0
Verbosity                                                       3
Seed substitution matrix                                        aa:VTML80.out,nucl:nucleotide.out
Sensitivity                                                     4
k-mer length                                                    0
k-score                                                         seq:2147483647,prof:2147483647
Alphabet size                                                   aa:21,nucl:5
Max results per query                                           300
Split database                                                  0
Split mode                                                      2
Split memory limit                                              0
Diagonal scoring                                                true
Exact k-mer matching                                            0
Mask residues                                                   1
Mask residues probability                                       0.9
Mask lower case residues                                        0
Minimum diagonal score                                          15
Selected taxa                                                   
Spaced k-mers                                                   1
Spaced k-mer pattern                                            
Local temporary path                                            
Rescore mode                                                    0
Remove hits by seq. id. and coverage                            false
Sort results                                                    0
Mask profile                                                    1
Profile E-value threshold                                       0.001
Global sequence weighting                                       false
Allow deletions                                                 false
Filter MSA                                                      1
Use filter only at N seqs                                       0
Maximum seq. id. threshold                                      0.9
Minimum seq. id.                                                0.0
Minimum score per column                                        -20
Minimum coverage                                                0
Select N most diverse seqs                                      1000
Pseudo count mode                                               0
Min codons in orf                                               15
Max codons in length                                            32734
Max orf gaps                                                    2147483647
Contig start mode                                               2
Contig end mode                                                 2
Orf start mode                                                  1
Forward frames                                                  1,2,3
Reverse frames                                                  1,2,3
Translation table                                               1
Translate orf                                                   0
Use all table starts                                            false
Offset of numeric ids                                           0
Create lookup                                                   0
Add orf stop                                                    false
Overlap between sequences                                       0
Sequence split mode                                             1
Header split mode                                               0
Chain overlapping alignments                                    0
Merge query                                                     1
Search type                                                     0
Start sensitivity                                               4
Search steps                                                    1
Exhaustive search mode                                          false
Filter results during exhaustive search                         0
Strand selection                                                1
LCA search mode                                                 false
Disk space limit                                                0
MPI runner                                                      
Force restart with latest tmp                                   false
Remove temporary files                                          false
maximal combined evalue of an optimal set                       0.001
minimal length ratio between combined optimal set and target    0.5
Maximal intron length                                           10000
Minimal intron length                                           15
Minimal exon length aa                                          11
Maximal overlap of exons                                        10
Maximal number of exon sets                                     1
Gap open penalty                                                -1
Gap extend penalty                                              -1
Reverse AA Fragments                                            0

extractorfs <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs <topdir>/busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/12404560893930422762/nucl_6f --min-length 15 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 192 --compressed 0 -v 3 

libgomp: Thread creation failed: Resource temporarily unavailable
Error: extractorfs step died
Error: predictexons step died
$ 
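The output above shows a sub-step receiving --threads 192 even though the top-level command asked for one thread. To spot this quickly in a captured log, the logged command lines can be scanned for the thread count each sub-command actually received. This is a minimal sketch; the function name list_thread_args is hypothetical, and it only relies on the "<sub-command> ... --threads N ..." lines that appear in the logs in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<sub-command> <thread count>" for every logged
# command line that carries a --threads option. Reads the given files, or
# standard input when no file is given.
list_thread_args() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "--threads") print $1, $(i + 1)
    }' "$@"
}
```

For example, list_thread_args metaeuk_run1_out.log lists one line per logged command. Any count larger than the one requested points at a step that did not receive the --threads value.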
mickeykawai commented 1 year ago

"$MMSEQS" predictexons in easypredict.sh should be called with --threads <n_thread>, but the option is not set, so it falls back to the default of all cores (Threads 192 in the output below).

MMseqs Version:                                                 e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix                                             aa:blosum62.out,nucl:nucleotide.out
Add backtrace                                                   false
Alignment mode                                                  2
Alignment mode                                                  0
Allow wrapped scoring                                           false
E-value threshold                                               100
Seq. id. threshold                                              0
Min alignment length                                            0
Seq. id. mode                                                   0
Alternative alignments                                          0
Coverage threshold                                              0
Coverage mode                                                   0
Max sequence length                                             65535
Compositional bias                                              1
Compositional bias                                              1
Max reject                                                      2147483647
Max accept                                                      2147483647
Include identical seq. id.                                      false
Preload mode                                                    0
Pseudo count a                                                  substitution:1.100,context:1.400
Pseudo count b                                                  substitution:4.100,context:5.800
Score bias                                                      0
Realign hits                                                    false
Realign score bias                                              -0.2
Realign max seqs                                                2147483647
Correlation score weight                                        0
Gap open cost                                                   aa:11,nucl:5
Gap extension cost                                              aa:1,nucl:2
Zdrop                                                           40
Threads                                                         192
Compressed                                                      0
Verbosity                                                       3
Seed substitution matrix                                        aa:VTML80.out,nucl:nucleotide.out
Sensitivity                                                     4
k-mer length                                                    0
k-score                                                         seq:2147483647,prof:2147483647
Alphabet size                                                   aa:21,nucl:5
Max results per query                                           300
Split database                                                  0
Split mode                                                      2
Split memory limit                                              0
Diagonal scoring                                                true
Exact k-mer matching                                            0
Mask residues                                                   1
Mask residues probability                                       0.9
Mask lower case residues                                        0
Minimum diagonal score                                          15
Selected taxa
Spaced k-mers                                                   1
Spaced k-mer pattern
Local temporary path
Rescore mode                                                    0
Remove hits by seq. id. and coverage                            false
Sort results                                                    0
Mask profile                                                    1
Profile E-value threshold                                       0.001
Global sequence weighting                                       false
Allow deletions                                                 false
Filter MSA                                                      1
Use filter only at N seqs                                       0
Maximum seq. id. threshold                                      0.9
Minimum seq. id.                                                0.0
Minimum score per column                                        -20
Minimum coverage                                                0
Select N most diverse seqs                                      1000
Pseudo count mode                                               0
Min codons in orf                                               15
Max codons in length                                            32734
Max orf gaps                                                    2147483647
Contig start mode                                               2
Contig end mode                                                 2
Orf start mode                                                  1
Forward frames                                                  1,2,3
Reverse frames                                                  1,2,3
Translation table                                               1
Translate orf                                                   0
Use all table starts                                            false
Offset of numeric ids                                           0
Create lookup                                                   0
Add orf stop                                                    false
Overlap between sequences                                       0
Sequence split mode                                             1
Header split mode                                               0
Chain overlapping alignments                                    0
Merge query                                                     1
Search type                                                     0
Start sensitivity                                               4
Search steps                                                    1
Exhaustive search mode                                          false
Filter results during exhaustive search                         0
Strand selection                                                1
LCA search mode                                                 false
Disk space limit                                                0
MPI runner
Force restart with latest tmp                                   false
Remove temporary files                                          false
maximal combined evalue of an optimal set                       0.001
minimal length ratio between combined optimal set and target    0.5
Maximal intron length                                           10000
Minimal intron length                                           15
Minimal exon length aa                                          11
Maximal overlap of exons                                        10
Maximal number of exon sets                                     1
Gap open penalty                                                -1
Gap extend penalty                                              -1
Reverse AA Fragments                                            0

extractorfs /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/12404560893930422762/nucl_6f --min-length 15 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 192 --compressed 0 -v 3

libgomp: Thread creation failed: Resource temporarily unavailable
Error: extractorfs step died
$


- With `--threads 1`, it runs. 

$ "$MMSEQS" predictexons --threads 1 contigs targets tmp/MetaEuk_calls tmp/tmp_predict
predictexons --threads 1 contigs targets tmp/MetaEuk_calls tmp/tmp_predict

MMseqs Version:                                                 e8ef9c146b871a86415c3b74d1db1a4e24026158
Substitution matrix                                             aa:blosum62.out,nucl:nucleotide.out
Add backtrace                                                   false
Alignment mode                                                  2
Alignment mode                                                  0
Allow wrapped scoring                                           false
E-value threshold                                               100
Seq. id. threshold                                              0
Min alignment length                                            0
Seq. id. mode                                                   0
Alternative alignments                                          0
Coverage threshold                                              0
Coverage mode                                                   0
Max sequence length                                             65535
Compositional bias                                              1
Compositional bias                                              1
Max reject                                                      2147483647
Max accept                                                      2147483647
Include identical seq. id.                                      false
Preload mode                                                    0
Pseudo count a                                                  substitution:1.100,context:1.400
Pseudo count b                                                  substitution:4.100,context:5.800
Score bias                                                      0
Realign hits                                                    false
Realign score bias                                              -0.2
Realign max seqs                                                2147483647
Correlation score weight                                        0
Gap open cost                                                   aa:11,nucl:5
Gap extension cost                                              aa:1,nucl:2
Zdrop                                                           40
Threads                                                         1
Compressed                                                      0
Verbosity                                                       3
Seed substitution matrix                                        aa:VTML80.out,nucl:nucleotide.out
Sensitivity                                                     4
k-mer length                                                    0
k-score                                                         seq:2147483647,prof:2147483647
Alphabet size                                                   aa:21,nucl:5
Max results per query                                           300
Split database                                                  0
Split mode                                                      2
Split memory limit                                              0
Diagonal scoring                                                true
Exact k-mer matching                                            0
Mask residues                                                   1
Mask residues probability                                       0.9
Mask lower case residues                                        0
Minimum diagonal score                                          15
Selected taxa
Spaced k-mers                                                   1
Spaced k-mer pattern
Local temporary path
Rescore mode                                                    0
Remove hits by seq. id. and coverage                            false
Sort results                                                    0
Mask profile                                                    1
Profile E-value threshold                                       0.001
Global sequence weighting                                       false
Allow deletions                                                 false
Filter MSA                                                      1
Use filter only at N seqs                                       0
Maximum seq. id. threshold                                      0.9
Minimum seq. id.                                                0.0
Minimum score per column                                        -20
Minimum coverage                                                0
Select N most diverse seqs                                      1000
Pseudo count mode                                               0
Min codons in orf                                               15
Max codons in length                                            32734
Max orf gaps                                                    2147483647
Contig start mode                                               2
Contig end mode                                                 2
Orf start mode                                                  1
Forward frames                                                  1,2,3
Reverse frames                                                  1,2,3
Translation table                                               1
Translate orf                                                   0
Use all table starts                                            false
Offset of numeric ids                                           0
Create lookup                                                   0
Add orf stop                                                    false
Overlap between sequences                                       0
Sequence split mode                                             1
Header split mode                                               0
Chain overlapping alignments                                    0
Merge query                                                     1
Search type                                                     0
Start sensitivity                                               4
Search steps                                                    1
Exhaustive search mode                                          false
Filter results during exhaustive search                         0
Strand selection                                                1
LCA search mode                                                 false
Disk space limit                                                0
MPI runner
Force restart with latest tmp                                   false
Remove temporary files                                          false
maximal combined evalue of an optimal set                       0.001
minimal length ratio between combined optimal set and target    0.5
Maximal intron length                                           10000
Minimal intron length                                           15
Minimal exon length aa                                          11
Maximal overlap of exons                                        10
Maximal number of exon sets                                     1
Gap open penalty                                                -1
Gap extend penalty                                              -1
Reverse AA Fragments                                            0

extractorfs /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/contigs /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/nucl_6f --min-length 15 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 1 --compressed 0 -v 3

[=================================================================] 100.00% 30.05K 1s 409ms
Time for merging to nucl_6f_h: 0h 0m 0s 207ms
Time for merging to nucl_6f: 0h 0m 0s 207ms
Time for processing: 0h 0m 2s 807ms
translatenucs /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/nucl_6f /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/aa_6f --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 1

[=================================================================] 100.00% 908.41K 0s 731ms
Time for merging to aa_6f: 0h 0m 0s 210ms Time for processing: 0h 0m 1s 327ms Create directory /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/tmp_search search /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/aa_6f /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/search_res /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/tmp_search --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 100 --min-seq-id 0 --min-aln-len 11 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 1 --compressed 0 -v 3 --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 4 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --mask-profile 1 --e-profile 0.001 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --min-length 15 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 
--use-all-table-starts 0 --id-offset 0 --create-lookup 0 --add-orf-stop 0 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --chain-alignments 0 --merge-query 1 --search-type 0 --num-iterations 1 --start-sens 4 --sens-steps 1 --exhaustive-search 0 --exhaustive-search-filter 0 --strand 1 --lca-search 0 --disk-space-limit 0 --force-reuse 0 --remove-tmp-files 0

prefilter /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/aa_6f /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/tmp_search/14055050976729937679/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 1 --compressed 0 -v 3 -s 4.0

Query database size: 908411 type: Aminoacid
Estimated memory consumption: 1G
Target database size: 302957 type: Aminoacid
Index table k-mer threshold: 127 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 302.96K 40s 94ms
Index table: Masked residues: 2249117
Index table: fill
[=================================================================] 100.00% 302.96K 52s 604ms
Index statistics
Entries: 145543146
DB size: 1321 MB
Avg k-mer size: 2.274112
Top 10 k-mers
    GSDTLW 1615
    ASDTLW 1346
    SGATSL 1194
    FTGTNN 1191
    GGQRRS 1182
    TSSEYV 1171
    CTALSY 1157
    EQIRAT 1133
    LREGLY 1129
    FEDPAM 1111
Time for index table init: 0h 1m 34s 520ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 127
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 908411
Target db start 1 to 302957
[=================================================================] 100.00% 908.41K 2m 45s 733ms

47.822211 k-mers per position 2686 DB matches per sequence 0 overflows 13 sequences passed prefiltering per query sequence 5 median result list length 162157 sequences with 0 size result lists Time for merging to pref_0: 0h 0m 0s 246ms Time for processing: 0h 4m 23s 584ms align /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/aa_6f /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/targets /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/tmp_search/14055050976729937679/pref_0 /busco/run_eukaryota_odb10/metaeuk_output/tmp/2662957891185453277/tmp/tmp_predict/6867215755761113553/search_res --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 100 --min-seq-id 0 --min-aln-len 11 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 1 --compressed 0 -v 3 ...

milot-mirdita commented 1 year ago

Can you try to set the MMSEQS_NUM_THREADS env variable? That should fix your issue as a workaround (hopefully). We'll have to investigate where we are spawning too many threads.

milot-mirdita commented 1 year ago

Sorry, I missed that you already identified where it goes wrong. We'll take a look. In the meantime, the env variable I mentioned should fix the issue.
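
For readers trying the workaround, here is a minimal sketch of setting the variable in a POSIX-style shell before launching MetaEuk. The value 1 is an assumption chosen to match `--threads 1` from the issue title, and the angle-bracket arguments in the commented command are hypothetical placeholders, not paths from this thread. The `ulimit -u` line only reports the per-user process limit for context and falls back to "unknown" in shells that do not support that option.

```shell
# Cap the thread counts for this shell session.
# MMSEQS_NUM_THREADS is the variable suggested in this thread;
# OMP_NUM_THREADS is the one from the issue title.
# The value 1 is an assumption matching --threads 1.
export MMSEQS_NUM_THREADS=1
export OMP_NUM_THREADS=1

# Report the per-user process/thread limit (the "nproc" limit) for context.
NPROC_LIMIT=$(ulimit -u 2>/dev/null || echo "unknown")
echo "max user processes: ${NPROC_LIMIT}"
echo "MMSEQS_NUM_THREADS=${MMSEQS_NUM_THREADS} OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Then rerun the original command unchanged (placeholders are hypothetical):
# metaeuk easy-predict --threads 1 <contigs.fasta> <targets.faa> <out_prefix> <tmp_dir>
```

Exporting the variables affects only the current shell and its child processes, so the setting also reaches MetaEuk when another tool started from the same shell launches it.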

mickeykawai commented 1 year ago

Thank you very much for the prompt reply! This fixed the error both when running metaeuk easy-predict --threads 1 ... directly and when MetaEuk is called from busco --cpu 4 ....

$ export MMSEQS_NUM_THREADS=4
$ export OPENBLAS_NUM_THREADS=4 # for busco
$ busco -m transcriptome -i trinity_out_dir.Trinity.fasta --cpu 4 -o busco_eukaryota_odb10 -l eukaryota_odb10
...
2023-06-20 22:00:57 INFO:   BUSCO analysis done. Total running time: 853 seconds
2023-06-20 22:00:57 INFO:   Results written in <topdir>/busco_eukaryota_odb10
2023-06-20 22:00:57 INFO:   For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html

2023-06-20 22:00:57 INFO:   Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
$

mickeykawai commented 1 year ago

cf. https://gitlab.com/ezlab/busco/-/issues/675#note_1438785434