soedinglab / metaeuk

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics
GNU General Public License v3.0

"tmp" directory does not automatically remove #82

Open Zjianglin opened 1 year ago

Zjianglin commented 1 year ago

Expected Behavior

I expected the tmp directory to be removed automatically after metaeuk finishes normally.

Current Behavior

Currently, the tmp directory still exists after the run, while the results (proteins.faa, codon.fna, gff, tsv) are created separately.

Steps to Reproduce (for bugs)

  1. Create the reference DB and index it with the following commands, using the UniRef90 protein database:
    mmseqs createdb uniref90.fasta.gz metaeukUniRef90db
    mmseqs createindex metaeukUniRef90db mmseq2tmp4ur90 --split-memory-limit 300G
  2. Run the prediction using metaeuk:
    metaeuk easy-predict /path/to/TMMAG23/TMMAG23_fnas/029290645.fna.gz /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/029290645 /path/to/TMMAG23/ORFs/faas/tmp_029290645 -s 7.5 --threads 94 --min-length 30 --max-length 45354 --use-all-table-starts 1 --metaeuk-eval 0.001

MetaEuk Output (for bugs)

Here are the results after the run:


$ ls ORFs/faas/ | egrep "029290615|029290645"
029290615.faa.gz
029290645.faa.gz
tmp_029290615
tmp_029290645
$ ls ORFs/gbks/ | egrep "029290615|029290645"
029290615.gff.gz
029290615.headersMap.tsv
029290645.gff.gz
029290645.headersMap.tsv

$ du -h ORFs/faas/tmp_029290615/
1.1G    ORFs/faas/tmp_029290615/1331144458023969388/tmp_predict/676563652756938904/tmp_search/15250875353742476543
1.1G    ORFs/faas/tmp_029290615/1331144458023969388/tmp_predict/676563652756938904/tmp_search
1.5G    ORFs/faas/tmp_029290615/1331144458023969388/tmp_predict/676563652756938904
1.5G    ORFs/faas/tmp_029290615/1331144458023969388/tmp_predict
1.7G    ORFs/faas/tmp_029290615/1331144458023969388
1.7G    ORFs/faas/tmp_029290615/

$ du -h ORFs/faas/tmp_029290645/
1.6G    ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search/7751439889528243602
1.6G    ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search
2.2G    ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016
2.2G    ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict
2.4G    ORFs/faas/tmp_029290645/6835986714712079013
2.4G    ORFs/faas/tmp_029290645/

Here is the log of the run:

Create directory /path/to/TMMAG23/ORFs/faas/tmp_029290645
easy-predict /path/to/TMMAG23/TMMAG23_fnas/029290645.fna.gz /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/029290645 /path/to/TMMAG23/ORFs/faas/tmp_029290645 -s 7.5 --threads 94 --min-length 30 --max-length 45354 --use-all-table-starts 1 --metaeuk-eval 0.001
createdb /path/to/TMMAG23/TMMAG23_fnas/029290645.fna.gz /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/contigs --dbtype 2 --compressed 0 -v 3
Create directory /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict
predictexons /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/contigs /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_calls /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 100 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 94 --compressed 0 -v 3 --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 7.5 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --mask-profile 1 --e-profile 0.001 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --gap-pc 10 --min-length 30 --max-length 45354 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 1 --id-offset 0 --create-lookup 0 --add-orf-stop 0 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --chain-alignments 0 --merge-query 1 --search-type 0 --num-iterations 1 --start-sens 4 --sens-steps 1 --exhaustive-search 0 --exhaustive-search-filter 0 --strand 1 --lca-search 0 --disk-space-limit 0 --force-reuse 0 --remove-tmp-files 0 --metaeuk-eval 0.001 --metaeuk-tcov 0.5 --max-intron 10000 --min-intron 15 --min-exon-aa 11 --max-overlap 10 --max-exon-sets 1 --set-gap-open -1 --set-gap-extend -1 --reverse-fragments 0

extractorfs /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/contigs /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/nucl_6f --min-length 30 --max-length 45354 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 1 --id-offset 0 --create-lookup 0 --threads 94 --compressed 0 -v 3
translatenucs /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/nucl_6f /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/aa_6f --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 94
Create directory /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search
search /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/aa_6f /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/search_res /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 100 --min-seq-id 0 --min-aln-len 11 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 94 --compressed 0 -v 3 --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 7.5 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --mask-profile 1 --e-profile 0.001 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --gap-pc 10 --min-length 30 --max-length 45354 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 1 --id-offset 0 --create-lookup 0 --add-orf-stop 0 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --chain-alignments 0 --merge-query 1 --search-type 0 --num-iterations 1 --start-sens 4 --sens-steps 1 --exhaustive-search 0 
--exhaustive-search-filter 0 --strand 1 --lca-search 0 --disk-space-limit 0 --force-reuse 0 --remove-tmp-files 0 

prefilter /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/aa_6f /UniRef/metaeukUniRef90db.idx /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search/7751439889528243602/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 94 --compressed 0 -v 3 -s 7.5
align /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/aa_6f /UniRef/metaeukUniRef90db.idx /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/tmp_search/7751439889528243602/pref_0 /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/search_res --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 100 --min-seq-id 0 --min-aln-len 11 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 94 --compressed 0 -v 3 
resultspercontig /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/contigs /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/nucl_6f /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/search_res /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/search_res_by_contig --threads 94 --compressed 0 -v 3 
collectoptimalset /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/search_res_by_contig /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/dp_predictions --metaeuk-eval 0.001 --metaeuk-tcov 0.5 --max-intron 10000 --min-intron 15 --min-exon-aa 11 --max-overlap 10 --max-exon-sets 1 --set-gap-open -1 --set-gap-extend -1 --score-bias 0 --threads 94 --compressed 0 -v 3 
mvdb /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/tmp_predict/7048834739567472016/dp_predictions /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_calls 
reduceredundancy /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_calls /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_preds /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_preds_clust --overlap 0 --threads 94 --compressed 0 -v 3 
unitesetstofasta /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/contigs /UniRef/metaeukUniRef90db /path/to/TMMAG23/ORFs/faas/tmp_029290645/6835986714712079013/MetaEuk_preds /path/to/TMMAG23/ORFs/faas/029290645 --protein 0 --translation-table 1 --target-key 0 --write-frag-coords 0 --max-seq-len 65535 --threads 94 -v 3 
Time for merging to 029290645.fas: 0h 0m 0s 315ms
Time for merging to 029290645.codon.fas: 0h 0m 0s 316ms
Time for merging to 029290645.headersMap.tsv: 0h 0m 0s 286ms
Time for merging to 029290645.gff: 0h 0m 0s 302ms

[Run Finished] node8 MetaEuk Finished predicting 029290645, elapsed 6566 seconds!\n

There were no errors or warnings during the run, but the tmp directory still exists after metaeuk finishes. Is this expected, or did I do something wrong?

milot-mirdita commented 1 year ago

The log indicates that everything is correct. I think we just don't clean up correctly. You can manually delete the TMP directory until we implement something for better cleanup.
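Until cleanup is implemented, the manual deletion can be scripted. The following is only a minimal sketch: `remove_metaeuk_tmp` is a hypothetical helper (not part of MetaEuk), and the four output suffixes it checks are taken from the "Time for merging to ..." lines in the log above. It deletes the tmp directory only if all four result files exist.

```shell
# Hypothetical helper, not part of MetaEuk: delete the tmp directory of a
# finished easy-predict run, but only if the final result files are present.
remove_metaeuk_tmp() {
    local out_prefix="$1"   # result prefix (3rd argument of easy-predict)
    local tmp_dir="$2"      # tmp directory (4th argument of easy-predict)
    local suffix

    # Output suffixes as they appear in the "Time for merging to ..." log lines.
    for suffix in fas codon.fas headersMap.tsv gff; do
        if [ ! -e "${out_prefix}.${suffix}" ]; then
            echo "missing ${out_prefix}.${suffix}; keeping ${tmp_dir}" >&2
            return 1
        fi
    done

    # Guard against an empty or root path before the recursive delete.
    if [ -z "${tmp_dir}" ] || [ "${tmp_dir}" = "/" ]; then
        echo "refusing to remove '${tmp_dir}'" >&2
        return 1
    fi

    rm -rf -- "${tmp_dir}"
}

# Example usage (placeholder paths), chaining on a successful exit status:
#   metaeuk easy-predict contigs.fna.gz refdb out/sample out/tmp_sample \
#       && remove_metaeuk_tmp out/sample out/tmp_sample
```

The log also shows the sub-steps running with `--remove-tmp-files 0`; whether setting that flag changes the behaviour described in this issue is not confirmed in this thread.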

Zjianglin commented 1 year ago

Okay, thanks for your reply.