merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Anvio should clear the tmp directory when done, and respect $TMPDIR #2180

Closed: Ge0rges closed this issue 7 months ago

Ge0rges commented 7 months ago

Short description of the problem

Perhaps this is more of an improvement than a bug. I'm not sure how widespread this is across the different commands, but anvi-run-hmms does not clear the /tmp/ directory when done. This means that a long sequence of these commands may (and in my case does) fill the /tmp directory, causing errors when that directory has a disk quota.

anvi'o version

Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.12

Profile database .............................: 39
Contigs database .............................: 22
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Installed anvi'o using the recommended dev install on Rocky Linux.

Detailed description of the issue

For example, anvi-run-hmms creates its folder with tempfile.mkdtemp here. The directory should be deleted here; however, that isn't happening.
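For illustration only, here is a minimal sketch of the usual pattern for pairing mkdtemp with guaranteed removal: a context manager whose finally clause calls shutil.rmtree, so the directory is removed whether the work finishes normally or raises. The helper name temporary_work_dir and the file written inside it are hypothetical; this is not anvi'o's actual code.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_work_dir(prefix="tmp"):
    """Hypothetical helper: create a work dir with mkdtemp and always remove it.

    The finally clause runs on normal exit and also when the body raises
    an exception, so either way the directory is removed at the end.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # Errors are not ignored here, so a failed removal is visible.
        shutil.rmtree(path)


if __name__ == "__main__":
    with temporary_work_dir() as work_dir:
        # Stand-in for the HMMER inputs and logs written during a run.
        with open(os.path.join(work_dir, "AA_gene_sequences.fa"), "w") as handle:
            handle.write(">gene_0\nMKV\n")
        print("working in", work_dir)
    print("still exists after the block?", os.path.exists(work_dir))
```

Note that no Python-level cleanup runs if the process is killed outright (e.g. SIGKILL from a scheduler or the OOM killer), so leftovers can still appear in that case.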

It would also be convenient if anvi'o respected the $TMPDIR environment variable. I'm aware that Python's mkdtemp is partly to blame here, but anvi'o could use tempfile.TemporaryDirectory instead, passing the dir argument and calling cleanup() to delete the directory when done.
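A rough sketch of the suggestion above, assuming a hypothetical make_work_dir helper: TemporaryDirectory accepts a dir argument for the parent location, and cleanup() removes the tree explicitly. When dir is left as None, Python's tempfile module falls back to tempfile.gettempdir(), which checks the TMPDIR, TEMP and TMP environment variables (in that order) before a platform default such as /tmp.

```python
import os
import tempfile


def make_work_dir(base_dir=None):
    """Hypothetical helper: create a temporary work dir under base_dir.

    With base_dir=None, tempfile uses tempfile.gettempdir(), which honors
    TMPDIR, then TEMP, then TMP, before a platform-specific default.
    """
    return tempfile.TemporaryDirectory(dir=base_dir)


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for a user-chosen scratch area
    work = make_work_dir(base)
    print("work dir:", work.name)
    with open(os.path.join(work.name, "Bacteria_71.hmm"), "w") as handle:
        handle.write("placeholder\n")
    work.cleanup()  # explicit removal, as suggested in the issue
    print("removed:", not os.path.exists(work.name))
    os.rmdir(base)
```

TemporaryDirectory also works as a context manager (with make_work_dir() as path: ...), which calls cleanup() automatically when the block exits.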

Files / commands to reproduce the issue

Any anvi-run-hmms run.

meren commented 7 months ago

I can't reproduce this. This is my output when I run anvi-run-hmms with default parameters:

$ anvi-run-hmms -c P_MARINUS_MIT9301-contigs.db -I Bacteria_71 --just-do-it

WARNING
===============================================
Previous entries for "Bacteria_71" is being removed from "hmm_hits_info,
hmm_hits, hmm_hits_in_splits, genes_in_contigs, gene_functions"

Contigs DB ...................................: P_MARINUS_MIT9301-contigs.db
HMM sources ..................................: Bacteria_71
Alphabet/context target found ................: AA:GENE
Target sequences determined ..................: 1,852 sequences for AA:GENE

HMM Profiling for Bacteria_71
===============================================
Reference ....................................: Lee modified, https://doi.org/10.1093/bioinformatics/btz188
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp7sa65989/Bacteria_71.hmm
Number of genes in HMM model .................: 71
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
HMMer program used for search ................: hmmscan
Temporary work dir ...........................: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp0d9elwsh
Log file for thread 0 ........................: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp0d9elwsh/AA_gene_sequences.fa.0_log

Done with Bacteria_71 🎊

Number of raw hits in table file .............: 70
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 70

✓ anvi-run-hmms took 0:00:03.889805

According to the logs, the temporary work dir used here is the following:

 /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp0d9elwsh

And a check shows it is not there:

ls /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp0d9elwsh

ls: cannot access '/var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp0d9elwsh': No such file or directory

Similarly, when I set the $TMPDIR environment variable,

$ TMPDIR=testtmp/ anvi-run-hmms -c P_MARINUS_MIT9301-contigs.db -I Bacteria_71 --just-do-it

WARNING
===============================================
Previous entries for "Bacteria_71" is being removed from "hmm_hits_info,
hmm_hits, hmm_hits_in_splits, genes_in_contigs, gene_functions"

Contigs DB ...................................: P_MARINUS_MIT9301-contigs.db
HMM sources ..................................: Bacteria_71
Alphabet/context target found ................: AA:GENE
Target sequences determined ..................: 1,852 sequences for AA:GENE

HMM Profiling for Bacteria_71
===============================================
Reference ....................................: Lee modified, https://doi.org/10.1093/bioinformatics/btz188
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /Users/meren/Downloads/testtmp/tmpn3pqifrj/Bacteria_71.hmm
Number of genes in HMM model .................: 71
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
HMMer program used for search ................: hmmscan
Temporary work dir ...........................: /Users/meren/Downloads/testtmp/tmp3rq_7a6z
Log file for thread 0 ........................: /Users/meren/Downloads/testtmp/tmp3rq_7a6z/AA_gene_sequences.fa.0_log

Done with Bacteria_71 🎊

Number of raw hits in table file .............: 70
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 70

The temporary work dir is,

 /Users/meren/Downloads/testtmp/tmp3rq_7a6z

And after a successful finish, it is not there:

ls /Users/meren/Downloads/testtmp/tmp3rq_7a6z

ls: cannot access '/Users/meren/Downloads/testtmp/tmp3rq_7a6z': No such file or directory
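The manual check above can be scripted. This is a sketch with a hypothetical helper, leftovers_after: it points TMPDIR at a dedicated directory, runs a command, and returns whatever new entries the command left there. The demo uses a stand-in child process that deliberately leaks a mkdtemp directory.

```python
import os
import subprocess
import sys
import tempfile


def leftovers_after(command, scratch_dir):
    """Hypothetical helper: run `command` with TMPDIR set to scratch_dir and
    return the names of any new entries it left behind there."""
    scratch_dir = os.path.abspath(scratch_dir)
    before = set(os.listdir(scratch_dir))
    env = dict(os.environ, TMPDIR=scratch_dir)
    subprocess.run(command, env=env, check=True)
    return sorted(set(os.listdir(scratch_dir)) - before)


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    # Stand-in child process: mkdtemp() honors TMPDIR, and this child
    # deliberately never removes the directory it creates.
    leaky = [sys.executable, "-c", "import tempfile; tempfile.mkdtemp()"]
    print("left behind:", leftovers_after(leaky, scratch))
```

Applied to this thread, the call would look like leftovers_after(["anvi-run-hmms", "-c", "P_MARINUS_MIT9301-contigs.db", "-I", "Bacteria_71", "--just-do-it"], "testtmp"), where an empty list means nothing was left behind. The scratch directory must already exist.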

The only explanation I can imagine is that shutil.rmtree is not working on Linux systems :p @ahenoch, can you please run the commands above and see if you can reproduce the problem?
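To test the shutil.rmtree hypothesis directly on an affected system, a small diagnostic like the following could help. It is a hypothetical helper, not part of anvi'o: it surfaces any OSError from rmtree and double-checks that the path is gone. Cleanup code that swallows errors (for example with ignore_errors=True) would hide such a failure.

```python
import os
import shutil
import tempfile


def remove_and_verify(path):
    """Hypothetical diagnostic: remove `path`; return None on success,
    otherwise a short description of what went wrong."""
    try:
        shutil.rmtree(path)
    except OSError as exc:
        return f"rmtree failed: {exc!r}"
    if os.path.exists(path):
        return "rmtree returned, but the path still exists"
    return None


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    open(os.path.join(work, "AA_gene_sequences.fa.0_log"), "w").close()
    print(remove_and_verify(work))  # None when removal succeeded
```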

Ge0rges commented 7 months ago

Well, that's odd! I agree this is probably an OS issue... I'll investigate further.