michaelkyu / PlasX

PlasX, a machine learning classifier for identifying plasmid sequences based on genetic architecture
GNU General Public License v3.0

Delete MMseqs2 tmp directory after the search is complete #1

Closed: apcamargo closed this issue 2 years ago

apcamargo commented 2 years ago

PlasX doesn't delete the MMseqs2 tmp directories, even after the m8 file has been generated. When querying many sequences, these tmp directories accumulate and can exhaust disk space.

If MMseqs2 is not called again after the m8 files are generated, it would be useful for PlasX to delete each tmp directory as soon as its database search finishes.
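As a rough illustration of the behavior requested above (not PlasX's actual code), here is a minimal Python sketch that gives each external search its own scratch directory and removes it once the command exits, whether it succeeded or failed. The helper name `run_with_scratch_dir` and its parameters are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_scratch_dir(build_command, parent_dir=".", prefix="search.tmp."):
    """Run one external command with a private scratch directory, then delete it.

    build_command is a callable that receives the scratch directory (a Path)
    and returns the argument list to execute. Hypothetical helper for
    illustration only; it is not part of PlasX or MMseqs2.
    """
    scratch = Path(tempfile.mkdtemp(prefix=prefix, dir=parent_dir))
    try:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_command(scratch), check=True)
    finally:
        # Clean up whether the command succeeded or failed, so scratch
        # directories never pile up across many searches.
        shutil.rmtree(scratch, ignore_errors=True)
    return scratch


if __name__ == "__main__":
    # Stand-in for a real search: a child Python process that writes a file
    # into the scratch directory it is given.
    writer = "import sys, pathlib; pathlib.Path(sys.argv[1], 'junk.bin').write_bytes(b'x' * 1024)"
    used = run_with_scratch_dir(lambda tmp: [sys.executable, "-c", writer, str(tmp)])
    print("scratch directory still exists:", used.exists())
```

For an MMseqs2 search, the callable could return something like `["mmseqs", "search", "queryDB", "targetDB", "resultDB", str(tmp)]`, where the three database names are placeholders. The m8 file is written outside the scratch directory, so deleting the scratch directory leaves the results intact.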

apcamargo commented 2 years ago

To illustrate, here are the sizes of the tmp directories from my run just before MMseqs2 failed for lack of disk space:

78G ./clu70.search.tmp/16184249840533220685
78G ./clu70.search.tmp
99G ./clu60.search.tmp/6694311506641037262
99G ./clu60.search.tmp
62G ./clu90.search.tmp/16717382434608035915
62G ./clu90.search.tmp
89G ./clu50.search.tmp/15651117246458405455
89G ./clu50.search.tmp
53G ./clu80.search.tmp/7227444100715852492
53G ./clu80.search.tmp

apcamargo commented 2 years ago

Thanks for the quick fix, @michaelkyu!

michaelkyu commented 2 years ago

Thanks, @apcamargo, for bringing this to my attention!

After you git pull, you can reinstall with pip install /path/to/PlasX_repo. You won't need to rerun any setup commands (e.g., there's no need to run plasx setup). You can pick up where you left off, which I'm assuming was plasx search_de_novo_families.