xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Temporary files filling up the temp directory #38

Closed: biflorenzi closed this issue 3 years ago

biflorenzi commented 3 years ago

Hi @xiezhq! I ran into a problem when running the tool on a large dataset (>5000 assemblies). Since I had many CPUs available, I decided to run 40 assemblies in parallel without thinking about the temporary files that would be created. This completely filled the temporary directory and made the output unreliable. I could try running fewer jobs at the same time, but I was also wondering whether there is a built-in way to redirect temporary files to a folder of my choice, or to automatically delete the temporary files once the analysis of an assembly is done. I am using the one-line command you recommend in your README for running the tool on a dataset. Thanks in advance for any assistance in this matter!

Biancamaria

xiezhq commented 3 years ago

The only temporary files are genome sequences, which are created in /tmp/. I just updated ISEScan to v1.7.2.2.2, which deletes temporary files once the blastn search completes, so that large numbers of temporary files no longer consume too much space. To upgrade to the latest v1.7.2.2.2, you can simply copy all the .py files into your ISEScan install directory, overwriting the old .py files. The new version outputs .csv (comma-delimited columns) and .tsv (tab-delimited columns) files in addition to the result files output by v1.7.2.1.
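
On the question of redirecting temporary files: the thread does not say how ISEScan chooses /tmp/. If (and this is an assumption, not confirmed here) the files are created through Python's standard tempfile module, then setting the TMPDIR environment variable for the child process would move them elsewhere. The minimal sketch below only checks which temporary directory a child Python process would pick up; the helper name tempdir_seen_by_child is hypothetical and not part of ISEScan.

```python
import os
import subprocess
import sys


def tempdir_seen_by_child(scratch_dir):
    """Return the temp directory a child Python process reports when TMPDIR is set.

    Hypothetical helper: it does not run ISEScan, it only shows how the
    standard tempfile module reacts to TMPDIR in a subprocess.
    """
    # Copy the current environment and override TMPDIR for the child only.
    env = dict(os.environ, TMPDIR=str(scratch_dir))
    completed = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()
```

If ISEScan writes to a hard-coded /tmp/ path instead, TMPDIR would have no effect, and upgrading to the version that deletes the files after the blastn search remains the fix described in this thread.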

With the latest v1.7.2.2.2, you can also specify the output directory with the --output command-line option:

python3 ISEScan/isescan.py --seqfile NC_012624.fna --output results --nthread 2
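
For a dataset of thousands of assemblies, the same command can be issued once per file while capping how many jobs run at once, which also limits how many temporary files exist at any moment. The following is a rough sketch rather than an official ISEScan script: build_command, run_batch, and the max_jobs value are hypothetical, and only the --seqfile, --output, and --nthread options shown above are used. Sending every job to the same output directory is also an assumption.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

ISESCAN = "ISEScan/isescan.py"  # script path as written in the command above


def build_command(seqfile, output_dir, nthread=2):
    """Build the argument list for one ISEScan run (hypothetical helper)."""
    return [
        "python3", ISESCAN,
        "--seqfile", str(seqfile),
        "--output", str(output_dir),
        "--nthread", str(nthread),
    ]


def run_batch(seqfiles, output_dir, max_jobs=4, nthread=2, runner=subprocess.run):
    """Run one job per sequence file, with at most max_jobs running at a time.

    Returns a list of (seqfile, return code) pairs in input order.
    The runner argument exists so the function can be tried without ISEScan.
    """
    def run_one(seqfile):
        result = runner(build_command(seqfile, output_dir, nthread))
        return str(seqfile), result.returncode

    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, seqfiles))
```

A call such as run_batch(list_of_fna_files, "results", max_jobs=8) would then report a return code per assembly, so failed runs can be repeated individually. The right value for max_jobs depends on the free space in the temporary directory and on the --nthread setting.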

biflorenzi commented 3 years ago

Thank you for the quick and useful reply, I will use your new version to bypass the problem!

Kind regards, Bianca