Closed by MrTomRod 3 years ago
No, I ran exactly the command above (./pgap.py -r --no-internet -o mg37_results test_genomes/MG37/input.yaml). I did not edit the pgap.py script.
Right. I realized that after posting my comment and deleted it before seeing yours :-)
I am afraid this might be one of the peculiarities of cwltool behavior. If we do not pass storage info for tmp files to the cwltool call (as we do in --debug mode), it creates them in /tmp/ inside the docker container, which is mapped from either $TMPDIR or /tmp/. It is not clear right now why cwltool does not delete these directories upon completing execution in your case.
I would suggest the following banal workarounds:
/tmp
pgap.py --debug, this will at least allow you to keep these directories under your output directory, so they could be handy for any post-mortems. You can delete these directories selectively, depending on whether you anticipate that a particular run will lead to a new issue here, at our GitHub.
Haha, that's fine.
I think I will edit pgap.py as follows:
/tmp/strain-randomstring
)/tmp
--name strain-randomstring
That should do the trick, right?
And it makes it easy to connect the input to the docker container name as well as the temp files.
I think that would be a slight improvement over the current script. Would you like to have it?
Thanks, that's a good idea.
That would work. We probably do not want to bother users with additional parameters just for the sake of deleting it. We can generate them internally and then delete them.
Again, running pgap.py --debug has its benefits. If it crashes, you do not have to rerun it to generate and send a report here: the files are already there, not deleted.
I am having the exact same problem. My /tmp fills up very quickly. @MrTomRod could I please have your edited pgap.py? Many thanks.
@mdphan: I ended up with an easier solution that does not require me to change pgap.py. We can simply set the env var TMPDIR. So this is how I run pgap:
export TMPDIR=/tmp/strain-123 # Now, all PGAP temporary data will end up there
mkdir -p "$TMPDIR" # -p: no error if the directory already exists
./pgap.py ... # run pgap as you would normally
sudo rm -rf "$TMPDIR" # brute-remove the temporary data
Note: if you don't want to run sudo, you can do this to remove the folder at the end:
docker run -itv /tmp:/faketmp alpine:latest rm -rf /faketmp/strain-123
With docker, permissions mean nothing and everyone is admin. :rofl:
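The TMPDIR workaround above can also be wrapped in a small shell function, so that each run gets its own directory and the cleanup is not forgotten. This is only a sketch: run_with_private_tmp is a hypothetical helper, not part of PGAP, and it assumes that the wrapped command honors TMPDIR and that a plain rm is allowed to delete the files.

```shell
# Sketch: run a command with a private TMPDIR and remove that directory afterwards.
# run_with_private_tmp is a hypothetical helper, not part of PGAP.
run_with_private_tmp() {
    local label="$1"
    shift
    # Refuse to continue without a command to run.
    [ "$#" -ge 1 ] || return 2
    local tmp
    # mktemp -d creates a unique directory such as /tmp/strain-123.AbCdEf
    tmp="$(mktemp -d "${TMPDIR:-/tmp}/${label}.XXXXXX")" || return 1
    local status=0
    # TMPDIR is set only for this one command; the caller's environment is untouched.
    TMPDIR="$tmp" "$@" || status=$?
    # Plain rm can fail on root-owned files written from inside docker;
    # in that case fall back to sudo or to the docker-based removal shown above.
    rm -rf "$tmp" || echo "could not remove $tmp" >&2
    # Hand the command's own exit status back to the caller.
    return "$status"
}
```

It would be used like this: run_with_private_tmp strain-123 ./pgap.py -r --no-internet -o mg37_results test_genomes/MG37/input.yaml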
Thomas, although we generally agree that we should not be cleaning up after cwltool, your solution seems alright.
Would you like to submit a Pull Request?
Thanks for your contribution, Thomas!
ended up with an easier solution that does not require me to change pgap.py. We can simply set the env var TMPDIR
Yes, we introduced honoring TMPDIR settings earlier, at the request in a different issue.
I ended up not changing pgap.py, so there is nothing to pull, unfortunately.
It's a simple-enough workaround until the cwltool issue is fixed. :)
Sounds good as well. Thank you Thomas!
We have encountered a problem with NCBI PGAP (2021-01-11.build5132).
After annotating about 20 bacterial genomes, our temporary directory (/tmp) filled up with about 30 GB of folders with names like 5kod846l, i.e. 8 random characters.
Until /tmp is full, the pipeline works well.
Running PGAP on the test genome (./pgap.py -r --no-internet -o mg37_results test_genomes/MG37/input.yaml) leads to 327 MB of temporary files.
Expected behavior
PGAP removes its temporary files after exiting.
Software versions
release 8.2.2004 (Core)
2021-01-11.build5132
.19.03.12, build 48a66213fe
Example /tmp content:
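A listing of this kind can be produced with a short shell sketch such as the following. It assumes the leftover directories sit directly under /tmp and have names of exactly 8 characters, as described above. The function name list_leftover_tmp is hypothetical, and the name pattern is only a heuristic: any other directory with an 8-character name would match too.

```shell
# list_leftover_tmp is a hypothetical helper: it prints the size and path of
# every directory directly under the given base (default /tmp) whose name is
# exactly 8 characters long, like 5kod846l.
list_leftover_tmp() {
    local base="${1:-/tmp}"
    # -mindepth 1 excludes the base itself; -maxdepth 1 avoids descending.
    # du -sh prints one human-readable total per matching directory.
    find "$base" -mindepth 1 -maxdepth 1 -type d -name '????????' \
        -exec du -sh {} +
}
```

Calling list_leftover_tmp without arguments inspects /tmp; passing a path inspects that directory instead, for example a custom TMPDIR.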