Open stevekm opened 3 months ago
The cleanup only iterates through the task directories, that is why it doesn't delete those stage directories. In fact I don't think the cleanup works at all on S3 (see #3645).
You can use nf-boost which has an experimental cleanup that is more efficient, but I haven't implemented cleanup for the stage directories.
If I recall correctly, each run has its own stage directory of the pattern `work/stage-${sessionId}`, so a simple solution would be to just delete that directory at the end of the run. A more aggressive solution would be to delete individual subdirectories as soon as they are no longer needed, but I'm not sure how difficult that would be.
Thanks. I was hoping for a solution that could be bundled inside the `nextflow.config` so that it would run automatically. I will try out nf-boost, though I would still want some way to "un-stage" the S3 files at the end of the pipeline.
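For reference, enabling the plugin is done from the config file. This is a minimal sketch from memory; the `boost.cleanup` option name is my best recollection of nf-boost's experimental cleanup setting and may differ between plugin versions, so check the nf-boost README for your version:

```groovy
// nextflow.config -- sketch only; option names may vary by nf-boost version
plugins {
    id 'nf-boost'
}

boost {
    // experimental cleanup: deletes task outputs once downstream
    // tasks no longer need them (does not cover stage directories)
    cleanup = true
}
```

Note that, as mentioned above, this cleans up task directories but not the `work/stage-*` directories.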
You might be able to do it with a workflow onComplete handler in the config file. Something like this:
```groovy
// nextflow.config
workflow.onComplete = {
    workDir.resolve("stage-${workflow.sessionId}").deleteDir()
}
```
See also: https://nextflow.io/docs/latest/metadata.html#decoupling-metadata
Bug report
When using Nextflow with the `cleanup = true` option, input files staged from S3 are left in the work dir.

Expected behavior and actual behavior
In order to automatically clean up the work directory after a successful pipeline run, I was hoping that the `cleanup` option described here might also remove the S3 input files that were staged during pipeline execution. This does not seem to be the case: the files remain in the work dir under a path such as `work/stage-xyz`.
You can reproduce this by running a pipeline with input files on S3 and including the option `cleanup = true` in your `nextflow.config` file. The contents of the task work dirs are removed, but the staged files remain.

Environment
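The reproduction above can be sketched as a minimal setup. The bucket path below is a hypothetical placeholder, and the process is a toy example I made up to force a staged S3 input:

```groovy
// nextflow.config
cleanup = true
```

```groovy
// main.nf -- toy pipeline that stages one file from S3
process COUNT_LINES {
    input:
    path infile

    output:
    stdout

    script:
    "wc -l ${infile}"
}

workflow {
    // hypothetical S3 object; substitute a real path you can read
    COUNT_LINES(Channel.fromPath('s3://my-bucket/data/input.txt'))
}
```

After a successful run, the task work dirs are emptied, but the downloaded copy of the input remains under `work/stage-<sessionId>/`.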
Additional context
I'm not sure if this is intentional?