nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Deleting stage folder, with cleanup=true #2139

Closed: ersenkavak closed this issue 2 years ago

ersenkavak commented 3 years ago

New feature

After searching through the issues, I found that cleanup=true, as a general config directive, enables cleanup of the workdir after the execution is completed. I am guessing that keeping the stage folder has its reasons in the history of Nextflow development. However, I think deleting it would be extremely useful, at least when using supercomputers with MinIO/S3 as a data source. Storing an extra 200 GB of fastq.gz files per WGS analysis is pretty unnecessary. Having a config parameter to enable deleting the stage directory as well would be nice. Actually, the original need is to be able to remove the workdir with all of its content, as storing it is unnecessary 99% of the time in a production environment.
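For reference, this is roughly what the existing setting looks like in a `nextflow.config` file. It is only a minimal sketch of the current behaviour described above, not the requested feature:

```groovy
// nextflow.config
// Remove the task work directories once the run completes successfully.
// As described in this issue, the stage folder is left in place.
cleanup = true
```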

Usage scenario

Processing large files requires careful management of storage. A 20 GB WES dataset will take up 100-200 GB of processing space. Nextflow clean will delete the processing folders but not the stage folder. This results in the unnecessary accumulation of redundant large data.
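Until something like this exists, a possible stopgap is a completion handler in `nextflow.config` that removes the stage folder by hand. The sketch below rests on several assumptions: the work directory is on a local or shared POSIX filesystem, the stage folder sits directly under it and is named `stage` (the name may differ between Nextflow versions), and `-resume` is not needed afterwards.

```groovy
// nextflow.config
// Hypothetical workaround, not an official Nextflow feature.
workflow.onComplete = {
    // Only clean up after a successful run, so failed runs can still be inspected.
    if (workflow.success) {
        // Assumption: staged remote inputs live in "<workDir>/stage".
        def stageDir = new File(workflow.workDir.toString(), 'stage')
        if (stageDir.exists()) {
            // Groovy's File.deleteDir() removes the directory recursively.
            stageDir.deleteDir()
        }
    }
}
```

Deleting staged inputs means they have to be downloaded again on the next run, so this only makes sense in the production scenario described above.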

Suggest implementation

(Highlight the main building blocks of a possible implementation and/or related components)

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.