replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Clean-up #59

Closed hoelzer closed 3 years ago

hoelzer commented 3 years ago

Is it possible to implement a parameter for a final clean-up after the pipeline has finished? For example, it would be great to automatically get rid of the -w work folder once the pipeline has finished successfully.

Might be possible via onComplete?
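A minimal sketch of what such a clean-up could look like outside of Nextflow, as a shell wrapper: it removes the work folder only when the wrapped command exits successfully. The function name `run_and_clean` is hypothetical and not part of poreCov or Nextflow; an `onComplete` handler inside the workflow would be the in-pipeline counterpart.

```shell
# Hypothetical helper (not part of poreCov or Nextflow):
# run a command and delete the given work folder only on success.
run_and_clean() {
  workdir=$1
  shift
  if "$@"; then
    rm -rf -- "$workdir"
  else
    status=$?
    echo "command failed with exit code $status, keeping $workdir" >&2
    return "$status"
  fi
}

# Example call (assumes nextflow is installed):
#   run_and_clean work nextflow run replikation/poreCov -w work [other options]
```

Keeping the folder on failure leaves the intermediate files available for debugging or a resumed run.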

replikation commented 3 years ago

sounds interesting, like the idea.

replikation commented 3 years ago

cleanup might be difficult due to the root-owned files generated via docker. the easier solution would be for people to point, via a flag or config file, to a designated trash area?

replikation commented 3 years ago

also nextflow has, i believe, a built-in "remove workdir" function

hoelzer commented 3 years ago

also nextflow has, i believe, a built-in "remove workdir" function

oh, this would be best then, I think. And yes, of course, users can also point to -w /scratch/ or -w /tmp or some other place where the data is eventually deleted automatically

hoelzer commented 3 years ago

Actually, there's a(n undocumented) feature to automatically clean up the work dir by adding `cleanup = true` to the config.

https://github.com/nextflow-io/nf-hack18/issues/3
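For reference, a sketch of how that setting could be used: the snippet writes a small extra config file and shows, as a comment, how it might be passed to a run with `-c`. The file name `cleanup.config` is an assumption made for this example. As far as I know, cleaning the work dir this way also means a later `-resume` of that run has nothing to reuse.

```shell
# Write a small config file that enables the cleanup option mentioned above.
# The file name cleanup.config is an arbitrary choice for this sketch.
cat > cleanup.config <<'EOF'
// remove the work dir content after a successful run
cleanup = true
EOF

# Example call (assumes nextflow is installed):
#   nextflow run replikation/poreCov -c cleanup.config [other options]
```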

replikation commented 3 years ago

oh nice find :)

RaverJay commented 3 years ago

Cleaning up after complete finish might be too late:

Just ran into this problem when running a 'heavy' test case of 54 fastqs. With local executor, every kraken2 process writes an unpacked version of the krakenDB into the work folder. This just filled up the disk until a kraken2 process died due to lack of space.

As the DB is not part of the process output (only the kraken result is), it should be fine to delete the unpacked copy after kraken2 is done?

hoelzer commented 3 years ago

@RaverJay true, that unpacked krakenDB can just be deleted directly in the kraken process to free up storage during execution
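The idea can be sketched in plain shell: unpack the database, run the classification against it, then remove the unpacked copy regardless of the result. The helper name `with_unpacked_db` and its calling convention (the database directory is appended as the last argument) are assumptions for illustration; this is not the code of the poreCov kraken2 process.

```shell
# Hypothetical helper: unpack a .tar.gz database into a temporary directory,
# run a command with that directory appended as its last argument,
# then delete the unpacked copy whether or not the command succeeded.
with_unpacked_db() {
  archive=$1
  shift
  dbdir=$(mktemp -d ./db.XXXXXX) || return 1
  if ! tar -xzf "$archive" -C "$dbdir"; then
    rm -rf -- "$dbdir"
    return 1
  fi
  "$@" "$dbdir"
  status=$?
  rm -rf -- "$dbdir"
  return "$status"
}

# Example call (classify_step is a placeholder for a script that
# receives the database directory as its last argument):
#   with_unpacked_db kraken_db.tar.gz ./classify_step sample.fastq
```

Because the unpacked directory is not a declared output, removing it inside the task does not affect what the pipeline collects afterwards.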

RaverJay commented 3 years ago

Just hacked that in on github https://github.com/replikation/poreCov/pull/67

replikation commented 3 years ago

* i added `scratch true` to the "heavy disk" work dirs

* downside: it does not work properly with wildcards (e.g. path("*.fasta")), but in the case of the kraken classification it works fine

hoelzer commented 3 years ago

What does `scratch true` actually do? Just asking because on HPCs, depending on their configuration, issues can come up if stuff is written to e.g. /scratch or /tmp by default. Or does this automatically clean up the work dirs where it is set to true? (This would be awesome.)

replikation commented 3 years ago

it is something internal to the nextflow .command.run:

on_exit() {
  exit_status=${ret:=$?}
  printf $exit_status > /home/pditommaso/projects/rnatoy/work/ff/836e306cde436de39a0e2e5e4c8afc/.exitcode
  set +u
  [[ "$COUT" ]] && rm -f "$COUT" || true
  [[ "$CERR" ]] && rm -f "$CERR" || true
  (sudo -n true && sudo rm -rf "$NXF_SCRATCH" || rm -rf "$NXF_SCRATCH")&>/dev/null || true
  exit $exit_status
}

replikation commented 3 years ago

nextflow writes its own "scratch dirs"
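
To make the mechanism easier to follow, here is a strongly simplified sketch of the pattern: run a task in a temporary scratch directory, copy only the declared output back, and let an exit trap remove the scratch directory. The function `run_in_scratch` and its arguments are hypothetical; the real `.command.run` generated by Nextflow does considerably more.

```shell
# Simplified illustration of the scratch pattern, not the actual Nextflow wrapper.
# Usage: run_in_scratch <output-file> <command> [args...]
# The body runs in a subshell "( ... )" so that cd and the trap stay local to it.
run_in_scratch() (
  workdir=$PWD
  output=$1
  shift
  scratch=$(mktemp -d) || exit 1
  # Remove the scratch directory when the subshell exits, on success or failure.
  trap 'rm -rf -- "$scratch"' EXIT
  cd "$scratch" || exit 1
  "$@" || exit $?
  # Only the declared output is copied back; everything else is discarded.
  cp -- "$output" "$workdir/"
)

# Example call: the task writes a large temporary file and one result file;
# only result.txt ends up in the current directory.
#   run_in_scratch result.txt sh -c 'echo junk > tmp.db; echo ok > result.txt'
```

This would also explain the clean-up effect asked about above: files that are not declared outputs, such as an unpacked database, live only in the scratch directory and disappear together with it.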