reanahub / reana-workflow-controller

REANA Workflow Controller
http://reana-workflow-controller.readthedocs.io/
MIT License

configurable output asset expiry #145

Closed dinosk closed 1 year ago

dinosk commented 6 years ago

In case a user soft-deletes a workflow and then needs to download a file from it, workspace deletions could be collected and applied in a batch every night. This would make the deletion task asynchronous (stemming from comment) and speed up the responses returned to the client. It would also give administrators a small time window to respond to any errors that occur. For the implementation, this could be a Celery task triggered by Celery beat at a fixed time interval on a worker pod.
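The nightly batch described above could be sketched roughly as follows. This is a minimal, framework-free illustration and not the REANA implementation: the `SoftDeletedWorkspace` record, the `purge_due_workspaces` function, and the 24-hour grace period are all hypothetical names and values chosen for the example. In a Celery deployment, a function like this would presumably be wrapped in a task and scheduled through Celery beat.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Tuple


# Hypothetical record for a workspace whose workflow was soft-deleted.
@dataclass
class SoftDeletedWorkspace:
    path: str
    deleted_at: datetime


def purge_due_workspaces(
    pending: Iterable[SoftDeletedWorkspace],
    now: datetime,
    remove: Callable[[str], None],
    grace: timedelta = timedelta(hours=24),  # assumed grace period
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Remove workspaces that were soft-deleted at least `grace` ago.

    Returns (removed_paths, failures). Failures are collected rather than
    raised, so one bad workspace does not abort the whole batch and
    administrators can inspect the errors afterwards.
    """
    removed: List[str] = []
    failed: List[Tuple[str, str]] = []
    for ws in pending:
        if now - ws.deleted_at < grace:
            continue  # still inside the window; files remain downloadable
        try:
            remove(ws.path)
            removed.append(ws.path)
        except OSError as exc:
            failed.append((ws.path, str(exc)))
    return removed, failed
```

In practice `remove` could be something like `shutil.rmtree`; passing it in as a parameter keeps the sketch testable without touching the filesystem.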

tiborsimko commented 4 years ago

This can be addressed more generally as follows:

tiborsimko commented 4 years ago

workflow:
  type: serial
  resources:
    cvmfs:
      - fcc.cern.ch
    retention: 7 days
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - mkdir -p results && root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - root -b -q 'code/fitdata.C("${data}","${plot} 

where retention is a numerical value indicating the number of days after a workflow's termination (successful or unsuccessful) at which the workflow may be garbage-collected. It could perhaps be placed under the resources clause.

(We could perhaps call it expires_in: 7 days or expires_in_days: 7 if that sounds more user-friendly than retention.)
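To make the proposed semantics concrete, here is a small sketch of how such a value might be interpreted. It is only an illustration under assumptions: `parse_retention` and `is_collectable` are hypothetical helpers, and the accepted formats (a string such as `7 days`, or a bare integer as in `expires_in_days: 7`) are simply the two spellings floated above, not a confirmed REANA syntax.

```python
import re
from datetime import datetime, timedelta
from typing import Union


def parse_retention(value: Union[int, str]) -> timedelta:
    """Turn "7 days" or a bare day count into a timedelta (hypothetical helper)."""
    if isinstance(value, bool):
        raise ValueError(f"unsupported retention value: {value!r}")
    if isinstance(value, int):
        days = value
    else:
        match = re.fullmatch(r"\s*(\d+)\s*days?\s*", str(value))
        if match is None:
            raise ValueError(f"unsupported retention value: {value!r}")
        days = int(match.group(1))
    if days < 0:
        raise ValueError("retention must not be negative")
    return timedelta(days=days)


def is_collectable(
    finished_at: datetime, retention: Union[int, str], now: datetime
) -> bool:
    """True once the retention period has elapsed since the workflow terminated."""
    return now >= finished_at + parse_retention(retention)
```

With a workflow that finished on 1 January and `retention: 7 days`, `is_collectable` stays false through 7 January and becomes true from 8 January onwards.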

(The goal being that after GC runs, people should still be able to rerun the workflow and obtain the same results. This can be tested with reana-client restart -w myanalysis.42. IOW, this is similar to the CPU-vs-HDD resource dilemma: for "hot" analysis runs, it is good to keep HDD resources occupied with the latest results; for "cold" analysis runs, we reduce HDD usage by deleting unnecessary files, while keeping the ability to regenerate the same files by spending CPU resources to rerun the recipes.)
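One way to picture a garbage collection that preserves rerunnability is to delete everything in a workspace except the files needed to run the recipe again. The sketch below assumes exactly that policy; `collect_garbage` and the idea of passing an explicit `keep` list of relative file paths are hypothetical, and which files actually count as "unnecessary" is left open in the discussion above.

```python
from pathlib import Path
from typing import Iterable, List


def collect_garbage(workspace: Path, keep: Iterable[str]) -> List[str]:
    """Delete every file under `workspace` whose relative path is not in `keep`.

    `keep` lists file paths relative to the workspace (for example, the
    inputs and code needed to rerun the workflow). Returns the relative
    paths of the deleted files in sorted order. Empty directories are
    left in place.
    """
    keep_set = {Path(p).as_posix() for p in keep}
    deleted: List[str] = []
    # Materialise the listing first so deletions do not disturb the walk.
    for path in sorted(workspace.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(workspace).as_posix()
        if rel not in keep_set:
            path.unlink()
            deleted.append(rel)
    return deleted
```

After such a pass, the kept files are still present, so the deleted outputs can be regenerated by rerunning the workflow steps.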

tiborsimko commented 1 year ago

Implemented in 0.9.0 as part of the Workspace-Retention sprint.