This can be addressed more generally as follows: users could set the retention period in `reana.yaml` to a longer value for their important runs, declaring it at workflow level in `reana.yaml`:
```yaml
workflow:
  type: serial
  resources:
    cvmfs:
      - fcc.cern.ch
  retention: 7 days
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
          - mkdir -p results && root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
          - root -b -q 'code/fitdata.C("${data}","${plot}")'
```
where `retention` is a numerical value indicating the number of days after which the workflow could be garbage-collected following its termination (successful or unsuccessful). It could perhaps be put under the `resources` clause. (We could perhaps call it `expires_in: 7 days` or `expires_in_days: 7` if that sounds more user-friendly than `retention`.)
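For illustration, assuming the value is interpreted as whole days counted from the workflow's termination time (the function name below is hypothetical, not existing REANA code), the expiry moment could be computed roughly like this:

```python
from datetime import datetime, timedelta


def compute_expiry(terminated_at: datetime, retention_days: int) -> datetime:
    """Return the moment after which the workflow run may be garbage-collected.

    `terminated_at` is when the run finished (successfully or not) and
    `retention_days` comes from the `retention` clause in reana.yaml.
    """
    return terminated_at + timedelta(days=retention_days)


# Example: a run that ended on 1 June with `retention: 7 days`
# becomes eligible for garbage collection on 8 June.
expiry = compute_expiry(datetime(2022, 6, 1, 12, 0), 7)
```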
The `reana-client` validation should be amended to check for appropriate expiry values, e.g. allowing only integer days, up to a hard-coded maximum of 14 days. Each REANA instance could have a different maximum, so this may need to be gathered via a REST API call and/or validated on the server side.
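A minimal client-side validation sketch, assuming the clause arrives as a string such as `"7 days"`; the helper names and the hard-coded maximum below are illustrative assumptions, not existing `reana-client` code:

```python
import re

DEFAULT_MAX_RETENTION_DAYS = 14  # each instance could advertise its own maximum via REST


def parse_retention(value: str) -> int:
    """Parse a retention clause such as '7 days' into an integer day count."""
    match = re.fullmatch(r"\s*(\d+)\s*(days?)?\s*", value)
    if not match:
        raise ValueError(f"Invalid retention value: {value!r} (expected e.g. '7 days')")
    return int(match.group(1))


def validate_retention(value: str, max_days: int = DEFAULT_MAX_RETENTION_DAYS) -> int:
    """Check that the retention period is a positive integer not exceeding the instance maximum."""
    days = parse_retention(value)
    if days < 1:
        raise ValueError("Retention must be at least 1 day")
    if days > max_days:
        raise ValueError(f"Retention of {days} days exceeds the maximum of {max_days} days")
    return days
```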
The `reana-client list` output (and ditto for some more commands) should be amended in order to display when the given workflow run expires, so that users are notified. (See also below.)
The REANA UI should be amended in order to display the approaching expiry on the workflow run details page (and perhaps also on the workflow list page).
Note for the GC daemon: we should keep all the workspace inputs (because the users might not have them in a Git repo) and remove only the workflow run assets, i.e. all the files from the workspace that are not specified in `inputs`.
(The goal being that after GC runs, people should still be able to rerun the workflow and obtain the same results; this can be tested with `reana-client restart -w myanalysis.42`. IOW, this is similar to the CPU-vs-HDD resource dilemma: for "hot" analysis runs, it is good to have HDD resources occupied with keeping the latest results; for "cold" analysis runs, we reduce HDD resource usage by deleting unnecessary files, all the while keeping the ability to get the same files back by engaging CPU resources to rerun the recipes.)
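A minimal sketch of that pruning logic, assuming the GC daemon knows the workspace path and the workspace-relative paths declared under `inputs` in `reana.yaml` (the function and parameter names are hypothetical):

```python
from pathlib import Path


def prune_workspace(workspace: Path, declared_inputs: set[str]) -> None:
    """Remove workflow run assets while keeping the declared inputs.

    `declared_inputs` holds workspace-relative paths of files and directories
    listed under `inputs` in reana.yaml; everything else is treated as a
    regenerable asset and deleted.
    """

    def is_input(relative: str) -> bool:
        # Keep the path if it is an input, lives inside an input directory,
        # or is a parent directory of an input.
        return any(
            relative == inp
            or relative.startswith(inp.rstrip("/") + "/")
            or inp.startswith(relative + "/")
            for inp in declared_inputs
        )

    # Walk deepest paths first so that directories are emptied before removal.
    for path in sorted(workspace.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        relative = str(path.relative_to(workspace))
        if is_input(relative):
            continue
        if path.is_symlink() or path.is_file():
            path.unlink()
        elif path.is_dir() and not any(path.iterdir()):
            path.rmdir()
```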
Implemented in 0.9.0 as part of the Workspace-Retention sprint.
In case a user soft-deletes a workflow and then needs to download a file from it, the deletion of workspaces could be gathered and applied every night. This would make the deletion task asynchronous (stemming from a comment) and speed up the responses shown to the client. It also allows a small time window on the administration side to respond to any errors that may occur. For the implementation, this could be a Celery task triggered by Celery beat on a time interval on a worker pod.
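A rough sketch of that nightly batch deletion with Celery beat; the task name, broker URL and the database query helper are illustrative assumptions, not existing REANA code:

```python
import logging
import shutil

from celery import Celery
from celery.schedules import crontab

app = Celery("workspace-cleanup", broker="redis://localhost:6379/0")

# Trigger the batch deletion every night at 03:00 via Celery beat.
app.conf.beat_schedule = {
    "nightly-workspace-cleanup": {
        "task": "tasks.delete_pending_workspaces",
        "schedule": crontab(hour=3, minute=0),
    },
}


def get_soft_deleted_workspaces():
    """Placeholder: query the REANA database for workspaces of soft-deleted workflows."""
    return []  # e.g. a list of workspace paths


@app.task(name="tasks.delete_pending_workspaces")
def delete_pending_workspaces():
    """Physically remove workspaces whose workflows were soft-deleted by users."""
    for workspace_path in get_soft_deleted_workspaces():
        try:
            shutil.rmtree(workspace_path)
        except OSError:
            # Leave it for the next nightly run and let admins inspect the error.
            logging.exception("Could not delete workspace %s", workspace_path)
```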