Closed mdonadoni closed 9 months ago
The pod reana-run-batch-... is terminated as soon as one job of the workflow fails, even if there are more jobs running: https://github.com/reanahub/reana-workflow-controller/blob/3004b14a7d60eb39dfcbdc51e15242b27edd70c3/reana_workflow_controller/consumer.py#L163-L170
This means that those jobs will outlive reana-run-batch-...: their k8s pods are not cleaned up and the database is not updated.
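To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use REANA's actual code or API: `Job`, `JobStatus`, `orphaned_jobs` and `stop_remaining_jobs` are hypothetical names introduced only for illustration, and the `stop_job` callback stands in for whatever would really stop a job and update the database.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class Job:
    # Hypothetical in-memory model of a workflow job (not a REANA class).
    name: str
    status: JobStatus


def orphaned_jobs(jobs):
    """Return the jobs still running once at least one job has failed.

    These are the jobs that would outlive the batch pod if it were
    terminated immediately on the first failure.
    """
    if not any(job.status is JobStatus.FAILED for job in jobs):
        return []
    return [job for job in jobs if job.status is JobStatus.RUNNING]


def stop_remaining_jobs(jobs, stop_job):
    """Call stop_job on every job that would be orphaned; return their names.

    stop_job is a placeholder callback (an assumption of this sketch).
    """
    stopped = []
    for job in orphaned_jobs(jobs):
        stop_job(job)
        stopped.append(job.name)
    return stopped


if __name__ == "__main__":
    # Mirrors the reproduction case: r2 fails while r1 is still sleeping.
    jobs = [Job("r1", JobStatus.RUNNING), Job("r2", JobStatus.FAILED)]
    print([job.name for job in orphaned_jobs(jobs)])  # prints ['r1']
```

In this model, calling something like `stop_remaining_jobs` before the batch pod is terminated would leave no running job behind; whether the real fix should stop the jobs or wait for them to finish is left open here.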
How to reproduce:
One job will fail, the other one will continue running even after reana-run-batch-... is terminated. The job pod will not be cleaned up either.
values-dev.yaml
```diff
       REANA_RATELIMIT_SLOW: "5 per second"
   reana_workflow_controller:
     image: docker.io/reanahub/reana-workflow-controller
-    environment:
-      REANA_RUNTIME_KUBERNETES_KEEP_ALIVE_JOBS_WITH_STATUSES: failed
+    # environment:
+    #   REANA_RUNTIME_KUBERNETES_KEEP_ALIVE_JOBS_WITH_STATUSES: failed
   reana_workflow_engine_cwl:
     image: docker.io/reanahub/reana-workflow-engine-cwl
   reana_workflow_engine_yadage:
```
reana.yaml
```yaml
version: 0.9.0
inputs:
  files:
    - Snakefile
workflow:
  type: snakemake
  file: Snakefile
```
Snakefile
```snakefile
rule all:
    input:
        "r1.txt",
        "r2.txt",

rule r1:
    output:
        "r1.txt"
    container:
        "docker://docker.io/library/python:3.8-slim"
    shell:
        "sleep 120; echo done > r1.txt"

rule r2:
    output:
        "r2.txt"
    container:
        "docker://docker.io/library/python:3.8-slim"
    shell:
        "exit 1"
```