reanahub / reana-workflow-engine-snakemake

REANA Workflow Engine Snakemake
MIT License

job-controller does not terminate after workflow finishes #50

Closed: mdonadoni closed this issue 1 year ago

mdonadoni commented 1 year ago

In some cases, the job-controller container inside a run-batch pod keeps running even though the workflow-engine container has already finished. The workflow is correctly reported as failed, but the run-batch pod continues running indefinitely.

How to reproduce:

  1. Modify the Snakefile in reana-demo-helloworld:

    diff --git a/workflow/snakemake/Snakefile b/workflow/snakemake/Snakefile
    index e7344f2..e6f949b 100644
    --- a/workflow/snakemake/Snakefile
    +++ b/workflow/snakemake/Snakefile
    @@ -28,7 +28,4 @@ rule helloworld:
         container:
             "docker://python:2.7-slim"
         shell:
    -        "python {input.helloworld} "
    -        "--inputfile {input.inputfile} "
    -        "--outputfile {output} "
    -        "--sleeptime {params.sleeptime}"
    +        "echo"
  2. Execute the workflow:

    reana-client run -f reana-snakemake.yaml

audrium commented 1 year ago

This happens because REANA_RUNTIME_KUBERNETES_KEEP_ALIVE_JOBS_WITH_STATUSES is set to failed in values-dev.yaml. Since the Snakemake engine marks this workflow as failed, the run-batch pod is intentionally kept alive for inspection. After removing that environment variable from values-dev.yaml, the workflow's run-batch pod is terminated as expected.
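To make the keep-alive mechanism concrete, here is a minimal Python sketch of the decision it implies. It assumes the variable holds a comma-separated list of workflow statuses. The function names `keep_alive_statuses` and `should_keep_run_batch_pod` are hypothetical and are not taken from REANA's code base; the real implementation may parse and apply the setting differently.

```python
import os

ENV_VAR = "REANA_RUNTIME_KUBERNETES_KEEP_ALIVE_JOBS_WITH_STATUSES"


def keep_alive_statuses(environ=None):
    """Hypothetical helper: parse the statuses listed in the variable.

    Assumes a comma-separated value such as "failed" or "failed,finished".
    An unset or empty variable yields an empty set.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "")
    return {status.strip().lower() for status in raw.split(",") if status.strip()}


def should_keep_run_batch_pod(workflow_status, environ=None):
    """Hypothetical check: keep the pod alive if its final status is listed."""
    return workflow_status.lower() in keep_alive_statuses(environ)


if __name__ == "__main__":
    dev_env = {ENV_VAR: "failed"}  # mirrors the setting in values-dev.yaml
    print(should_keep_run_batch_pod("failed", dev_env))    # True: pod kept alive
    print(should_keep_run_batch_pod("finished", dev_env))  # False: pod terminated
    print(should_keep_run_batch_pod("failed", {}))         # False: variable removed
```

Under this sketch, the dev setting keeps the pod of a failed workflow running, which matches the behaviour reported in the issue. Removing the variable makes the check return False for every status, so the pod is cleaned up.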