Closed mdonadoni closed 1 year ago
This happened when running https://github.com/reanahub/reana-workflow-engine-snakemake/pull/42#discussion_r859837148
Only run-batch-... is running, all the run-job-... pods have finished:
$ kubectl get pods | grep run-
reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k   3/3   Running   0   57m
job-controller has cleaned up all the jobs (175):
$ kubectl logs reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k -c job-controller | grep 'Cleaning Kubernetes job' | wc -l
175
According to job-controller, all the jobs have finished:
$ kubectl exec reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k -c job-controller -- curl localhost:5000/jobs > jobs.json
$ cat jobs.json | jq '.jobs[] | values[].status' | wc -l
175
$ cat jobs.json | jq '.jobs[] | values[].status' | grep finished | wc -l
175
r-w-e-snakemake confirms that 175 jobs were submitted; however, only 171 have finished:
$ kubectl logs reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k -c workflow-engine | grep 'submitted job:' | wc -l
175
$ kubectl logs reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k -c workflow-engine | grep 'job is finished.' | wc -l
171
$ kubectl logs reana-run-batch-7c0ebe80-6cf3-44df-9899-105ffa5ab062-f2j8k -c workflow-engine | tail
2023-07-17 13:29:58,697 | snakemake.logging | Thread-1 | INFO | Finished job 124.
2023-07-17 13:29:58,697 | snakemake.logging | Thread-1 | INFO | 169 of 176 steps (96%) done
2023-07-17 13:29:58,701 | reana-workflow-engine-snakemake | Thread-1 | INFO | make_data job is finished. job_id: 76807136-32ca-4349-ab4d-1b32d7df8bb8
2023-07-17 13:29:58,702 | snakemake.logging | Thread-1 | INFO | [Mon Jul 17 13:29:58 2023]
2023-07-17 13:29:58,702 | snakemake.logging | Thread-1 | INFO | Finished job 139.
2023-07-17 13:29:58,702 | snakemake.logging | Thread-1 | INFO | 170 of 176 steps (97%) done
2023-07-17 13:30:08,720 | reana-workflow-engine-snakemake | Thread-1 | INFO | make_data job is finished. job_id: 9d02883d-c7da-4f67-909a-72fd53db0dbe
2023-07-17 13:30:08,721 | snakemake.logging | Thread-1 | INFO | [Mon Jul 17 13:30:08 2023]
2023-07-17 13:30:08,721 | snakemake.logging | Thread-1 | INFO | Finished job 154.
2023-07-17 13:30:08,721 | snakemake.logging | Thread-1 | INFO | 171 of 176 steps (97%) done
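To go beyond the `wc -l` counts and see which job IDs the workflow engine actually reported as finished, the log can be parsed with a small script. This is only a sketch: the regular expression is based on the "job is finished. job_id: ..." lines visible in the log tail above, the function name `finished_jobs` is made up for illustration, and treating the word before "job is finished." as a job name is an assumption.

```python
import re

# Matches lines such as
#   "... | INFO | make_data job is finished. job_id: <uuid>"
# The pattern is inferred from the log tail in this issue, not from the
# engine's source code, so it is an assumption.
FINISHED_RE = re.compile(
    r"\| (\S+) job is finished\. job_id: ([0-9a-f-]{36})\s*$"
)


def finished_jobs(log_text):
    """Return {job_id: job_name} for every 'job is finished.' log line."""
    found = {}
    for line in log_text.splitlines():
        match = FINISHED_RE.search(line)
        if match:
            job_name, job_id = match.groups()
            found[job_id] = job_name
    return found


# Three lines copied from the workflow-engine log tail in this issue.
sample = """\
2023-07-17 13:29:58,697 | snakemake.logging | Thread-1 | INFO | Finished job 124.
2023-07-17 13:29:58,701 | reana-workflow-engine-snakemake | Thread-1 | INFO | make_data job is finished. job_id: 76807136-32ca-4349-ab4d-1b32d7df8bb8
2023-07-17 13:30:08,720 | reana-workflow-engine-snakemake | Thread-1 | INFO | make_data job is finished. job_id: 9d02883d-c7da-4f67-909a-72fd53db0dbe
"""

for job_id, job_name in finished_jobs(sample).items():
    print(job_id, job_name)
```

Feeding the full output of `kubectl logs ... -c workflow-engine` to `finished_jobs` instead of `sample` should yield the 171 finished job IDs, if every such line follows the same format.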
In the database four jobs are still reported as running:
reana=# select id_, backend_job_id, status from __reana.job where status != 'finished';
                 id_                  |                   backend_job_id                   | status
--------------------------------------+----------------------------------------------------+---------
 607da740-1199-4cba-9ab7-2cb9ac5772a8 | reana-run-job-160bec2f-86bf-47bc-a208-f0635c3632e4 | running
 b5ffb667-6910-4e3f-be6e-21c13ca49161 | reana-run-job-f410a704-794b-4cac-8ad3-687644107acb | running
 985e6a86-3bc2-4fee-86b0-9e513e80b3f6 | reana-run-job-16380469-6e75-4f9f-98b0-cfd4f4f03d5e | running
 49f6e328-00ca-4ca9-a6ed-dc7602e6b9fd | reana-run-job-291c18c3-ccff-4b16-a300-0cdb37671f1c | running
(4 rows)
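One way to narrow this down is to check whether the four jobs still marked as running in the database are among the jobs the workflow engine logged as finished. The sketch below does that with plain set operations. It assumes that the database `id_` column holds the same identifier as the `job_id` printed in the workflow-engine log, which this issue does not confirm, and the function name `classify_stuck_jobs` is hypothetical.

```python
def classify_stuck_jobs(db_running_ids, engine_finished_ids):
    """Split the DB's non-finished jobs by whether the engine logged them as finished.

    Assumes DB `id_` values and engine log `job_id` values are the same
    identifiers (an unverified assumption).
    """
    db_running = set(db_running_ids)
    finished = set(engine_finished_ids)
    return {
        # Engine never logged "job is finished." for these jobs.
        "not_finished_in_engine_log": sorted(db_running - finished),
        # Engine saw them finish, but the DB status was not updated.
        "finished_in_engine_log": sorted(db_running & finished),
    }


# The four job IDs still reported as running in the database.
db_running_ids = [
    "607da740-1199-4cba-9ab7-2cb9ac5772a8",
    "b5ffb667-6910-4e3f-be6e-21c13ca49161",
    "985e6a86-3bc2-4fee-86b0-9e513e80b3f6",
    "49f6e328-00ca-4ca9-a6ed-dc7602e6b9fd",
]

# Only the two job IDs visible in the log tail; the real check needs
# all 171 IDs from the full workflow-engine log.
engine_finished_ids = [
    "76807136-32ca-4349-ab4d-1b32d7df8bb8",
    "9d02883d-c7da-4f67-909a-72fd53db0dbe",
]

report = classify_stuck_jobs(db_running_ids, engine_finished_ids)
for category, job_ids in report.items():
    print(category, len(job_ids))
```

With only the two IDs from the log tail as input, the result is illustrative and says nothing conclusive about the four stuck jobs; the comparison becomes meaningful once all finished job IDs from the full log are passed in.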
Additional note: reana.yaml
inputs:
  files:
    - Snakefile
workflow:
  type: snakemake
  file: Snakefile
  resources:
    kerberos: true
outputs:
  files:
    - myoutput.png