Closed VMois closed 2 years ago
I can see there are some callbacks available in Snakemake, like handle_job_success() and handle_job_error(), to report job status. It would be good to have a unified way of reporting job status: either (a) report via files or (b) report via callbacks. I would prefer to go with callbacks.
Suggestion: we can have a daemon thread in the REANAClusterExecutor class. This thread will periodically poll job statuses.
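To make the suggestion concrete, here is a minimal sketch of such a poller. Everything in it is an assumption for illustration: the JobStatusPoller class, the fetch_status callable, and the "finished"/"failed" status strings are hypothetical and are not taken from the REANA or Snakemake code. In the engine, on_success and on_error would presumably wrap callbacks like the ones mentioned above.

```python
import threading
from typing import Callable, Set


class JobStatusPoller:
    """Hypothetical helper: polls job statuses from a daemon thread."""

    def __init__(
        self,
        fetch_status: Callable[[str], str],  # assumed: returns a status string for a job ID
        on_success: Callable[[str], None],   # called once when a job is reported finished
        on_error: Callable[[str], None],     # called once when a job is reported failed
        interval: float = 5.0,               # seconds between polls (arbitrary default)
    ) -> None:
        self._fetch_status = fetch_status
        self._on_success = on_success
        self._on_error = on_error
        self._interval = interval
        self._active: Set[str] = set()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # daemon=True so the thread never keeps the process alive on exit
        self._thread = threading.Thread(target=self._run, daemon=True)

    def watch(self, job_id: str) -> None:
        """Register a submitted job for polling."""
        with self._lock:
            self._active.add(job_id)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def poll_once(self) -> None:
        """Check every watched job once and fire the matching callback."""
        with self._lock:
            job_ids = list(self._active)
        for job_id in job_ids:
            status = self._fetch_status(job_id)
            if status == "finished":
                self._on_success(job_id)
            elif status == "failed":
                self._on_error(job_id)
            else:
                continue  # still running, keep watching
            with self._lock:
                self._active.discard(job_id)

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called
        while not self._stop.wait(self._interval):
            self.poll_once()
```

The executor would call watch() after submitting each job and start() once at construction. Keeping poll_once() separate from the loop makes the logic testable without real threads or sleeps.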
@mvidalgarcia you have been working on Snakemake a lot. What do you think? Any suggestions on how to do it?
I think it is "polling"; "pooling" is a different thing.
Currently, the Snakemake engine determines workflow status by checking jobfinished or jobfailed files.
https://github.com/reanahub/reana-workflow-engine-snakemake/blob/7269f5af04159c137222d2e92938f78be5ee62ac/reana_workflow_engine_snakemake/executor.py#L82
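For illustration, a file-based check of this kind can be sketched roughly as follows. This is not the code at the link above; the function name, its arguments, and the returned status strings are assumptions. The point is that if neither marker file is ever written, the result stays "running" indefinitely.

```python
from pathlib import Path


def status_from_marker_files(jobfinished: Path, jobfailed: Path) -> str:
    """Hypothetical helper: derive a job status from marker files only."""
    if jobfinished.exists():
        return "finished"
    if jobfailed.exists():
        return "failed"
    # No marker file yet: indistinguishable from a job that was killed
    # before it could write one.
    return "running"
```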
But in the case of the kubernetes job timeout parameter, this approach does not work. K8s terminates the SnakemakeJob pod, and the termination is never reported to the engine, leaving the workflow in the running state.
There is already some mention of polling in the Snakemake engine code.
https://github.com/reanahub/reana-workflow-engine-snakemake/blob/7269f5af04159c137222d2e92938f78be5ee62ac/reana_workflow_engine_snakemake/executor.py#L202-L204
It would be good to add polling from the job controller so that the job_timeout feature can work, and to align Snakemake with the other engines, where polling is the way job statuses are updated.