reanahub / reana-workflow-engine-snakemake

REANA Workflow Engine Snakemake
MIT License

add job status polling #33

Closed VMois closed 2 years ago

VMois commented 2 years ago

Currently, the Snakemake engine determines workflow status by checking jobfinished or jobfailed files.

https://github.com/reanahub/reana-workflow-engine-snakemake/blob/7269f5af04159c137222d2e92938f78be5ee62ac/reana_workflow_engine_snakemake/executor.py#L82

However, this approach does not work with the Kubernetes job timeout parameter. When the timeout expires, K8s terminates the Snakemake job pod, and the termination is never reported to the engine, which leaves the workflow in the running state.

The Snakemake engine code already mentions polling:

https://github.com/reanahub/reana-workflow-engine-snakemake/blob/7269f5af04159c137222d2e92938f78be5ee62ac/reana_workflow_engine_snakemake/executor.py#L202-L204

It would be good to poll the job controller for job statuses. This would make the job_timeout feature work and would align Snakemake with the other engines, where polling is the way job statuses are updated.

VMois commented 2 years ago

I can see there are some callbacks available in Snakemake like handle_job_success() and handle_job_error() to report job status. It would be good to have a unified way of reporting jobs, so either (a) report via files or (b) report via callbacks. I would prefer to go with callbacks.

Suggestion: we could run a daemon thread in the REANAClusterExecutor class that periodically polls job statuses.
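A rough sketch of what such a poller could look like, to make the idea concrete. Everything here is hypothetical: `JobStatusPoller`, `fetch_status`, and the status strings `"finished"` and `"failed"` are placeholders, not the actual REANA or Snakemake API. In the engine, `fetch_status` would presumably query the job controller, and the two callbacks would be wired to something like handle_job_success() and handle_job_error().

```python
import threading
import time

# Hypothetical terminal states; the real job controller may use other names.
STATUS_FINISHED = "finished"
STATUS_FAILED = "failed"


class JobStatusPoller:
    """Poll tracked jobs in a daemon thread and fire a callback once per job."""

    def __init__(self, fetch_status, on_success, on_error, interval=5.0):
        self._fetch_status = fetch_status  # callable: job_id -> status string
        self._on_success = on_success      # called with job_id when finished
        self._on_error = on_error          # called with job_id when failed
        self._interval = interval          # seconds between polling rounds
        self._active = set()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # daemon=True so the thread never keeps the engine process alive.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def track(self, job_id):
        """Register a submitted job so that it gets polled."""
        with self._lock:
            self._active.add(job_id)

    def _run(self):
        while not self._stop.is_set():
            # Copy under the lock so track() can run while we poll.
            with self._lock:
                job_ids = list(self._active)
            for job_id in job_ids:
                status = self._fetch_status(job_id)
                if status == STATUS_FINISHED:
                    self._on_success(job_id)
                elif status == STATUS_FAILED:
                    self._on_error(job_id)
                else:
                    continue  # still running, check again next round
                # Terminal state reached: stop polling this job.
                with self._lock:
                    self._active.discard(job_id)
            # wait() instead of sleep() so stop() takes effect immediately.
            self._stop.wait(self._interval)
```

With this shape, a pod killed by the Kubernetes timeout would be picked up in the next polling round, assuming the job controller reports it as failed, instead of the engine waiting for a jobfailed file that never appears.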

@mvidalgarcia you have been working on Snakemake a lot. What do you think? Any suggestions on how to do it?

mvidalgarcia commented 2 years ago

I think it is polling; pooling is a different thing.