Closed by tiborsimko 2 years ago.
Two requests while working on this issue:
- Please check all the other workflow engines and how they behave with respect to wrong parameters, and report back here. This is just to make sure that the other workflow engines behave well.
- Please check whether we could write a regression test case for this scenario. If that is not easy, we can simply remember to create "wrong" reana.yaml files so that integration tests cover this situation.
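One possible shape for such a regression test is sketched below. It does not use REANA's real engine code: `run_toy_serial_engine` is a hypothetical stand-in that reports "running" first, then checks that every `${name}` placeholder in a step command is a declared parameter. The test feeds it a command with an undeclared parameter and asserts that the recorded status sequence is running followed by failed.

```python
import re
import unittest
from typing import Callable, Dict, List

# Matches ${name} placeholders in a step command (simplified assumption).
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def run_toy_serial_engine(
    declared_parameters: Dict[str, str],
    step_commands: List[str],
    publish_status: Callable[[str], None],
) -> None:
    """Hypothetical stand-in for an engine run; not REANA's actual code."""
    # Report "running" first, before validation or any heavy work.
    publish_status("running")
    for command in step_commands:
        unknown = [n for n in PLACEHOLDER.findall(command) if n not in declared_parameters]
        if unknown:
            # A wrong parameter now yields running -> failed, not pending -> failed.
            publish_status("failed")
            return
    publish_status("finished")


class WrongParametersRegressionTest(unittest.TestCase):
    def test_wrong_parameter_reports_running_then_failed(self) -> None:
        statuses: List[str] = []
        run_toy_serial_engine(
            declared_parameters={"events": "20000"},
            step_commands=["echo ${events} ${undeclared}"],
            publish_status=statuses.append,
        )
        self.assertEqual(statuses, ["running", "failed"])
```

A runner such as `python -m unittest` or pytest would pick up the `TestCase`. In a real test, the recording callback would replace whatever publishes status messages to the queue.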
The CWL and Snakemake engines report the running status quite early, before validation or any heavy operations, so the issue should not affect them. Serial and Yadage are affected when the workflow parameters are incorrect (and possibly in other scenarios too).
Another approach, which would cover all workflow engines at once, is to publish the running status centrally in reana-commons/workflow_engine.py (in the run_workflow_engine_run_command function) instead of dealing with each workflow engine case by case. I believe reana-commons is used by all engines. We already have similar logic for failed workflows in reana-commons/workflow_engine.py.
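The central approach could look roughly like the wrapper below. This is a sketch, not the actual reana-commons code: `with_initial_running_status`, the `publish(workflow_uuid, status)` callback and the `engine_run(workflow_uuid=..., **kwargs)` signature are all hypothetical. The idea is that the shared wrapper publishes "running" before handing control to any engine, and publishes "failed" if the engine raises.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signatures: publish(workflow_uuid, status) stands in for the MQ
# publisher, and engine_run(workflow_uuid=..., **kwargs) for an engine's run command.
PublishFn = Callable[[str, str], None]
EngineRunFn = Callable[..., None]


def with_initial_running_status(engine_run: EngineRunFn, publish: PublishFn) -> EngineRunFn:
    """Wrap an engine run function so that "running" is published before it starts."""

    def run(workflow_uuid: str, **kwargs: Any) -> None:
        # Published before the engine does any validation or heavy work.
        publish(workflow_uuid, "running")
        try:
            engine_run(workflow_uuid=workflow_uuid, **kwargs)
        except Exception:
            # Report the failure, then let the caller see the original error.
            publish(workflow_uuid, "failed")
            raise

    return run


if __name__ == "__main__":
    messages: List[Tuple[str, str]] = []

    def broken_engine(workflow_uuid: str, parameters: dict) -> None:
        raise ValueError("unknown parameter")

    run = with_initial_running_status(
        broken_engine, lambda uuid, status: messages.append((uuid, status))
    )
    try:
        run("1234", parameters={"nevents": 10})
    except ValueError:
        pass
    print(messages)  # [('1234', 'running'), ('1234', 'failed')]
```

If an engine also kept its own initial "running" publish, each workflow would emit that status twice, which is the duplicated MQ message listed among the consequences.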
Possible consequences:
- we will have one duplicated MQ message per workflow, unless we remove all the initial "running" status messages from all workflow engines;
- if we remove all the initial status messages from the workflow engines and let reana-commons take care of it, it may not be obvious to someone reading the code in the future where the engine initially reports the running status.
These consequences are not that bad. WDYT? Should we deal with the running status (a) case by case in each engine, or (b) in reana-commons?
cc @mvidalgarcia @audrium
As decided, we will handle this case by case in each engine, because for now that is easier than releasing a new reana-commons version.
Current behaviour
When a workflow contains an error, such as the following in the roofit example:
The workflow fails:
but it is not reported as failed in the client:
This is because:
Expected behaviour
Users should see this workflow as "failed".
Notes
It would be good for the workflow to report that it is running as soon as possible. In other words, the above workflow should not still be in the "pending" state when it fails; it should already be in the "running" state. This way we can keep the status transition rules unchanged, so that "pending -> failed" remains invalid whilst "pending -> running -> failed" is valid.
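The transition rule described here can be illustrated with a small lookup table. This sketch only encodes the transitions mentioned in this issue; the table, `is_valid_transition` and `is_valid_path` are hypothetical and are not REANA's actual status-transition implementation, which covers more statuses.

```python
from typing import Dict, FrozenSet, Sequence

# Only the transitions discussed in this issue; a real rule table has more
# statuses and edges (hypothetical subset, not REANA's actual table).
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "pending": frozenset({"running"}),
    "running": frozenset({"failed"}),
}


def is_valid_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is allowed."""
    return new in ALLOWED_TRANSITIONS.get(current, frozenset())


def is_valid_path(statuses: Sequence[str]) -> bool:
    """Check every consecutive pair of statuses in a status history."""
    return all(is_valid_transition(a, b) for a, b in zip(statuses, statuses[1:]))


print(is_valid_path(["pending", "failed"]))             # False
print(is_valid_path(["pending", "running", "failed"]))  # True
```

Keeping "pending -> failed" invalid is what forces engines to report "running" before they can fail.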