reanahub / reana-workflow-engine-serial

REANA Workflow Engine Serial
http://reana-workflow-engine-serial.readthedocs.org
MIT License

support for partial workflow execution #54

Open tiborsimko opened 5 years ago

tiborsimko commented 5 years ago

Similarly to CWL and Yadage, the Serial workflow engine should support partial workflow execution.

This will be useful for incremental workflow development and debugging, e.g. run preparatory steps 1-3 until the process succeeds, then run filtering steps 4-10 until success, and then run the plotting steps 11-13 until success.

If we don't have names for steps, the CLI could look like:

$ reana-client start --target step7

If we do introduce names for steps, the CLI could look like:

$ reana-client start --target myfit

See also https://github.com/reanahub/reana-client/issues/202

tiborsimko commented 5 years ago

Note: the new Yadage version should support partial workflow execution as well, so after we upgrade CWL and Yadage, we can take advantage of the new named steps feature in Serial and add generalised support for partial execution across the three workflow engines.

lukasheinrich commented 5 years ago

as we add features, maybe the REANA docs should have a feature matrix showing which engines have which features

tiborsimko commented 5 years ago

@lukasheinrich Yes, we currently have a "To pick a workflow engine" section under https://reana.readthedocs.io/en/latest/userguide.html#capture-your-workflows which we can enrich and publicise better.

dprelipcean commented 5 years ago

For Yadage and CWL, the steps are clearly separated, but Serial has a hybrid of steps and commands, e.g. for the rootfit demo:

    steps:
      - environment: 'reanahub/reana-env-root6'
        commands:
        - mkdir -p results
        - root -b -q 'code/gendata.C(${events},"${data}")' | tee gendata.log
        - root -b -q 'code/fitdata.C("${data}","${plot}")' | tee fitdata.log

This is only one step that contains three commands. We should decide what a step is from the user's perspective:

i) Each command (which creates its own pod) is a step, i.e. step 1 would be root -b -q 'code/gendata.C...
ii) Commands are grouped into steps (the way it is right now), i.e. step 1 would be all the commands.

All Serial workflows so far are grouped into one step that consists of several commands, so the second approach would not add much functionality.

tiborsimko commented 5 years ago

One step can consist of running several commands. The above example could be split into two steps, the "gendata" step (containing one command) and the "fitdata" step (containing the other command). But in theory there can be N steps and each step S_i can consist of many commands... so running a step means running all the preceding steps and all of that step's own commands.
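A minimal sketch of that "run everything up to the target" rule, written in Python purely for illustration. The `name` key, the `step<N>` positional labels and the `commands_up_to` helper are assumptions made for this example, not the engine's actual API, and the commands are simplified placeholders.

```python
from typing import Dict, List


def commands_up_to(steps: List[Dict], target: str) -> List[str]:
    """Return the commands of every step up to and including ``target``.

    ``target`` is matched against an optional ``name`` key (hypothetical)
    or against the positional label ``step<N>``, counting from 1.
    """
    selected: List[str] = []
    for index, step in enumerate(steps, start=1):
        # A step always contributes all of its own commands.
        selected.extend(step.get("commands", []))
        if target in (step.get("name"), f"step{index}"):
            return selected
    raise ValueError(f"unknown target step: {target!r}")


# The rootfit example split into two named steps (placeholder commands).
steps = [
    {"name": "gendata", "commands": ["mkdir -p results", "run-gendata"]},
    {"name": "fitdata", "commands": ["run-fitdata"]},
]

print(commands_up_to(steps, "gendata"))
print(commands_up_to(steps, "step2"))
```

With this reading, `--target gendata` would stop after the first step, while `--target step2` (or `--target fitdata`) would run both steps in order.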

dprelipcean commented 5 years ago

I have made a PR tackling the last comment. If the implementation seems OK, I could go ahead with more functionality, e.g.:

$ reana-client start -o step_start=1 -o step=3

dprelipcean commented 5 years ago

The 'cwl tool' functionality for target is actually the opposite of what we had in mind, i.e. for them target is the starting step, not the last one.

We should think about whether the engines should be consistent with each other (preferred). However, if this brings too much overhead, then some differences could be accepted, since the end user will ultimately use only one workflow.
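To make the two readings of a target concrete, here is a rough Python sketch. The `select_steps` helper and its `mode` parameter are hypothetical names invented for this illustration; neither engine is known to expose them.

```python
from typing import List


def select_steps(names: List[str], target: str, mode: str = "until") -> List[str]:
    """Select step names relative to ``target``.

    mode="until": the target is the last step to run (the reading in this issue).
    mode="from":  the target is the first step to run (the reading described
                  for CWL in the comment above).
    """
    position = names.index(target)  # raises ValueError for an unknown step
    if mode == "until":
        return names[: position + 1]
    if mode == "from":
        return names[position:]
    raise ValueError(f"unknown mode: {mode!r}")


names = ["prepare", "filter", "plot"]
print(select_steps(names, "filter", mode="until"))
print(select_steps(names, "filter", mode="from"))
```

A shared helper of this kind would let each engine keep its own default while making the difference explicit to the user.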