RFC specifications - GitHub issues

tiborsimko commented 6 years ago

r-w-e-serial will provide an ultra-simple serial/sequential workflow engine, useful for people who need to run a sequence of commands and who might be put off by the complexity that CWL or Yadage might bring. See more musings about the motivation in https://github.com/reanahub/reana-demo-helloworld/issues/13 and https://github.com/reanahub/reana-client/issues/10#issuecomment-338906229.

The implementation would follow the simplest "Sequence" workflow pattern http://www.workflowpatterns.com/patterns/control/basic/wcp1.php. Basically, the engine would execute a shell command and, if its exit status is OK, execute the next shell command, and so on. If a command's exit status is not OK, the engine would stop there and exit with an error.
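A minimal sketch of that "Sequence" behaviour, assuming plain local shell execution with no containers involved. The function name `run_serial` and its return convention are hypothetical, chosen only for illustration, and are not part of any REANA component:

```python
import subprocess


def run_serial(commands):
    """Run shell commands in order and stop at the first failure.

    Returns (failed_step, exit_status): (None, 0) when every command
    succeeded, otherwise the 1-based index of the failing command and
    its non-zero exit status.
    """
    for index, command in enumerate(commands, start=1):
        # shell=True hands the string to the system shell, as a CI-style runner would.
        completed = subprocess.run(command, shell=True)
        if completed.returncode != 0:
            # Do not run the remaining commands once one has failed.
            return index, completed.returncode
    return None, 0
```

For example, `run_serial(["true", "exit 3", "echo never"])` would stop at the second command and never run the third.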

The serial workflow could be pictured as follows:

 inputs
    |
    V
+-------+
| step1 |   ... running in environment E1 with runtime-mounted code C1 on inputs I1
+-------+
    |
    V
+-------+
| step2 |   ... running in environment E2 with runtime-mounted code C2 on inputs I2
+-------+
    |
    V
   ...
    |
    V
+-------+
| stepN |   ... running in environment EN with runtime-mounted code CN on inputs IN
+-------+
    |
    V
 outputs

In theory, every step of the workflow could run in a different computing environment (different docker image) with different runtime code and input parameters.

In practice, it would not be sensible to go too far in that direction. The main goal is to offer something simple for people looking for a Travis CI-like definition of commands to run. (Think of the use case of manipulating videos by running ffmpeg jobs on a K8s cloud.) People with advanced needs would be advised to use the real full-featured workflow engines, CWL and Yadage.

Hence we don't want to go into specifying full tuples (step_i, environment_i, inputs_i, code_i, commands_to_run_i, outputs_i). It should be sufficient to mount the input runtime code once for all the steps, or even to use the same environment for all the steps, i.e. (step_i, environment_1, inputs_1, code_1, commands_to_run_i), which is sort of what Travis CI does. (Circle CI permits specifying different environments, I think.)

Option 1: use the same environment for each step

environments:
  - type: docker
    image: reanahub/reana-env-root6
workflow:
  type: serial
  commands:
    - root -b -q '/code/gendata.C(20000,"/outputs/data.root")'
    - root -b -q '/code/fitdata.C("/outputs/data.root","/outputs/plot.png")'

Option 2: use different environments in different steps

environments:
  - type: docker
    image: johndoe/filter-big
  - type: docker
    image: johndoe/filter-small
  - type: docker
    image: johndoe/plotter
workflow:
  type: serial
  steps:
    - environment: filter-big
      commands:
      - run some shell command
      - run another shell command
    - environment: filter-small
      commands:
      - run something
      - run something else
      - run even more things
      - finish up
    - environment: plotter
      commands:
      - gnuplot plots.gnuplot myresults.csv
      - gnuplot plots.gnuplot myotherresults.csv
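
To compare the two options, here is a rough sketch of how an engine might expand either layout into one ordered list of (image, command) pairs. It assumes the YAML has already been loaded into a Python dict, and it assumes that a step refers to its environment by the last path component of the image name (e.g. `filter-big` for `johndoe/filter-big`), which the proposal above does not actually specify. The name `plan_steps` is hypothetical.

```python
def plan_steps(spec):
    """Expand a serial workflow spec into an ordered list of (image, command)."""
    images = [env["image"] for env in spec["environments"]]
    # Assumption: environments are looked up by the image's last path component.
    by_name = {image.rsplit("/", 1)[-1]: image for image in images}
    workflow = spec["workflow"]

    if "commands" in workflow:
        # Option 1: every command runs in the single declared environment.
        return [(images[0], command) for command in workflow["commands"]]

    # Option 2: each step names its own environment and lists its commands.
    plan = []
    for step in workflow["steps"]:
        image = by_name[step["environment"]]
        plan.extend((image, command) for command in step["commands"])
    return plan
```

Seen this way, Option 1 is just the special case of Option 2 with a single step, so an engine could support both layouts with one execution loop.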

Let's muse IRL during kick-off to come up with a very simple specification for those light users who are not primarily looking for a workflow engine, while still permitting them to easily enter the computational workflow domain and start using solutions such as CWL, Snakemake, Yadage, etc. later.

dinosk commented 6 years ago

I believe this can be closed @diegodelemos

diegodelemos commented 6 years ago

Yes, closing as we have the component working already.

reanahub / reana-workflow-engine-serial

RFC specifications #2