madminer-tool / madminer-workflow

Madminer complete cloud-based analysis
MIT License

Union of sub-workflows #30

Closed Sinclert closed 4 years ago

Sinclert commented 4 years ago

Context

Following the split of this repository's original code into separate repos (described in this issue, and implemented in this PR), the workflow specifications now live in:


Problem

This split is very nice because each sub-workflow can now be run individually by Yadage, but it complicates concatenating them to create the complete workflow that this repo builds and deploys in REANA.

As those sub-workflows are now their own entities, each one considers init the initial stage from which its execution starts. This is an issue when we need to concatenate:

  1. Last stage of the Physics sub-workflow (combine).
  2. First stage of the ML sub-workflow (sampling).

The reason is that sampling must depend on combine when executing the full workflow, but on init when executing on its own.


Disregarded approaches 🚫

A) Duplicate the ML sub-workflow on this repo.

This approach is a bit dumb, as we would be duplicating the whole ML sub-workflow just to replace init with combine in a couple of places. In addition, having the sub-workflow duplicated means that any future change needs to be applied twice.

B) Duplicate the ML sub-workflow on its own repo.

This approach shares the same problems as the previous one. In addition, the duplicated spec would reference a "combine" stage that is not even declared in the same repository.

C) Define ML sub-workflow with combine as default.

This approach solves the problem when launching the complete workflow from this repository, but makes standalone execution of the ML sub-workflow non-viable, since no combine stage exists there.


Proposed approach 🚀

My proposed approach is to create some kind of script or Makefile rule that not only copies the sub-workflow specs into the reana folder, but also replaces some of the init occurrences in the ML sub-workflow with combine before submitting the complete workflow to Yadage or REANA.

It will involve some custom logic, but I think it is the best approach to avoid duplication.
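
To make the idea concrete, here is a rough sketch of what the substitution step could look like. It is only an illustration: the function name `retarget_init`, the `keys` filter, and the shape of the sample spec are my own assumptions, not the actual layout of the ML sub-workflow files. It works on plain text and only touches lines that look like stage references, so names such as `initial` are left alone.

```python
import re


def retarget_init(spec_text, old="init", new="combine",
                  keys=("dependencies", "stages", "stage")):
    """Replace whole-word `old` with `new` on lines that mention one of `keys`.

    Hypothetical helper: `keys` is an assumed list of spec fields that
    reference other stages; adjust it to the real spec layout.
    """
    word = re.compile(rf"\b{re.escape(old)}\b")
    rewritten = []
    for line in spec_text.splitlines(keepends=True):
        # Only rewrite lines that look like stage references.
        if any(key in line for key in keys):
            line = word.sub(new, line)
        rewritten.append(line)
    return "".join(rewritten)
```

A Makefile rule could then copy the ML spec into the reana folder and pass its contents through a function like this before submission.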


Opinions? 👍 / 👎

Sinclert commented 4 years ago

The final solution did not involve Yadage workflow templating.

I learned that workflows can be concatenated using the {stage: <workflow_name>[*].<stage_name>, ...} syntax, so I defined the second sub-workflow (the ML one) using that syntax.
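
As a small illustration of how such a reference is put together, the sketch below builds the `stage` entry from a workflow name and a stage name. The helper `stage_reference` and the workflow name `physics` are hypothetical, and the other keys of the mapping (the `...` above) are deliberately left out.

```python
def stage_reference(workflow_name, stage_name):
    """Build the `stage` entry of a cross-workflow reference.

    Hypothetical helper: only the `stage` key is produced; any other
    keys of the mapping are not reconstructed here.
    """
    return {"stage": f"{workflow_name}[*].{stage_name}"}


# Example with an assumed workflow name for the Physics sub-workflow.
reference = stage_reference("physics", "combine")
```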