Support for parameterized sequential pipelines

I want to parameterize how many times to repeat a stage which depends on previous stages. For example, consider the list [0, 0.2, 0.5, 0.75] and a held-out dataset. I want to have a pipeline that does the following:

Start: train an initial model_0
re-train@0: process the first 20% of the held-out dataset with model_0 and re-train a new model, model_20, including the newly processed samples.
re-train@1: process the next 30% of the held-out dataset with model_20 and re-train model_50.
re-train@2: process the next 25% with model_50 and re-train model_75.
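
The share of held-out data each stage processes is the difference between consecutive entries of the list. Here is a minimal sketch of that arithmetic, assuming the list holds cumulative fractions starting at 0 (the helper name `stage_slices` is only illustrative and not part of DVC):

```python
def stage_slices(fractions):
    """Return (start, end) fraction pairs, one per re-train stage.

    Assumes `fractions` is a strictly increasing list of cumulative
    fractions of the held-out dataset, starting at 0.
    """
    if any(b <= a for a, b in zip(fractions, fractions[1:])):
        raise ValueError("fractions must be strictly increasing")
    return list(zip(fractions, fractions[1:]))


for i, (start, end) in enumerate(stage_slices([0, 0.2, 0.5, 0.75])):
    # Round only for display; float subtraction can be slightly inexact.
    share = round((end - start) * 100)
    print(f"re-train@{i}: process {share}% of the held-out data")
```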

Ideally, I want to be able to change the list to a different size, for example [0, 0.2, 0.4, 0.5, 0.75, 0.85, 0.95], which would define re-train@0 through re-train@5. Better still, the pipeline could then reuse the cached model_0 and model_20 (model_50 and model_75 are different now because they depend on model_40).
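
To make the reuse expectation concrete: a model depends on every list entry before it, so only the shared leading entries of the old and new lists produce models that are still valid. The sketch below illustrates that reasoning only. It is not DVC's actual cache lookup, and `reusable_models` is a hypothetical helper:

```python
def reusable_models(old, new):
    """Names of models that stay valid when the list changes from `old` to `new`.

    Each model is built on top of all earlier ones, so reuse stops at
    the first position where the two lists differ.
    """
    shared = []
    for a, b in zip(old, new):
        if a != b:
            break
        # Follow the issue's naming: 0.2 -> model_20.
        shared.append(f"model_{round(a * 100)}")
    return shared


old = [0, 0.2, 0.5, 0.75]
new = [0, 0.2, 0.4, 0.5, 0.75, 0.85, 0.95]
print(reusable_models(old, new))
```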

I tried doing this using a foreach to define my stage. However, each iteration needs to reference the output of the previous one as a dependency, and foreach does not expose the previous item. If it did, I could write something like:

re-train:
    foreach: [0,0.2,0.5,0.75]
    do:
        cmd: python train.py --reference-model=model_${prev_item} --output-model=model_${item}
        deps: [model_${prev_item}]
        outs: [model_${item}]

Then it would be fairly easy to chain the stages. However, AFAIK this is not possible, so my workaround is to use a list of objects defined in vars, such as:

re-trains:
  - {curr: 0.2, prev: 0}
  - {curr: 0.5, prev: 0.2}
  - {curr: 0.75, prev: 0.5}

And then referencing ${item.curr} and ${item.prev}. However, this is error-prone (setting prev wrongly gives weird results without any warning) and a bit of a hassle to maintain.
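
One way to reduce the risk of a wrong prev is to derive the pairs from the flat list instead of writing them by hand. This is a rough sketch of that idea using only the standard library. The helper name `retrain_pairs` is made up for illustration, and how the output gets into the vars file is left open:

```python
def retrain_pairs(fractions):
    """Build the {curr, prev} entries from a flat, strictly increasing list."""
    if fractions != sorted(set(fractions)):
        raise ValueError("fractions must be strictly increasing")
    return [
        {"curr": curr, "prev": prev}
        for prev, curr in zip(fractions, fractions[1:])
    ]


# Print the entries in the same flow-style YAML used in the workaround.
print("re-trains:")
for pair in retrain_pairs([0, 0.2, 0.5, 0.75]):
    print(f"  - {{curr: {pair['curr']}, prev: {pair['prev']}}}")
```

Because every prev is taken from the preceding entry, changing the list only requires editing one place.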

I use dbt very frequently, so I think Jinja2 templating could be a good tool for dealing with these cases. For example, my situation would be solved by doing something like this:

{% set stages = [0.2, 0.5, 0.75] %}

train:
  cmd: python train.py --output-model=model_0
  outs: [model_0]

{% for stage in stages %}
re-train@{{ loop.index0 }}:
    {% set input_model = 'model_0' if loop.first else 'model_' ~ stages[loop.index0 - 1] | replace(".", "_") %}
    {% set output_model = 'model_' ~ stage | replace(".", "_") %}
    cmd: python train.py --reference-model={{ input_model }} --output-model={{ output_model }}
    deps:
      - {{ input_model }}
    outs:
      - {{ output_model }}
{% endfor %}

Putting it in a template renderer gives (blank lines trimmed):

```
train:
  cmd: python train.py --output-model=model_0
  outs: [model_0]

re-train@0:
    cmd: python train.py --reference-model=model_0 --output-model=model_0_2
    deps:
      - model_0
    outs:
      - model_0_2

re-train@1:
    cmd: python train.py --reference-model=model_0_2 --output-model=model_0_5
    deps:
      - model_0_2
    outs:
      - model_0_5

re-train@2:
    cmd: python train.py --reference-model=model_0_5 --output-model=model_0_75
    deps:
      - model_0_5
    outs:
      - model_0_75
```
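
Until something like this exists, the same chaining could be approximated by generating the pipeline text with a small script. The sketch below uses only the standard library and mirrors the template's naming (0.2 becomes model_0_2). `render_pipeline` and `model_name` are hypothetical helpers. The literal `re-train@N` stage names are copied from the template, and the source does not confirm that DVC accepts `@` in a hand-written stage name:

```python
def model_name(fraction):
    # Mirror the template's replace(".", "_") so 0.2 becomes "model_0_2".
    return "model_" + str(fraction).replace(".", "_")


def render_pipeline(stages):
    """Return pipeline text: one initial train stage plus chained re-trains."""
    lines = [
        "train:",
        "  cmd: python train.py --output-model=model_0",
        "  outs: [model_0]",
    ]
    previous = "model_0"
    for index, fraction in enumerate(stages):
        current = model_name(fraction)
        lines += [
            f"re-train@{index}:",
            f"  cmd: python train.py --reference-model={previous} --output-model={current}",
            f"  deps: [{previous}]",  # the model produced by the previous stage
            f"  outs: [{current}]",   # the model this stage produces
        ]
        previous = current
    return "\n".join(lines) + "\n"


print(render_pipeline([0.2, 0.5, 0.75]))
```

Like the template, this emits only the stage definitions, so the result would still need to be placed under whatever top-level structure the pipeline file expects.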

I searched the repo for jinja2 and it seems that it has been considered previously (and deemed too weird/ugly which, honestly, I agree with, especially for beginners). However, drawing inspiration from it, another approach would be to allow arithmetic in DVC string interpolation and to expose more loop values, for example an idx, which would enable something like:

vars:
    - retrains: [0.2,0.5,0.75]

train:
    cmd: python train.py --output-model=model_0
    outs: [model_0]

re-train:
    foreach: ${retrains}
    do:
        cmd: python train.py --reference-model=model_${idx} --output-model=model_${idx+1}
        deps: [model_${idx}]
        outs: [model_${idx+1}]

This is much cleaner.


Support for parameterized sequential pipelines #10627