insitro / redun

Yet another redundant workflow engine
https://insitro.github.io/redun/
Apache License 2.0

Redun caching logic for a master_task #98

Open · njbernstein opened this issue 3 months ago

njbernstein commented 3 months ago

Hi there,

I have a master_task which kicks off a bunch of subtasks for a scatter-gather.

               task_a    task_b
master_task -> task_1 -> task_2
master_task -> task_3 -> task_4
master_task -> task_5 -> task_6

The master_task takes the global parameters for the master task plus all the configurable inputs for task_a and task_b, e.g. master_task(global_input).

global_input is a class made up of the inputs to the subtasks, e.g. global_input.task_a_input_1 and global_input.task_b_input_1.

task_1, task_3, task_5 are task_a with different inputs. task_2, task_4, task_6 are task_b with different inputs.

If, on a rerun, we change master_task inputs that are only passed to task_b, we see that the whole master_task is rerun.
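To make the setup concrete, here is a plain-Python toy model of per-call caching, where each call is keyed on its task name and arguments. This is not redun's implementation or API: GlobalInput, the sample values, and the memoized decorator are hypothetical stand-ins, and the sketch assumes redun caches each task call on its arguments in a broadly similar way.

```python
import functools
from dataclasses import dataclass

CACHE = {}
EXECUTED = []  # names of task bodies that actually ran


def memoized(fn):
    """Toy call cache keyed on (task name, arguments); not redun's API."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = (fn.__name__, args)
        if key not in CACHE:
            EXECUTED.append(fn.__name__)
            CACHE[key] = fn(*args)
        return CACHE[key]
    return wrapper


@dataclass(frozen=True)  # frozen so instances are hashable cache keys
class GlobalInput:
    task_a_input_1: int
    task_b_input_1: int


@memoized
def task_a(sample, a_param):
    return sample * a_param


@memoized
def task_b(a_result, b_param):
    return a_result + b_param


@memoized
def master_task(global_input):
    # Scatter one task_a -> task_b chain per (hypothetical) sample, then gather.
    return [
        task_b(task_a(s, global_input.task_a_input_1), global_input.task_b_input_1)
        for s in (1, 2, 3)
    ]
```

In this model, calling master_task(GlobalInput(2, 10)) and then master_task(GlobalInput(2, 20)) re-executes master_task and the three task_b calls, while the three task_a calls are served from the cache. A changed task_b input necessarily invalidates master_task's own cache entry, because master_task's key covers all of global_input, but it does not invalidate the task_a work beneath it.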

How can we have redun skip cache evaluation at the master_task level and evaluate caching only on the subtasks?

Maybe turning off caching on the master task would work?
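The idea of turning off caching for master_task can be sketched with the same kind of toy cache. Again this is hypothetical and not redun's API: here a cache=False flag on the decorator makes a task body execute on every call, while its subtasks are still looked up by their own arguments.

```python
import functools

CACHE = {}
EXECUTED = []  # names of task bodies that actually ran


def memoized(cache=True):
    """Toy call cache; cache=False makes the task body run on every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if not cache:
                EXECUTED.append(fn.__name__)
                return fn(*args)
            key = (fn.__name__, args)
            if key not in CACHE:
                EXECUTED.append(fn.__name__)
                CACHE[key] = fn(*args)
            return CACHE[key]
        return wrapper
    return decorator


@memoized()
def task_a(sample, a_param):
    return sample * a_param


@memoized()
def task_b(a_result, b_param):
    return a_result + b_param


@memoized(cache=False)
def master_task(a_param, b_param):
    # Routing only: fan out one task_a -> task_b chain per (hypothetical) sample.
    return [task_b(task_a(s, a_param), b_param) for s in (1, 2, 3)]
```

In this model, disabling the cache on master_task does not change which subtasks run. It only means master_task's own routing body executes every time, even when its inputs are unchanged; the subtasks are cached or rerun exactly as before.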

ctk3b commented 3 months ago

Hmm, is it possible to share the task definitions with some concrete/dummy examples?

I'm wondering if the context feature may provide what you need: https://insitro.github.io/redun/config.html#context

mattrasmus commented 3 months ago

Hi @njbernstein, thanks for posting this.

I think I understand your question. If master_task is just routing args from its inputs to its child tasks, then rerunning master_task() may not be a significant issue performance-wise. However, if master_task() does some heavy lifting itself, or if there are many layers of task calls before reaching task_b (say, in a more realistic pipeline), then you may be interested in a new feature we call Context. It works similarly to React Context: it routes config to deeply nested tasks without needing to pass that config through all the higher-level tasks, which would only increase the chance of unnecessary reruns.
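A rough plain-Python sketch of why routing task_b's parameter through a context avoids the master_task rerun is below. It is a toy model only: CONTEXT, cached_call, run, and the "task_b.b_param" key are hypothetical, and the "plan" returned by master_task only loosely mimics redun's lazy evaluation. The linked docs and example show redun's actual API. The key assumption is that only the context values a task actually reads become part of that task's cache key.

```python
CONTEXT = {"task_b.b_param": 10}  # hypothetical context key and value
CACHE = {}
EXECUTED = []  # names of task bodies that actually ran


def cached_call(fn, args, context_keys=()):
    """Run fn(*args) unless cached; only the context values fn reads join its key."""
    key = (fn.__name__, args, tuple(CONTEXT[k] for k in context_keys))
    if key not in CACHE:
        EXECUTED.append(fn.__name__)
        CACHE[key] = fn(*args)
    return CACHE[key]


def task_a(sample, a_param):
    return sample * a_param


def task_b(a_result):
    # task_b reads its parameter from the context, not from an argument.
    return a_result + CONTEXT["task_b.b_param"]


def master_task(a_param):
    # Returns a plan of subtask inputs; task_b's parameter never passes through here.
    return [(sample, a_param) for sample in (1, 2, 3)]


def run(a_param):
    """Evaluate master_task, then each task_a -> task_b chain, all through the cache."""
    plan = cached_call(master_task, (a_param,))
    return [
        cached_call(task_b, (cached_call(task_a, (sample, a)),), ("task_b.b_param",))
        for sample, a in plan
    ]
```

After a first run(2), changing CONTEXT["task_b.b_param"] and calling run(2) again re-executes only the three task_b calls. master_task and the task_a calls are cache hits, because neither of their cache keys includes the context value.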

For more info, see the docs: https://insitro.github.io/redun/config.html#context

You can also check out an example here: https://github.com/insitro/redun/blob/fd9479d13a8d94274fd8e1def14f7d30db1f9572/examples/context/workflow.py