radical-cybertools / radical.entk

The RADICAL Ensemble Toolkit
https://radical-cybertools.github.io/entk/index.html
Other
28 stars 17 forks source link

Is it possible to assign orders for tasks within the same stage in entk? #640

Closed GKNB closed 1 year ago

GKNB commented 1 year ago

The feature I request is to execute tasks within the same stage in some order. There might be some trick that allows us to do that, but currently I have not found it, please let me know if this is currently possible in the framework of rct. Thanks!

One thing I want to mention is that, I tried to follow the example in https://radicalentk.readthedocs.io/en/stable/api/adv_examples/adapt_tc.html, where the example uses stage.post_exec() to create a new stage within the same pipeline when the current stage is finished. I tried to mimic that by using task.post_exec() to create a new task within the same stage when the current task is finished. If this is allowed then we can actually assign order for tasks within stage by dynamically adding new task in runtime. However I got an error, and it seems like this is not allowed.

The reason why I need this feature is that, currently I have the following workflow: |-------------task_1A-------------|-------------task_2A-------------|-------------task_3A-------------|...... |--task_1B---task_1C----task_1D|--task_2B---task_2C----task_2D|--task_3B---task_3C----task_3D|...... |-------------stage_1-------------|-------------stage_2-------------|-------------stage_3-------------|......

where we have 4k tasks (in this plot, k=3), and we want to make sure that 1). task(i+1)A start after task(i)D finish, and 2). task_iB, task_iC and taskiD are executed in sequential order. A more illustrative plot of the workflow can be found in the figure attached. My current idea is to use a single pipeline, with k stages, and put task(i)[A,B,C,D] in the same stage, but that means we need to be able to specify the order of task_(i)[B,C,D] within the same stage_i.

Another project which current is formulated without rct has a similar pattern as follows: pipeline_1: |--task_1A---task_1B--|--task_1E--| |--task_1C---task_1D--| |------stage_1---------|--stage_2--| pipeline_2: |--task_2A---task_2B--|--task_2E--| |--task_2C---task_2D--| |------stage_1---------|--stage_2--| More pipelines....

and we have many pipelines. We want to make sure that within the same pipelinei, task(i)B is executed after task(i)A is finished, and task(i)D is executed after task(i)C is finished, and task(i)E is executed after task_(i)[B,D] are finished. Currently this is done by an ugly bash script using the dependency feature of slurm, but it is not potable. We might not use rct for this project later, but I want to use this example to show that order of tasks within the same stage could be really helpful, especially when we have complicated task dependency.

Thanks! adv_workflow

andre-merzky commented 1 year ago

I am afraid that EnTK does not natively support ordering of tasks within a single stage. We will discuss this ticket on our next devel call and ping back with a proposal on how to handle this use case.

andre-merzky commented 1 year ago

We discussed this again and the answer is no, EnTK will not be able to express or handle task ordering within a single stage. Thanks for providing the use case in detail, we'll take that into account!

shantenujha commented 1 year ago

We might not be able to provide native support in EnTK, but is there a recommended way to do so?

Clearly with two use cases documented here, and others that I can imagine, I wonder if this is something that we can advise our users on how to possibly do so, with the usual caveat & caution YMMV and that this isn't an EnTK supported feature (yet, though I hope it is something that should be taken into account).

mtitov commented 1 year ago

I would have two possible solutions: (1) represent such tasks with dependencies as a single task using a wrapper script (in the 1st example: tasks 1B,1C,1D will be "merged" into 1BCD task) - that would be a straightforward if tasks share the same environment, but more complex if not; also important to note, that traces will not distinguish these [sub]tasks (will be up to user to add required tracing events); (2) have a synchronization flag between stages from different pipelines (in the 1st example: have 2 pipelines, the 2nd pipeline will have stages with a single task, and stages from the 1st pipeline will always wait for a flag from stages (i)D from the 2nd pipeline) - not available right now, require development.

SrinivasMushnoori commented 1 year ago

Just chiming in because my phone lit up: RepEx required EnTK to develop a feature called pipeline.wait(). Essentially a pipeline stalls until a resume condition is met and a pipeline.resume() signal is received from a different pipeline. This feature was required to ensure that a replica waited to perform an exchange until a sufficient number of other replicas were also ready to do so, so we could maintain a constant exchange attempt rate.

That might be useful here? All the red tasks in the provided diagram could be placed in a second pipeline that progresses stages only when the appropriate signal is received, and all the tasks depicted as blue/green circles and green squares could be expressed as one pipeline composed of only those tasks in sequence?

Not sure if this will help, but I figured I might reply because I've seen an issue that vaguely resembles this.

On Mon, Mar 20, 2023, 6:09 AM Shantenu @.***> wrote:

We might not be able to provide native support in EnTK, but is there a recommended way to do so?

Clearly with two use cases documented here, and others that I can imagine, I wonder if this is something that we can advise our users on how to possibly do so, with the usual caveat & caution YMMV and that this isn't an EnTK supported feature (yet, though I hope it is something that should be taken into account).

— Reply to this email directly, view it on GitHub https://github.com/radical-cybertools/radical.entk/issues/640#issuecomment-1476206765, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEF6AIIZW7DZ3QAW33YJJVLW5BJJLANCNFSM6AAAAAAV6GM6AI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

andre-merzky commented 1 year ago

@GKNB : the RP branch feature/bragg.py now contains an example RP script which renders the workflow shown above. The example runs two of those workflows with randomized task runtimes.

Please do have a look if an approach like that would be suitable for your needs. This is just a crude example really - if you are open to that approach we would recast that into some small framework which would make it easier to specify task types, data dependencies etc.

andre-merzky commented 1 year ago

@GKNB : ping :-)

andre-merzky commented 1 year ago

@GKNB : ping :-)

shantenujha commented 1 year ago

GKNB is on (well deserved) leave till tomorrow. Stop bugging him :)

andre-merzky commented 1 year ago

@GKNB : ping? :-)

GKNB commented 1 year ago

I am really sorry for the late reply. The script works perfectly and did exactly what I want on the local machine. I will test it on the cluster (Polaris) to see how it works there. I will close it when the Polaris tests is successful. Just one tiny comment: In the structure diagram, those policies and preliminary_train should be switched, but that does not affect the result. Thanks!

andre-merzky commented 1 year ago

Great, thanks for the feedback! I am going to close this issue then - please don't hesitate to open a new issue if anything comes up!