ssec-jhu / dplutils

Distributed (Data) Pipeline Utilities
BSD 3-Clause "New" or "Revised" License

Support multiple edges (output streams) in graph #95

Open amitschang opened 1 month ago

amitschang commented 1 month ago

In some cases it is useful to send different data to different downstream tasks, for example a filter that sends the matching data one way and the remaining data another (or straight to output). Right now this has to be implemented by forking a task and applying complementary filters in two separate tasks, which adds overhead and duplicates logic.

This could be implemented by letting a task return one frame or several, like:

def task(df: DataFrame) -> DataFrame | list[DataFrame] | dict[str, DataFrame]: ...

and the edge annotated something like:


(task1, task2, {"streams": [0]})
(task1, task3, {"streams": [1]})
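
To make the proposal concrete, here is a rough, dependency-free sketch of how an executor might route a task's output along annotated edges. None of this is existing dplutils API: `Frame` is a stand-in for a DataFrame, and `split_task`, `normalize_streams`, and `route` are hypothetical names. It also assumes that an edge without a `streams` annotation receives stream `0`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Stand-in for a DataFrame so the sketch runs without pandas (assumption)."""
    rows: list = field(default_factory=list)


def split_task(df):
    """Hypothetical filter task: stream 0 gets rows that pass, stream 1 the rest."""
    passed = Frame([row for row in df.rows if row["value"] >= 10])
    rest = Frame([row for row in df.rows if row["value"] < 10])
    return [passed, rest]


def normalize_streams(result):
    """Map the three proposed return shapes onto a {stream_key: Frame} dict."""
    if isinstance(result, Frame):
        return {0: result}  # single output behaves as stream 0
    if isinstance(result, dict):
        return result  # keys are the stream names
    return dict(enumerate(result))  # list: position is the stream key


def route(edges, source, result):
    """Collect, per downstream task, the frames selected by each edge annotation."""
    streams = normalize_streams(result)
    deliveries = defaultdict(list)
    for src, dst, attrs in edges:
        if src != source:
            continue
        # Assumption: an unannotated edge defaults to stream 0.
        for key in attrs.get("streams", [0]):
            deliveries[dst].append(streams[key])
    return dict(deliveries)


edges = [
    ("task1", "task2", {"streams": [0]}),
    ("task1", "task3", {"streams": [1]}),
]
batch = Frame([{"value": 3}, {"value": 12}, {"value": 25}])
deliveries = route(edges, "task1", split_task(batch))
```

With this shape, a task that returns a single frame keeps working unchanged, since it is treated as stream `0`, and a dict return would let edges select streams by name instead of position.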