ssec-jhu / dplutils

Distributed (Data) Pipeline Utilities
BSD 3-Clause "New" or "Revised" License

Disambiguate multiple outputs by task #61

Closed: amitschang closed this 4 months ago

amitschang commented 5 months ago

The original implementation only supported a simple graph with a single output. Now that multiple outputs are supported, we should have a way to identify the task that produced each output batch, because on writeout, batches from different tasks may need to go to distinct tables or partitions.

run() could have this signature:

def run(self) -> Iterable[Tuple[str, pd.DataFrame]]:

where each yielded tuple is (task_name, dataframe): the name of the task that produced the batch, and the batch itself.
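To make the proposed return type concrete, here is a minimal sketch of a run() that tags every batch with its source task, plus a caller that groups the results by task. ToyPipeline, its outputs/batches attributes, and the task names are hypothetical and illustrate only the (task_name, dataframe) contract; this is not the dplutils implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

import pandas as pd


class ToyPipeline:
    """Hypothetical stand-in for a pipeline with several output tasks (not the dplutils API)."""

    def __init__(self, outputs: Dict[str, Callable[[pd.DataFrame], pd.DataFrame]],
                 batches: List[pd.DataFrame]):
        self.outputs = outputs  # output task name -> function producing its batch
        self.batches = batches  # input batches fed through the pipeline

    def run(self) -> Iterable[Tuple[str, pd.DataFrame]]:
        # Yield each result tagged with the name of the task that produced it,
        # so callers can tell apart batches coming from different outputs.
        for batch in self.batches:
            for task_name, fn in self.outputs.items():
                yield task_name, fn(batch)


pipeline = ToyPipeline(
    outputs={
        "doubled": lambda df: df.assign(x=df["x"] * 2),
        "summary": lambda df: pd.DataFrame({"total": [df["x"].sum()]}),
    },
    batches=[pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})],
)

# The caller can route each batch by its task name, e.g. to a per-task table.
by_task: Dict[str, List[pd.DataFrame]] = {}
for task_name, df in pipeline.run():
    by_task.setdefault(task_name, []).append(df)
```

Yielding tuples keeps run() a lazy iterator, so batches can still be streamed to the writer as they complete instead of being collected per task first.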

Added options to writeto: partition the output by task, or pass outdir as a mapping from task name to output directory.
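The two writeout options can be sketched as follows. write_outputs, its partition_by_task flag, the hive-style task=<name> directory layout, and the CSV file naming are all assumptions made for illustration; they are not the actual writeto signature or output format.

```python
import os
import tempfile
from typing import Dict, Iterable, List, Tuple, Union

import pandas as pd


def write_outputs(
    results: Iterable[Tuple[str, pd.DataFrame]],
    outdir: Union[str, Dict[str, str]],
    partition_by_task: bool = False,
) -> List[str]:
    """Hypothetical writer illustrating both options (not the dplutils writeto API).

    - outdir as a dict: each task's batches go to the directory mapped to that task.
    - partition_by_task: batches go under a ``task=<name>`` subdirectory of outdir.
    A dict outdir takes precedence over partition_by_task.
    """
    counters: Dict[str, int] = {}
    written: List[str] = []
    for task_name, df in results:
        if isinstance(outdir, dict):
            target = outdir[task_name]
        elif partition_by_task:
            target = os.path.join(outdir, f"task={task_name}")
        else:
            target = outdir
        os.makedirs(target, exist_ok=True)
        # Number files per task so batches from different tasks never collide.
        n = counters.get(task_name, 0)
        counters[task_name] = n + 1
        path = os.path.join(target, f"{task_name}-{n}.csv")
        df.to_csv(path, index=False)
        written.append(path)
    return written


results = [
    ("doubled", pd.DataFrame({"x": [2, 4]})),
    ("summary", pd.DataFrame({"total": [3]})),
    ("doubled", pd.DataFrame({"x": [6]})),
]

# Option 1: one root directory, partitioned by task.
with tempfile.TemporaryDirectory() as root:
    paths = write_outputs(results, root, partition_by_task=True)
    rel = sorted(os.path.relpath(p, root) for p in paths)

# Option 2: outdir maps each task to its own directory.
with tempfile.TemporaryDirectory() as root:
    dirs = {"doubled": os.path.join(root, "a"), "summary": os.path.join(root, "b")}
    paths2 = write_outputs(results, dirs)
    rel2 = sorted(os.path.relpath(p, root) for p in paths2)
```

The partitioned layout keeps every task under one root that partition-aware readers can load as a single dataset, while the mapping form suits cases where each task's output belongs in an unrelated location or table.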