pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Support calling subgraph even if position hashes of inputs changed #194

Closed windiana42 closed 2 months ago

windiana42 commented 4 months ago

One working practice is to develop on a small pipeline instance and to merge changes to the main branch only after a single task has been validated on the full pipeline against certain expectations. In this scenario, cache invalidation is deliberately ignored and the goal is to run single tasks nonetheless. The main obstacle is the position hash check performed when pulling inputs for the subgraph that should be run (see #193).

It would therefore be useful to have an option that disables position hash checks when looking up the inputs of the subgraph being run. A warning or error could be raised if either the cache lookup table or the current graph indicates that multiple instances of the task exist(ed) in the graph.
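
To make the requested behavior concrete, here is a minimal toy sketch of such a lookup. It is not pipedag code: `CacheEntry`, `find_input`, `ignore_position_hashes`, and `on_ambiguity` are hypothetical names chosen for illustration, and the list-based cache stands in for the real cache lookup table.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    # Hypothetical cache record, not pipedag's actual schema.
    task_name: str
    position_hash: str
    output: str


def find_input(entries, task_name, position_hash,
               ignore_position_hashes=False, on_ambiguity="warn"):
    """Look up a cached task output, optionally ignoring the position hash."""
    by_name = [e for e in entries if e.task_name == task_name]

    if not ignore_position_hashes:
        # Strict mode: the position hash must match exactly.
        for entry in by_name:
            if entry.position_hash == position_hash:
                return entry.output
        raise LookupError(f"no cache entry for {task_name!r} at {position_hash!r}")

    # Relaxed mode: match on the task name only.
    if not by_name:
        raise LookupError(f"no cache entry for {task_name!r}")
    if len(by_name) > 1:
        # Several instances of the task exist(ed), so the choice is ambiguous.
        msg = f"{len(by_name)} cached instances of {task_name!r} found"
        if on_ambiguity == "error":
            raise RuntimeError(msg)
        warnings.warn(msg)
    return by_name[-1].output  # assumption: entries are ordered oldest to newest
```

With a single cached instance, the relaxed lookup succeeds even though the position hash no longer matches; with several instances, the caller chooses between a warning and a hard error.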