A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
A possible work practice is to develop on a small pipeline instance and to merge changes into the main branch only after a single task has been validated on the full pipeline against certain expectations. In this scenario, cache invalidation is ignored, and the goal is to run single tasks nonetheless. The main obstacle is the position hash check performed when pulling inputs for the subgraph that should be run (see #193).
Thus it would be nice to have an option that disables position hash checks when looking for inputs of the subgraph that is run. A warning or error may be raised in case either the cache lookup table or the current graph indicate that multiple instances of the task exist(ed) in the graph.
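To make the proposal concrete, here is a minimal sketch of the lookup behavior it describes, assuming a simplified cache lookup table in which each entry records a task name, its position hash, and the table holding its output. `CacheEntry`, `lookup_input`, `ignore_position_hash`, `instances_in_graph`, and `AmbiguousTaskWarning` are hypothetical names chosen for illustration; they are not the library's actual API.

```python
import warnings
from dataclasses import dataclass


# Hypothetical, simplified cache lookup table entry (not the library's schema).
@dataclass(frozen=True)
class CacheEntry:
    task_name: str
    position_hash: str
    output_table: str


class AmbiguousTaskWarning(UserWarning):
    """Hypothetical warning: several instances of a task exist(ed)."""


def lookup_input(entries, task_name, position_hash,
                 ignore_position_hash=False, instances_in_graph=1):
    """Find the cached output of `task_name` that feeds the subgraph being run.

    Strict mode (default) requires an exact position hash match. With
    ignore_position_hash=True (hypothetical option), only the task name is
    matched, and a warning is emitted if either the cache table or the
    current graph suggests multiple instances of the task.
    """
    if not ignore_position_hash:
        for entry in entries:
            if entry.task_name == task_name and entry.position_hash == position_hash:
                return entry
        raise KeyError(f"no cache entry for {task_name!r} at position {position_hash!r}")

    candidates = [e for e in entries if e.task_name == task_name]
    if not candidates:
        raise KeyError(f"no cache entry for task {task_name!r}")
    distinct_positions = {e.position_hash for e in candidates}
    if len(distinct_positions) > 1 or instances_in_graph > 1:
        warnings.warn(
            f"task {task_name!r} is ambiguous: {len(distinct_positions)} position(s) "
            f"in the cache, {instances_in_graph} instance(s) in the current graph; "
            "using the first cache entry",
            AmbiguousTaskWarning,
        )
    return candidates[0]


# Usage: cache filled by the full pipeline, lookup issued from the small dev graph,
# where the same task sits at a different position hash.
cache = [CacheEntry("clean", "pos_full", "clean_full")]
print(lookup_input(cache, "clean", "pos_dev", ignore_position_hash=True).output_table)
```

In strict mode the same call raises `KeyError`, which mirrors the obstacle described above. Choosing between a warning and an error for the ambiguous case is left open here, as in the proposal.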