A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
When running only a subgraph of the DAG (i.e. just a task or a stage), all inputs must be retrievable from the cache. If an input is missing, the run fails with an exception like:
pydiverse.pipedag.errors.CacheError: Couldn't find cached output for task '<Task 'task_name' 0x1faff5c2a091 (id: 11)>' with matching position hash.
This should be replaced by a clearer message that includes the name of the failing task (the task named in the current message is the one that produces the inputs that were not found).
Furthermore, more detail about "with matching position hash" would help: it would be good to know whether a match would have been found if the position hash had been ignored.
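
To make the request concrete, here is a rough sketch of the kind of lookup and message I have in mind. It is not pipedag's actual implementation: `CacheEntry`, `find_cached_output`, and the local `CacheError` class are hypothetical stand-ins, and the real cache metadata may look quite different. The idea is to check first for an exact match, and on a miss report whether the task has any cached output under a different position hash.

```python
from dataclasses import dataclass


class CacheError(Exception):
    """Local stand-in for pydiverse.pipedag.errors.CacheError (illustration only)."""


@dataclass(frozen=True)
class CacheEntry:
    # Hypothetical metadata record; the field names are assumptions.
    task_name: str
    position_hash: str
    stage: str


def find_cached_output(entries, task_name, position_hash):
    """Return the matching entry, or raise a CacheError that explains the miss."""
    # All cached outputs of this task, regardless of position hash.
    same_name = [e for e in entries if e.task_name == task_name]

    for entry in same_name:
        if entry.position_hash == position_hash:
            return entry

    if same_name:
        hint = (
            f"{len(same_name)} cached output(s) exist for this task, "
            "but with a different position hash."
        )
    else:
        hint = "No cached output exists for this task under any position hash."

    raise CacheError(
        f"Cannot run subgraph: the output of task '{task_name}' is needed "
        f"as an input but was not found in the cache. {hint}"
    )
```

With a message like this, it would be immediately clear whether the cache is simply empty for the producing task or whether only the position hash differs.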