pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Improve error message for input not found when running subgraph #193

Open windiana42 opened 2 months ago

windiana42 commented 2 months ago

When running only a subgraph of the DAG (i.e. just a task or a stage), all inputs must be fetchable from the cache. If an input cannot be fetched, the run fails with an exception like this:

pydiverse.pipedag.errors.CacheError: Couldn't find cached output for task '<Task 'task_name' 0x1faff5c2a091 (id: 11)>' with matching position hash.

This message should be replaced with a clearer one that also names the failing task (the task mentioned here is the one that produces the inputs which were not found).
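To make the request concrete, here is a rough sketch of what such a message could look like. It is not pipedag code: the `CacheError` class below is only a stand-in for `pydiverse.pipedag.errors.CacheError`, and `missing_input_error` and its parameters are hypothetical names chosen for illustration.

```python
class CacheError(Exception):
    """Stand-in for pydiverse.pipedag.errors.CacheError (illustration only)."""


def missing_input_error(failing_task: str, producer_task: str) -> CacheError:
    """Build an error that names both tasks involved in the cache miss.

    failing_task:  the task selected for the subgraph run (hypothetical parameter)
    producer_task: the task whose cached output was expected as input
    """
    return CacheError(
        f"Cannot run task '{failing_task}' as part of a subgraph: "
        f"no cached output of its input task '{producer_task}' was found."
    )


if __name__ == "__main__":
    # Print the proposed wording for a made-up pair of task names.
    print(missing_input_error("consumer_task", "task_name"))
```

The point is only that the message names the task being run as well as the task that produces the missing input, so the reader does not have to guess which one `task_name` refers to.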

Furthermore, more information about "with matching position hash" would be helpful. In particular, it would be good to know whether a match would be found if the position hash were ignored.
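The kind of diagnosis meant here could work roughly as follows. This is a toy in-memory model, not how pipedag stores its cache metadata: `CacheEntry`, `describe_cache_miss`, and the field names are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical cache metadata record (not pipedag's real schema)."""
    task_name: str
    position_hash: str


def describe_cache_miss(entries, task_name, position_hash):
    """Explain why a lookup by task name and position hash found nothing."""
    # First ignore the position hash and look for the task name alone.
    same_task = [e for e in entries if e.task_name == task_name]
    if any(e.position_hash == position_hash for e in same_task):
        return f"Cached output for task '{task_name}' found."
    if same_task:
        # The task was cached, but only under other position hashes.
        return (
            f"Cached output for task '{task_name}' exists, "
            f"but not with position hash '{position_hash}'."
        )
    return f"No cached output for task '{task_name}' exists at all."


if __name__ == "__main__":
    cache = [CacheEntry("task_name", "abc")]
    print(describe_cache_miss(cache, "task_name", "xyz"))
```

Distinguishing the second and third cases in the error message would already tell the user whether the task was never cached or whether only the position hash differs.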