pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Make lazy task output tables which do not provide a query string cache invalid for subsequent reading tasks #140

Closed: nicolasmueller closed this 6 months ago

nicolasmueller commented 6 months ago

The cache validity of a lazy task cannot be judged when the tables it outputs do not provide query strings. In that case the task should be treated as cache invalid, which it currently is not. This PR fixes that.
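
To illustrate the rule described above, here is a minimal, self-contained sketch. It does not use or reproduce pipedag's internals: `OutputTable`, `lazy_cache_key`, and `is_cache_valid` are hypothetical names invented for this example, and the assumption is that a lazy task's cache key is derived from the query strings of its output tables.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class OutputTable:
    """Hypothetical stand-in for a table returned by a lazy task."""
    name: str
    # None models a table that provides no query string (e.g. a DataFrame).
    query_string: Optional[str] = None


def lazy_cache_key(tables: Sequence[OutputTable]) -> Optional[str]:
    """Hash the query strings of all tables, or return None if any is missing."""
    if any(t.query_string is None for t in tables):
        # Without a query string there is nothing to compare against the cache.
        return None
    digest = hashlib.sha256()
    for t in tables:
        digest.update(t.name.encode("utf-8") + b"\0")
        digest.update(t.query_string.encode("utf-8") + b"\0")
    return digest.hexdigest()


def is_cache_valid(tables: Sequence[OutputTable], cached_key: Optional[str]) -> bool:
    """A missing key always means cache invalid, never an accidental match."""
    key = lazy_cache_key(tables)
    return key is not None and key == cached_key
```

In this sketch, a task whose outputs include even one table without a query string never matches a stored key, so it is always re-evaluated rather than being silently treated as cache valid.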


NicolasMuellerQC commented 6 months ago

@windiana42 This should be ready now :)

nicolasmueller commented 6 months ago

I suggest opening an issue for hashing eager tables by their content (this should only be executed for lazy=True). This PR looks good.

Tracked in #141
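
As a rough illustration of the follow-up idea (hashing an eager table by its content), the sketch below hashes column names and rows with the standard library only. It is an assumption about what such a hash could look like, not the approach taken in #141; `content_hash` and `output_key` are hypothetical helpers, and the table is modelled as plain Python sequences rather than a pandas or polars object.

```python
import hashlib
from typing import Optional, Sequence


def content_hash(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Hash column names and row values; row order affects the result."""
    digest = hashlib.sha256()
    digest.update(repr(tuple(columns)).encode("utf-8") + b"\n")
    for row in rows:
        digest.update(repr(tuple(row)).encode("utf-8") + b"\n")
    return digest.hexdigest()


def output_key(
    columns: Sequence[str], rows: Sequence[Sequence[object]], lazy: bool
) -> Optional[str]:
    """Only compute the content hash for lazy tasks (lazy=True)."""
    if not lazy:
        # Non-lazy tasks are not re-run to inspect outputs, so nothing is hashed.
        return None
    return content_hash(columns, rows)
```

Hashing every row scales with the size of the table, which is one reason to restrict it to tasks that are executed on every run anyway.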