pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Check cache validity of lazy DataFrame tasks. #141

Open NicolasMuellerQC opened 6 months ago

NicolasMuellerQC commented 6 months ago

After #140, Polars and pandas DataFrames that come out of lazy tasks are always cache-invalid, since they have neither a query string nor a version. We should check their cache validity by hashing the data instead. This should be implemented in the `lazy_query_str` function of the respective table hooks.
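
To illustrate what "hashing the data" could look like, here is a minimal sketch for the pandas case. The helper name `dataframe_content_hash` is hypothetical and not part of pipedag, and how its result would be returned from `lazy_query_str` is left open, since that depends on the table hook's actual signature. It relies on `pandas.util.hash_pandas_object`, which produces one deterministic hash per row, and mixes in column names and dtypes so that schema changes also invalidate the cache. A Polars hook would need an analogous content hash.

```python
import hashlib

import pandas as pd


def dataframe_content_hash(df: pd.DataFrame) -> str:
    """Return a hex digest that changes when data, column names, or dtypes change.

    Hypothetical helper for illustration only; not part of pipedag.
    """
    digest = hashlib.sha256()
    # Include the schema so a renamed column or a changed dtype
    # produces a different digest even if the values look the same.
    for name, dtype in df.dtypes.items():
        digest.update(f"{name}:{dtype};".encode("utf-8"))
    # hash_pandas_object returns one uint64 per row (index included).
    # Feeding the row hashes in order makes the digest sensitive to row order.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    digest.update(row_hashes.to_numpy().tobytes())
    return digest.hexdigest()


df_a = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df_b = df_a.copy()               # same content, so same digest
df_c = df_a.assign(x=[1, 2, 4])  # one changed value, so different digest

print(dataframe_content_hash(df_a) == dataframe_content_hash(df_b))
print(dataframe_content_hash(df_a) == dataframe_content_hash(df_c))
```

One trade-off with this approach is that hashing requires the full DataFrame to be materialized, so the task still has to run before its output can be compared against the cached hash. Only downstream tasks would benefit from the cache-valid result.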