pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Support <n> cache slots #75

Open windiana42 opened 1 year ago

windiana42 commented 1 year ago

Support for cache slots would let two branches use the same pipedag instance_id (same materialization target / database schema / file directory) while each caches its results independently.
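To make the idea concrete, here is a minimal toy model in plain Python. It is not pipedag code: `Instance`, `CacheSlot`, `run_task`, and the schema suffix scheme are all hypothetical names chosen for illustration. The sketch assumes one slot per branch and shows that an identical cache key in two branches does not collide.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSlot:
    """One independent cache inside a shared instance (hypothetical model)."""
    schema_suffix: str
    # Maps a task's cache key to its stored result.
    entries: dict = field(default_factory=dict)


@dataclass
class Instance:
    """Stands in for one materialization target identified by instance_id."""
    instance_id: str
    slots: dict = field(default_factory=dict)

    def slot_for(self, branch):
        # Each branch gets its own slot, created on first use.
        if branch not in self.slots:
            self.slots[branch] = CacheSlot(schema_suffix=f"__{branch}")
        return self.slots[branch]

    def run_task(self, branch, cache_key, compute):
        """Return (result, cache_hit) using only the branch's own slot."""
        slot = self.slot_for(branch)
        if cache_key in slot.entries:
            return slot.entries[cache_key], True
        result = compute()
        slot.entries[cache_key] = result
        return result, False


instance = Instance("main_db")
# Two branches share the instance and happen to use the same cache key.
_, hit_a1 = instance.run_task("feature_a", "k1", lambda: [1, 2, 3])
_, hit_b1 = instance.run_task("feature_b", "k1", lambda: [4, 5, 6])
# Re-running branch A hits its own slot and is unaffected by branch B.
value_a, hit_a2 = instance.run_task("feature_a", "k1", lambda: [])
```

In this model the second run of `feature_a` is served from its own slot, even though `feature_b` wrote a different result under the same key in between.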

This change requires significant refactoring, since the TableHookResolve interface currently cannot dematerialize a specific cached instance of a pipedag Table. The cache invalidation logic needs to be refactored as well.

windiana42 commented 4 months ago

There is general demand for keeping a certain run outcome around for a long time. This could be implemented by offering named and pinned cache slots which, like the "active" cache slot, have a schema name that is known to users and that they are invited to use directly.
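A rough sketch of how pinned slots might sit next to a bounded set of regular slots, again in plain Python rather than pipedag's API. `SlotRegistry`, `use`, `pin`, the `<instance_id>__<slot>` naming, and the least-recently-used eviction policy are all assumptions made for illustration. Nothing in this issue specifies them.

```python
from collections import OrderedDict


class SlotRegistry:
    """Hypothetical bookkeeping for <n> regular slots plus pinned slots."""

    def __init__(self, instance_id, max_unpinned):
        self.instance_id = instance_id
        self.max_unpinned = max_unpinned
        self.unpinned = OrderedDict()  # slot name -> schema name, oldest first
        self.pinned = {}               # slot name -> schema name, never evicted

    def schema_name(self, slot):
        # Deterministic name, so users can query the schema directly.
        return f"{self.instance_id}__{slot}"

    def use(self, slot):
        """Mark a slot as used and return its schema name."""
        if slot in self.pinned:
            return self.pinned[slot]
        if slot in self.unpinned:
            self.unpinned.move_to_end(slot)
        else:
            self.unpinned[slot] = self.schema_name(slot)
            # Assumed policy: evict least recently used regular slots.
            while len(self.unpinned) > self.max_unpinned:
                self.unpinned.popitem(last=False)
        return self.unpinned[slot]

    def pin(self, slot):
        """Move a slot out of the evictable set and keep it indefinitely."""
        self.unpinned.pop(slot, None)
        self.pinned.setdefault(slot, self.schema_name(slot))
        return self.pinned[slot]


registry = SlotRegistry("demo", max_unpinned=2)
registry.use("run1")
registry.pin("run1")          # keep this run outcome long term
for name in ("run2", "run3", "run4"):
    registry.use(name)        # run2 is evicted once run4 arrives
```

Here `run1` survives however many later runs occur, while only the two most recently used regular slots are kept.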