pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Materialization details #122

Closed: nicolasmueller closed this 8 months ago

nicolasmueller commented 9 months ago

Checklist

nicolasmueller commented 9 months ago

@windiana42 Should specifying a compression method for a backend that does not support it result in an exception? Currently, we just print a warning.
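To make the two options concrete, here is a minimal sketch of warning versus raising when a compression setting is given for a backend that cannot honor it. Everything in it is an assumption for illustration: the capability table, the `check_compression` helper, the `strict` flag, and the exception class are hypothetical names, not pipedag's actual API.

```python
import warnings
from typing import Optional

# Hypothetical capability table; the dialect names and values are
# illustrative assumptions, not taken from pipedag.
SUPPORTS_COMPRESSION = {"ibm_db2": True, "postgresql": False, "duckdb": False}


class UnsupportedMaterializationOption(ValueError):
    """Raised in strict mode when a backend cannot apply an option."""


def check_compression(
    dialect: str, compression: Optional[str], strict: bool = False
) -> Optional[str]:
    """Return the compression setting to apply, or None if it is dropped."""
    if compression is None or SUPPORTS_COMPRESSION.get(dialect, False):
        return compression
    msg = (
        f"Dialect {dialect!r} does not support compression; "
        f"ignoring {compression!r}."
    )
    if strict:
        # Option A: fail fast so a misconfiguration cannot go unnoticed.
        raise UnsupportedMaterializationOption(msg)
    # Option B (the behavior described above): warn and carry on.
    warnings.warn(msg, stacklevel=2)
    return None
```

With a switch like `strict`, the same configuration could stay portable across backends during development while still failing loudly where silent fallback is not acceptable.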

nicolasmueller commented 9 months ago

@windiana42 Could you have another look at this? The materialization_details can now be configured in pipedag.yaml. For now, DB2 compression and table spaces are supported; unlogged Postgres tables are not supported yet.

Some notes:

windiana42 commented 8 months ago

I also had to add the materialization_details to the cached table metadata so that they are applied when a table is copied to a new schema.

We store a version number of our metadata in the metadata itself. Shall we bump this version number? Maybe we say we don't need it if we are backwards compatible with tables without materialization_details specified.

nicolasmueller commented 8 months ago

> We store a version number of our metadata in the metadata itself. Shall we bump this version number? Maybe we say we don't need it if we are backwards compatible with tables without materialization_details specified.

It's probably better style to bump it, so I'll bump it. For DuckDB it seems necessary anyway to refresh the cache after this change.
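The trade-off between bumping the version and relying on backwards compatibility can be sketched as follows. This is a rough illustration under stated assumptions: `CURRENT_METADATA_VERSION`, the row layout, and `read_cached_table` are hypothetical and do not reflect how pipedag actually stores or reads its metadata.

```python
from typing import Optional

# Hypothetical version constant; assume the bump goes from 1 to 2.
CURRENT_METADATA_VERSION = 2


def read_cached_table(
    row: dict, require_current_version: bool = True
) -> Optional[dict]:
    """Return cached table metadata, or None to signal a cache miss."""
    # Rows written before the version field existed are treated as version 1.
    version = row.get("metadata_version", 1)
    if require_current_version and version != CURRENT_METADATA_VERSION:
        # Bumped version: older rows are ignored, so the cache is rebuilt.
        return None
    # Backwards-compatible read: a missing field simply means
    # "no materialization details were specified".
    return {
        "name": row["name"],
        "materialization_details": row.get("materialization_details"),
    }
```

Requiring the current version forces a one-time cache refresh but rules out mixing old and new rows; the lenient path avoids the refresh at the cost of handling the missing field everywhere it is read.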