Open ion-elgreco opened 4 months ago
Is this feature still on the roadmap? The latest Databricks runtimes have deletion vectors enabled by default, and our admin won't turn them off. Reading these tables via Polars is currently not possible.
A temporary workaround that I'm currently implementing is reading delta tables with deletion vectors using the DuckDB delta extension, which is based on delta kernel rather than delta_rs.
It would be great to get this natively in polars.
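For anyone who needs the same workaround, here is a minimal sketch of reading a delta table through DuckDB and handing the result to Polars. It assumes the `duckdb`, `polars` and `pyarrow` packages are installed and that the DuckDB delta extension can be downloaded. The helper names and the table path are hypothetical, and this is not necessarily how the workaround above is implemented.

```python
def delta_scan_query(table_path: str) -> str:
    """Build a DuckDB query that reads a delta table via delta_scan."""
    # Escape single quotes so the path stays a valid SQL string literal.
    escaped = table_path.replace("'", "''")
    return f"SELECT * FROM delta_scan('{escaped}')"


def read_delta_via_duckdb(table_path: str):
    """Read a delta table through DuckDB and return a Polars DataFrame."""
    # Imported lazily so the query helper above works without duckdb installed.
    import duckdb

    con = duckdb.connect()
    # The delta extension is built on delta kernel; it is fetched on first use.
    con.execute("INSTALL delta")
    con.execute("LOAD delta")
    # .pl() converts the DuckDB relation into a Polars DataFrame.
    return con.sql(delta_scan_query(table_path)).pl()


# Hypothetical usage:
# df = read_delta_via_duckdb("s3://my-bucket/path/to/table")
```

Note that this materializes the whole result eagerly, so it does not give you the lazy scanning that a native Polars reader would.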
@dylan-lee94 I started on this here: https://github.com/ion-elgreco/polars-deltalake/tree/feat/delta_io_plugin
But I won't be able to work on this anymore.
Description
With the release of delta-kernel-rs it has become easier to build a native reader/writer for delta tables. They also target Polars as a user, so that could be beneficial if changes are required: https://github.com/delta-incubator/delta-kernel-rs/issues/48

Kernel is currently limited to reads, but this would already be beneficial, since we could drop the dependency on the Python deltalake package and the pyarrow datasets way of reading these tables. For writing, it would enable Polars to add streaming sink support for delta tables, since sink_parquet already exists. Native support would also make Polars a good replacement for Spark + delta setups.

With kernel it's also easier to keep up to date with newer protocol versions and to support features such as column mapping (essentially, when columns have been functionally renamed) and deletion vectors.

DuckDB has already built an extension using kernel.
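
To illustrate what column mapping and deletion vectors ask of a reader, here is a toy sketch in plain Python. It assumes the field metadata key `delta.columnMapping.physicalName` from the Delta protocol, and it uses a plain set of row indexes in place of the compressed bitmap that a real deletion vector stores. The function names and sample data are hypothetical; this is not how delta-kernel-rs implements either feature.

```python
PHYSICAL_NAME_KEY = "delta.columnMapping.physicalName"


def physical_to_logical(schema_fields):
    """Map the physical column names in the data files to logical names."""
    mapping = {}
    for field in schema_fields:
        # Fields without column mapping metadata keep their logical name.
        physical = field.get("metadata", {}).get(PHYSICAL_NAME_KEY, field["name"])
        mapping[physical] = field["name"]
    return mapping


def read_rows(rows, schema_fields, deleted_row_indexes):
    """Drop rows marked as deleted, then rename columns to logical names."""
    rename = physical_to_logical(schema_fields)
    return [
        {rename.get(name, name): value for name, value in row.items()}
        for index, row in enumerate(rows)
        if index not in deleted_row_indexes
    ]


# Hypothetical table: "customer_id" is stored under a generated physical name.
fields = [
    {"name": "customer_id", "metadata": {PHYSICAL_NAME_KEY: "col-1a2b"}},
    {"name": "amount", "metadata": {}},
]
file_rows = [
    {"col-1a2b": 1, "amount": 10.0},
    {"col-1a2b": 2, "amount": 20.0},
    {"col-1a2b": 3, "amount": 30.0},
]

# Row index 1 is marked as deleted, so only the first and third rows survive.
result = read_rows(file_rows, fields, deleted_row_indexes={1})
```

A reader that ignores either feature returns wrong data (deleted rows reappear, or columns come back under their physical names).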