ray-project / deltacat

A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0

Extend deltacat interface to support Iceberg bucketing #320

Closed: raghumdani closed 5 months ago

raghumdani commented 5 months ago

This PR contains interface changes and local DeltaCAT support for reading and writing a delta partition spec. The motivation for this change is explained in a different document. In summary, we have a use case where a Delta must encapsulate an Iceberg manifest, but there is currently no way to represent an Iceberg partition in DeltaCAT. This PR adds a partition spec model that lets us specify identity and bucketing partition strategies; the latter is what we will also be using with Iceberg. The interface changes are backward compatible.
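To illustrate the idea, here is a minimal sketch of what a partition spec with identity and bucketing strategies could look like. It is not the interface added in this PR: the names `IdentityTransform`, `BucketTransform`, `PartitionSpec`, and `partition_values` are hypothetical, and `zlib.crc32` is used as a stand-in hash so the example stays stdlib-only. The Iceberg spec defines its bucket transform with a 32-bit Murmur3 hash, so the bucket numbers below would not match Iceberg's.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class IdentityTransform:
    """Hypothetical transform: partition by the column value itself."""
    source_column: str

    def apply(self, record: Dict[str, Any]) -> Any:
        return record[self.source_column]


@dataclass(frozen=True)
class BucketTransform:
    """Hypothetical transform: partition by hash(value) mod num_buckets."""
    source_column: str
    num_buckets: int

    def __post_init__(self) -> None:
        if self.num_buckets <= 0:
            raise ValueError("num_buckets must be positive")

    def apply(self, record: Dict[str, Any]) -> int:
        raw = str(record[self.source_column]).encode("utf-8")
        # Stand-in hash (assumption): Iceberg uses 32-bit Murmur3 here, not CRC32.
        # Masking to 31 bits keeps the hash non-negative before the modulo.
        return (zlib.crc32(raw) & 0x7FFFFFFF) % self.num_buckets


@dataclass(frozen=True)
class PartitionSpec:
    """Hypothetical ordered list of transforms describing a partition."""
    transforms: Tuple[Any, ...]

    def partition_values(self, record: Dict[str, Any]) -> Tuple[Any, ...]:
        # One partition value per transform, in spec order.
        return tuple(t.apply(record) for t in self.transforms)


spec = PartitionSpec(
    transforms=(
        IdentityTransform("region"),
        BucketTransform("user_id", num_buckets=8),
    )
)
values = spec.partition_values({"region": "us-east-1", "user_id": 12345})
print(values)  # ("us-east-1", <bucket number between 0 and 7>)
```

Keeping each strategy as its own small immutable object means a spec with only identity transforms behaves like a plain list of partition key columns, which is one way such a model could stay backward compatible.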

pdames commented 5 months ago

@raghumdani LMK if you want to merge the changes as-is then incrementally work toward the recommended changes, since that's also a viable option here.

raghumdani commented 5 months ago

Sure, I will incrementally work towards this. Created this task: https://github.com/ray-project/deltacat/issues/322

raghumdani commented 5 months ago

@pdames Can you approve so I can merge this?