ray-project / deltacat

A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0
166 stars 23 forks

Multiple Rounds #330

Closed akindu-amazon closed 4 months ago

akindu-amazon commented 4 months ago

Refactored by adding for loops that support multiple rounds of hash bucketing and merging. Deltacat pytests pass with these changes. Please let me know if this needs more modularization (this is more of an implementation match to the POC code).
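For readers unfamiliar with the pattern, here is a minimal sketch of what looping over multiple rounds of hash bucketing and merging can look like. It is not the code in this PR and does not use DeltaCAT's API: the record shape, the function names (`bucket_index`, `hash_bucket`, `merge`, `compact_in_rounds`), the `records_per_round` parameter, and the last-write-wins merge rule are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for this sketch: (primary key, value).
Record = Tuple[str, int]


def bucket_index(key: str, num_buckets: int) -> int:
    """Map a key to a bucket with a hash that is stable across processes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_bucket(records: Iterable[Record], num_buckets: int) -> List[List[Record]]:
    """Group one round's records by the hash bucket of their key."""
    buckets: List[List[Record]] = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[bucket_index(key, num_buckets)].append((key, value))
    return buckets


def merge(previous: Dict[str, int], bucket: List[Record]) -> Dict[str, int]:
    """Fold one bucket into the prior merged state (assumed: last write wins)."""
    merged = dict(previous)
    for key, value in bucket:
        merged[key] = value
    return merged


def compact_in_rounds(
    records: List[Record], num_buckets: int, records_per_round: int
) -> Dict[str, int]:
    """Process the input in fixed-size rounds instead of all at once."""
    if num_buckets <= 0 or records_per_round <= 0:
        raise ValueError("num_buckets and records_per_round must be positive")
    merged_buckets: List[Dict[str, int]] = [{} for _ in range(num_buckets)]
    # Outer loop: one iteration per round, in input order.
    for start in range(0, len(records), records_per_round):
        batch = records[start:start + records_per_round]
        buckets = hash_bucket(batch, num_buckets)
        # Inner loop: merge each bucket with the state left by earlier rounds.
        for i in range(num_buckets):
            merged_buckets[i] = merge(merged_buckets[i], buckets[i])
    result: Dict[str, int] = {}
    for bucket_state in merged_buckets:
        result.update(bucket_state)
    return result


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
multi_round = compact_in_rounds(records, num_buckets=4, records_per_round=2)
single_round = compact_in_rounds(records, num_buckets=4, records_per_round=len(records))
```

Because a key always hashes to the same bucket and rounds are applied in input order, the multi-round result in this sketch matches the single-round result, while each round only holds `records_per_round` input records at a time.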