ray-project / deltacat

A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0

Fix a bug that caused the RCF to be written incorrectly during multiple rounds #369

Closed by raghumdani 1 week ago

raghumdani commented 2 weeks ago

Summary

This change fixes a bug that caused the round completion file (RCF) to be written incorrectly when there were multiple rounds.

Rationale

N/A

Changes

Bug Fix

Impact

High. An incorrectly written RCF leads to data loss in subsequent incremental jobs.

Testing

Added an assertion that ensures all files are covered by the RCF.
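
For readers unfamiliar with this kind of check, a coverage assertion could look roughly like the sketch below. It is not the assertion added in this PR: the function name `assert_all_files_covered` and the plain string file identifiers are hypothetical, and the real code works on DeltaCAT's own data structures. The idea is to collect every file recorded across all rounds and fail loudly if any input file is missing.

```python
from typing import Iterable, List, Set


def assert_all_files_covered(
    input_files: Iterable[str],
    files_recorded_per_round: List[Iterable[str]],
) -> None:
    """Hypothetical check: every input file must appear in some round's record."""
    expected: Set[str] = set(input_files)

    # Union the files recorded by each round, not only the last one.
    recorded: Set[str] = set()
    for round_files in files_recorded_per_round:
        recorded.update(round_files)

    missing = expected - recorded
    assert not missing, f"Files not covered by the RCF: {sorted(missing)}"


# Two rounds that together cover all inputs: passes silently.
assert_all_files_covered(["f1", "f2", "f3"], [["f1"], ["f2", "f3"]])
```

Taking the union over all rounds matters here: a check that only looked at the final round would miss files processed in earlier rounds.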

Regression Risk

None

Checklist
