A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0
Fix a bug that caused the RCF to be written incorrectly during multiple rounds #369
Summary
This change fixes a bug that caused the RCF to be written incorrectly when a job ran for multiple rounds.
Rationale
N/A
Changes
Bug Fix
Impact
High. Left unfixed, the bug leads to data loss in subsequent incremental jobs.
Testing
Added an assertion that ensures all files are covered by the RCF.
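For illustration only, a check of this kind might look roughly like the sketch below. This is not the code from this PR: the function name, its parameters, and the representation of files as plain string identifiers are all hypothetical assumptions.

```python
from typing import Iterable, List, Set


def assert_rcf_covers_all_files(
    files_written_per_round: List[Iterable[str]],  # hypothetical: file IDs written in each round
    files_recorded_in_rcf: Iterable[str],  # hypothetical: file IDs listed in the RCF
) -> None:
    """Raise AssertionError if the RCF misses any file written in any round."""
    # Collect every file written across all rounds, not just the last one.
    expected: Set[str] = set()
    for round_files in files_written_per_round:
        expected.update(round_files)

    # Any file that was written but is absent from the RCF would be
    # invisible to a later incremental job.
    missing = expected - set(files_recorded_in_rcf)
    assert not missing, f"RCF is missing {len(missing)} file(s): {sorted(missing)}"
```

Under these assumptions, an RCF that records only the final round's files fails the check, while one that records the union of all rounds passes.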
Regression Risk
None
Checklist
[x] Unit tests covering the changes have been added
[x] E2E testing has been performed