A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Refactored compaction_session.py into smaller, more modular functions that are called from _execute_compaction. These functions are:
_process_merge_results: processes the merge results and returns the merged delta
_merge: produces the merge results
_run_local_merge: called instead when hash_bucket_count == 1
_discover_deltas: returns the uniform deltas to compact
_hash_bucket: hash-buckets the uniform deltas passed in
These functions should make it easier to support multiple compaction rounds for large tables. Tables that could already be compacted are still compacted the same way as before (all deltacat pytest tests pass).