ray-project / deltacat

A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0
166 stars 23 forks

Intelligent estimation of manifest entry size #355

Closed by raghumdani 1 month ago

raghumdani commented 1 month ago

This PR introduces intelligent file size estimation for use on the head node. The estimate is computed without reading the entire file.
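
The description does not say how the estimate is computed. As a rough illustration only, the sketch below assumes a zlib-compressed file and extrapolates its decompressed size from the compression ratio of a leading sample, so only the first `sample_bytes` bytes are read. The helper `estimate_inflated_size` and its `sample_bytes` parameter are hypothetical names chosen for this example. They are not DeltaCAT APIs, and the PR may well use a different technique.

```python
import os
import zlib


def estimate_inflated_size(path: str, sample_bytes: int = 64 * 1024) -> int:
    """Estimate a zlib-compressed file's decompressed size from a leading sample.

    Hypothetical helper for illustration; not part of DeltaCAT.
    """
    if sample_bytes <= 0:
        raise ValueError("sample_bytes must be positive")

    on_disk = os.path.getsize(path)  # metadata lookup only, no file read
    if on_disk == 0:
        return 0

    with open(path, "rb") as f:
        sample = f.read(sample_bytes)  # read only the leading bytes

    # decompressobj accepts a truncated stream and returns whatever it can decode.
    inflated = len(zlib.decompressobj().decompress(sample))

    if len(sample) == on_disk:
        return inflated  # the sample covered the whole file, so this is exact

    # Assumption: the ratio observed in the sample holds for the rest of the file.
    return round(on_disk * inflated / len(sample))
```

The estimate is only as good as the assumption that the leading sample is representative. Files whose content changes character partway through would need a larger sample or format-specific metadata instead.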