A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0
Intelligent estimation of manifest entry size #355
This PR introduces intelligent file size estimation for manifest entries. The estimate is computed without reading the entire file, which makes it cheap enough to run on the head node.
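To illustrate the general idea of estimating a size without reading a whole file, here is a minimal sketch. It is not the code from this PR and does not use any DeltaCAT API: the function name `estimate_decompressed_size`, the `sample_bytes` parameter, and the choice of a gzip-compressed entry are all assumptions made for illustration. The sketch decompresses only a prefix of the file, measures the compression ratio on that prefix, and extrapolates it to the full on-disk size.

```python
import os
import zlib


def estimate_decompressed_size(path: str, sample_bytes: int = 64 * 1024) -> int:
    """Estimate the decompressed size of a gzip file from a prefix sample.

    Hypothetical helper for illustration only; not part of DeltaCAT.
    """
    on_disk_size = os.path.getsize(path)
    if on_disk_size == 0:
        return 0

    # Read at most `sample_bytes` of compressed data from the start of the file.
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)

    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    decompressed = decompressor.decompress(sample)

    # The whole file fit in the sample, so the size is exact.
    if len(sample) >= on_disk_size:
        return len(decompressed)

    # The sample was too small to yield output; fall back to the on-disk size.
    if not decompressed:
        return on_disk_size

    # Extrapolate the sampled compression ratio to the full file.
    return len(decompressed) * on_disk_size // len(sample)
```

The estimate is exact when the file is smaller than the sample and approximate otherwise, since it assumes the compression ratio of the prefix holds for the rest of the file. The sampled prefix is decompressed fully in memory, so `sample_bytes` should stay small for highly compressible data.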