ray-project / deltacat

A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Apache License 2.0

[Iceberg] AWS Glue Job Runner #190

Open pdames opened 1 year ago

pdames commented 1 year ago

Add a one-click AWS Glue Job Runner that exposes a single, simplified CLI command for creating, configuring, and running an AWS Glue for Ray job against an Iceberg catalog, with either a local or PyPI build of DeltaCAT installed.
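For illustration only, here is a rough sketch of what the runner might do under the hood using boto3. The helper names (`build_glue_ray_job_request`, `create_and_run_job`), the default worker count, and the choice of `Ray2.4` / `Z.2X` are assumptions and not part of DeltaCAT. Only the PyPI install path is shown; a local build would first need to be uploaded somewhere the job can reach it, which is omitted here.

```python
def build_glue_ray_job_request(job_name, role_arn, script_s3_uri,
                               deltacat_requirement="deltacat",
                               worker_count=5):
    """Build the keyword arguments for boto3's Glue `create_job` call.

    Hypothetical helper: all defaults here are illustrative assumptions.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": "4.0",
        "WorkerType": "Z.2X",  # worker type used by Glue for Ray jobs
        "NumberOfWorkers": worker_count,
        "Command": {
            "Name": "glueray",  # selects the Ray job type
            "PythonVersion": "3.9",
            "Runtime": "Ray2.4",
            "ScriptLocation": script_s3_uri,
        },
        # Ask Glue to pip-install DeltaCAT from PyPI before the script runs.
        "DefaultArguments": {"--pip-install": deltacat_requirement},
    }


def create_and_run_job(request, region_name=None):
    """Create the Glue job and start a run; returns the job run ID."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    glue = boto3.client("glue", region_name=region_name)
    glue.create_job(**request)
    run = glue.start_job_run(JobName=request["Name"])
    return run["JobRunId"]
```

A CLI entry point could then wrap these two calls behind one command, taking the job name, IAM role ARN, and script location as arguments.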

As part of this story, we also want to ensure that Iceberg tables can be read into both Daft DataFrames and Ray Datasets from within the Glue job.
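A minimal sketch of what the two read paths could look like inside the job script, assuming PyIceberg is available and a Daft release that ships `daft.read_iceberg` is installed. The function names below are hypothetical, and the Ray path simply materializes the table through Arrow, which may not be what the final integration does.

```python
def load_iceberg_table(table_identifier, catalog_name="glue", **catalog_properties):
    """Load a PyIceberg table, e.g. with catalog_properties={"type": "glue"}."""
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name, **catalog_properties)
    return catalog.load_table(table_identifier)


def to_daft_dataframe(iceberg_table):
    """Read a PyIceberg table into a Daft DataFrame."""
    import daft

    return daft.read_iceberg(iceberg_table)


def to_ray_dataset(iceberg_table):
    """Read a PyIceberg table into a Ray Dataset via an in-memory Arrow table.

    Assumption: the table is small enough to materialize on one node.
    """
    import ray.data

    arrow_table = iceberg_table.scan().to_arrow()
    return ray.data.from_arrow(arrow_table)
```

In a job script, one would call `load_iceberg_table("my_db.my_table", type="glue")` (the table name is a placeholder) and pass the result to either converter.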

pdames commented 1 year ago

Daft integration depends on first resolving https://github.com/ray-project/deltacat/issues/170 and on the publication of Daft 0.1.12+.