linkedin / openhouse

Open Control Plane for Tables in Data Lakehouse
https://www.openhousedb.org/
BSD 2-Clause "Simplified" License
OpenHouse

Control Plane for Tables in Open Data Lakehouses


OpenHouse is an open source control plane designed for efficient management of tables within open data lakehouse deployments. The control plane comprises a declarative catalog and a suite of data services. Users can seamlessly define Tables, their schemas, and associated metadata declaratively within the catalog. OpenHouse reconciles the observed state of Tables with the desired state by orchestrating various data services.
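The reconciliation idea above can be illustrated with a minimal, generic sketch. This is not OpenHouse code and does not use any OpenHouse API: the `TableState` type, the action names, and the `plan_actions` and `reconcile` functions are all hypothetical, chosen only to show how a control plane might compare a declared (desired) state with an observed state and derive the work needed to close the gap.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TableState:
    """Hypothetical, simplified table state; not an OpenHouse type."""
    schema: tuple          # e.g. (("id", "long"), ("name", "string"))
    retention_days: int    # assumed example of declarative table metadata


def plan_actions(desired: TableState, observed: TableState) -> list:
    """Return the (hypothetical) actions needed to move observed toward desired."""
    actions = []
    if observed.schema != desired.schema:
        actions.append("evolve_schema")
    if observed.retention_days != desired.retention_days:
        actions.append("update_retention")
    return actions


def reconcile(desired: TableState, observed: TableState) -> TableState:
    """Apply each planned action in turn.

    In a real system each action would invoke a data service; here
    'applying' an action simply copies the desired field value.
    """
    current = observed
    for action in plan_actions(desired, observed):
        if action == "evolve_schema":
            current = replace(current, schema=desired.schema)
        elif action == "update_retention":
            current = replace(current, retention_days=desired.retention_days)
    return current


if __name__ == "__main__":
    desired = TableState(schema=(("id", "long"), ("name", "string")), retention_days=30)
    observed = TableState(schema=(("id", "long"),), retention_days=90)
    print(plan_actions(desired, observed))
    print(reconcile(desired, observed) == desired)
```

The point of the sketch is the separation between planning (diffing desired against observed) and applying (running each action). Once the two states match, planning yields no actions, so repeated reconciliation is a no-op.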

Getting Started

Prerequisites

For building and running locally in Docker Compose, you would need the following:

For deploying OpenHouse to Kubernetes, you would need the following:

Building OpenHouse

To build OpenHouse, you can use the following command:

./gradlew build

Running OpenHouse with Docker Compose

To run OpenHouse locally, we recommend following the SETUP guide. It brings up all the OpenHouse services together with MySQL, Prometheus, Apache Spark, and HDFS.

Deploying OpenHouse to Kubernetes

To deploy OpenHouse to Kubernetes, follow the DEPLOY guide. You build the container images for all the OpenHouse services and deploy them to a Kubernetes cluster using Helm.

Compatibility Matrix

OpenHouse is built with the following versions of the open-source projects:

Project                 Version
Apache Iceberg          1.2.0
Apache Spark            3.1.2
Apache Livy             0.7.0-incubating
Apache Hadoop Client    2.10.0
Spring Boot Framework   2.6.6
OpenAPI                 3.0.3

Contributing

We welcome contributions to OpenHouse. To get involved:

Please refer to the CONTRIBUTING guide for more details. To get started on the high-level architecture, please refer to the ARCHITECTURE guide.