memiiso / debezium-server-iceberg

Replicates any database (CDC events) to Apache Iceberg (to cloud storage)
Apache License 2.0


Debezium Iceberg Consumer

This project adds an Iceberg consumer to Debezium Server, enabling real-time replication of Change Data Capture (CDC) events from any database to an Iceberg table (for example, in cloud storage). It does not require Spark, Kafka, or a dedicated streaming platform. The consumer supports data ingestion in both append and upsert modes.
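To make the difference between the two modes concrete, here is a toy, in-memory illustration in Python. It is not the consumer's implementation: the event shape (a flattened row with a hypothetical "id" primary key and a Debezium-style "__op" code, where c = create, u = update, d = delete) and the helper names apply_append and apply_upsert are assumptions made for this sketch.

```python
from typing import Dict, Hashable, List

# Hypothetical, simplified CDC events for a table keyed by "id".
# "__op" mimics Debezium's operation codes: c=create, u=update, d=delete.
EVENTS: List[dict] = [
    {"id": 1, "name": "alice", "__op": "c"},
    {"id": 2, "name": "bob", "__op": "c"},
    {"id": 1, "name": "alice-renamed", "__op": "u"},
    {"id": 2, "name": "bob", "__op": "d"},
]


def apply_append(events: List[dict]) -> List[dict]:
    """Append mode (sketch): every change event becomes a new row, so history is kept."""
    return [dict(e) for e in events]


def apply_upsert(events: List[dict], key: str = "id") -> List[dict]:
    """Upsert mode (sketch): keep only the latest state per key; a delete removes the row.

    Assumes events arrive in commit order.
    """
    table: Dict[Hashable, dict] = {}
    for e in events:
        if e["__op"] == "d":
            table.pop(e[key], None)  # delete: drop the row if present
        else:
            table[e[key]] = dict(e)  # create/update: overwrite with the newest state
    return list(table.values())


if __name__ == "__main__":
    print(len(apply_append(EVENTS)))  # 4 rows: one per event
    print(apply_upsert(EVENTS))       # only the latest version of id=1 remains
```

In this sketch, append mode preserves the full change history, while upsert mode produces a table that mirrors the current state of the source.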

See the Documentation Page for more details, and review the caveats for current limitations and recommended solutions.


Installation

git clone https://github.com/memiiso/debezium-server-iceberg.git
cd debezium-server-iceberg
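# build the distribution package (tests skipped)
mvn -Passembly -Dmaven.test.skip package
# unpack the distribution and switch into it
unzip debezium-server-iceberg-dist/target/debezium-server-iceberg-dist*.zip -d appdist
cd appdist
# configure the source database and the Iceberg sink, then start the server
nano conf/application.properties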
bash run.sh
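Before starting the server, it can help to sanity-check the edited configuration. The sketch below is a hypothetical helper, not part of this project: parse_properties reads simple "key=value" or "key: value" lines (it does not handle line continuations, escape sequences, or whitespace-only separators), and check_sink assumes the Iceberg sink is selected by setting the Debezium Server key debezium.sink.type to iceberg.

```python
from pathlib import Path
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple Java .properties content into a dict (hypothetical helper).

    Simplifications: no line continuations, no escapes, no whitespace-only separators.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # split on the first '=' or ':' separator, whichever comes first
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            props[line] = ""  # key with no value
            continue
        i = min(seps)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props


def check_sink(conf_path: str = "conf/application.properties") -> None:
    """Raise ValueError unless the config selects the iceberg sink (assumed key/value)."""
    props = parse_properties(Path(conf_path).read_text(encoding="utf-8"))
    sink = props.get("debezium.sink.type")
    if sink != "iceberg":
        raise ValueError(f"expected debezium.sink.type=iceberg, found {sink!r}")
```

Run check_sink() from the appdist directory; it raises an error if the sink type is missing or not set to iceberg.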

Python Runner for Debezium Server

This project provides Python scripts that automate the startup, shutdown, and configuration of Debezium Server, so you can run and operate Debezium Server from Python. Example:

# install the Python runner
pip install git+https://github.com/memiiso/debezium-server-iceberg.git@master#subdirectory=python
# running with defaults
debezium
# running with custom arguments
debezium --debezium_dir=/my/debezium_server/dir/ --java_home=/my/java/homedir/

Running Debezium Server from Python:

from debezium import Debezium

d = Debezium(debezium_dir="/dbz/server/dir", java_home='/java/home/dir')
java_args = []
java_args.append("-Dquarkus.log.file.enable=true")
java_args.append("-Dquarkus.log.file.path=/logs/dbz_logfile.log")
d.run(*java_args)

Running Debezium Server asynchronously:

from debezium import DebeziumRunAsyn

java_args = []
java_args.append("-Dquarkus.log.file.enable=true")
java_args.append("-Dquarkus.log.file.path=/logs/dbz_logfile.log")
d = DebeziumRunAsyn(debezium_dir="/dbz/server/dir", java_home='/java/home/dir', java_args=java_args)
d.run()
d.join()
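Both Python examples assemble java_args by hand. A small hypothetical helper, to_java_args (not part of the debezium package), can build the same -Dkey=value flags from a dict. Because Debezium Server runs on Quarkus, system properties passed this way should take precedence over the same keys in conf/application.properties, which makes this a convenient way to override individual settings for a single run.

```python
from typing import Dict, List


def to_java_args(props: Dict[str, str]) -> List[str]:
    """Convert config keys/values into JVM system-property flags (-Dkey=value)."""
    return [f"-D{key}={value}" for key, value in props.items()]


# Same flags as in the examples above, built from a dict (insertion order is kept).
java_args = to_java_args({
    "quarkus.log.file.enable": "true",
    "quarkus.log.file.path": "/logs/dbz_logfile.log",
})
print(java_args)
# The resulting list can be passed as in the examples above, e.g. d.run(*java_args)
```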

Contributing

The Memiiso community welcomes anyone who wants to help out in any way, whether by reporting problems, helping with documentation, or contributing code changes to fix bugs, add tests, or implement new features. See the contributing document for details.
