memiiso / debezium-server-iceberg

Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage)
Apache License 2.0

Support (or document) Azure Storage as sink #222

Open karlschriek opened 1 year ago

karlschriek commented 1 year ago

I am trying to set up a very simple process to stream CDC records from (Azure) SQL Server as Iceberg tables to an Azure Storage Account. I've come across various potential solutions to do this, most of which involve chaining several tools together (and using Event Hub at some point).

I would like to be able to go [SQL Server] ----cdc message----> [Debezium Server] ----iceberg table----> [Azure Blob Storage] instead. Is this possible as of today? If so, could we document it somewhere? If not, could we support this?

ismailsimsek commented 1 year ago

Hi @karlschriek, yes, this should be possible (supported) with the current release. It should be a matter of configuring Hadoop FileIO to write to Azure Blob Storage and configuring the Iceberg catalog to use an Azure Hive server.

karlschriek commented 1 year ago

Ok, that sounds promising. Are there any docs anywhere on how to do something like that? Right now this is the only example config I am able to reference, which is very S3-specific:

https://github.com/memiiso/debezium-server-iceberg/blob/3f0649ae880e9bedd2bdff9e43ca5601bda3da0d/debezium-server-iceberg-sink/src/main/resources/conf/application.properties.example

karlschriek commented 1 year ago

Hmmm, as far as I can see there are currently two unmerged PRs open that would add ADLS as FileIO, so doesn't look like it is actually supported right now:

ismailsimsek commented 1 year ago

It is supported with Hadoop FileIO. I believe these PRs are adding a more direct Azure Storage integration (without Hadoop libraries).

Currently, HadoopFileIO is used to talk to Azure Blob Storage.

ghost commented 1 year ago

@karlschriek Have you been able to get this up and running with Azure Blob Storage?

@ismailsimsek can you point me to some documentation to help me to get this working on Azure Blob?

ismailsimsek commented 1 year ago

Could you try the options described at https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage, adding debezium.sink.iceberg. as a prefix to each one?

It will also require the hadoop-azure library, if it is not currently included: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure/3.3.6
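
To make the prefixing suggestion concrete, here is a minimal, untested sketch that builds the sink properties for an Azure storage account. Everything in it is an assumption rather than something confirmed in this thread: the helper name azure_sink_properties is hypothetical, the Hadoop catalog (type=hadoop) is used for simplicity instead of a Hive catalog, the type and warehouse keys are modeled on the S3 example config linked earlier, and fs.azure.account.key.<account>.dfs.core.windows.net is the standard Hadoop ABFS access-key option. The account, container, and key values are placeholders.

```python
# Hypothetical helper: turns Hadoop/Iceberg options into Debezium Server
# sink properties by adding the "debezium.sink.iceberg." prefix.
SINK_PREFIX = "debezium.sink.iceberg."


def azure_sink_properties(account, container, access_key,
                          warehouse_path="iceberg_warehouse"):
    """Return a dict of sink properties for an ADLS Gen2 (abfss) warehouse."""
    host = f"{account}.dfs.core.windows.net"

    # Iceberg catalog options (assumption: Hadoop catalog, as in the S3 example).
    iceberg_options = {
        "type": "hadoop",
        "warehouse": f"abfss://{container}@{host}/{warehouse_path}",
    }
    # Hadoop ABFS option for access-key authentication (needs hadoop-azure).
    hadoop_options = {
        f"fs.azure.account.key.{host}": access_key,
    }

    props = {"debezium.sink.type": "iceberg"}
    for key, value in {**iceberg_options, **hadoop_options}.items():
        props[SINK_PREFIX + key] = value
    return props


def render(props):
    """Format the properties as application.properties lines."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder values; replace with your own account, container, and key.
    print(render(azure_sink_properties("mystorageaccount", "mycontainer",
                                       "<access-key>")))
```

The printed lines are meant to be pasted into application.properties alongside the source connector settings. Other authentication modes from the linked page (for example OAuth with a service principal) would follow the same pattern: take the Hadoop option name and put debezium.sink.iceberg. in front of it.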

ismailsimsek commented 1 year ago

related to https://github.com/apache/iceberg/issues/8662

ghost commented 1 year ago

Thanks, will give this a try once I have my setup in Docker with SQL Server up and running.

ismailsimsek commented 6 months ago

leaving example here: https://github.com/tabular-io/iceberg-kafka-connect?tab=readme-ov-file#azure-adls-configuration-example

github-actions[bot] commented 15 hours ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.