GOVINDARAMTEKKAR97 closed this issue 5 months ago.
@GOVINDARAMTEKKAR97 if you configure the Azure FileIO then you can push data to Azure.
Previous issue: #222
You can find an example here: https://github.com/tabular-io/iceberg-kafka-connect?tab=readme-ov-file#azure-adls-configuration-example
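For reference, a rough sketch of how that example could map onto this project's debezium.sink.iceberg. property prefix. The adls.auth.shared-key.* keys come from Iceberg's Azure module (org.apache.iceberg.azure.AzureProperties), and the container, account, and key values are placeholders, so treat this as an untested starting point rather than a verified config:

```properties
# sketch only, assuming the iceberg-azure module is on the classpath
debezium.sink.iceberg.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO
debezium.sink.iceberg.warehouse=abfss://<container>@<account>.dfs.core.windows.net/warehouse
# shared-key auth; <account> and <account-key> are placeholders for your storage account
debezium.sink.iceberg.adls.auth.shared-key.account.name=<account>
debezium.sink.iceberg.adls.auth.shared-key.account.key=<account-key>
```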
Hi @ismailsimsek, I am facing many errors, for example with the rest catalog and with the hive metastore. One of them is mentioned below:

at io.debezium.server.Main.main(Main.java:15)
Caused by: java.lang.IllegalArgumentException: Cannot initialize Catalog implementation rest: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog Missing rest [java.lang.ClassNotFoundException: rest]
Could you update the application.properties parameters for me, like the AWS ones mentioned below?

debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.s3.access-key-id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
debezium.sink.iceberg.s3.secret-access-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
debezium.sink.iceberg.warehouse=s3://dremio-califonia/iceberg_warehouse
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
That would be very helpful for me, or you could update application.properties.example so it helps other people as well. Adding the parameters for Azure Blob Storage would help a lot.
@GOVINDARAMTEKKAR97 I recommend trying it with the jdbc catalog, see the example below.
For Azure you can see example config here.
For the rest catalog you can see example config here.
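As a concrete reference, a minimal sketch of a jdbc catalog writing to ADLS, assembled from the keys that appear later in this thread. The host, database, container, and account values are placeholders, and the jdbc.user/jdbc.password pass-through is an assumption based on Iceberg's JdbcCatalog, which forwards properties prefixed with jdbc. to the JDBC driver:

```properties
# sketch only: placeholder values, not a tested config
debezium.sink.iceberg.catalog.type=jdbc
debezium.sink.iceberg.catalog.uri=jdbc:postgresql://<host>:5432/<catalog_db>
# assumption: jdbc.-prefixed keys are forwarded to the JDBC driver
debezium.sink.iceberg.catalog.jdbc.user=<db_user>
debezium.sink.iceberg.catalog.jdbc.password=<db_password>
debezium.sink.iceberg.warehouse=abfss://<container>@<account>.dfs.core.windows.net/warehouse
debezium.sink.iceberg.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO
```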
Thanks @ismailsimsek. For Azure I already tried the example config you pointed to above, but I am still facing issues.
I have attached a screenshot of one of the errors.
Could you share your config?
debezium.sink.iceberg.catalog.type=jdbc
debezium.sink.iceberg.warehouse=abfss://iceberg@icebergasl.dfs.core.windows.net/warehouse
debezium.sink.iceberg.catalog.uri=jdbc:postgresql://localhost:5432/dremio
debezium.sink.iceberg.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO
debezium.sink.iceberg.include-credentials=true
I am doing all of this on an Amazon EC2 instance.
Hi @ismailsimsek, please check the above values.
The jdbc config looks correct, but the error message you shared is about the hive catalog! What is the issue you are having with the jdbc catalog? Also, do you have the PostgreSQL database running on the EC2 instance (jdbc:postgresql://localhost:5432/dremio)?
If I use the AWS configuration parameters that I already shared with you, everything works fine and I am able to store parquet and json files in S3. But when I do the same for Azure Blob Storage it gives me the error. My database is running on localhost.
Hi @ismailsimsek, can you try Azure Blob Storage once and let me know which parameters to use, or what values I should change in application.properties, so that I can test again?
@GOVINDARAMTEKKAR97 please check the example https://github.com/tabular-io/iceberg-kafka-connect?tab=readme-ov-file#azure-adls-configuration-example and set warehouse, io-impl, include-credentials in the application.properties. A sketch follows below.
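Translated to this project's property prefix, those three keys would look roughly like the sketch below. The include-credentials=true setting is taken from the linked iceberg-kafka-connect example, where it appears to make the connector pass Azure credentials from its environment to the FileIO; that behavior, and all the values shown, are assumptions to verify rather than a known-good config:

```properties
debezium.sink.iceberg.warehouse=abfss://<container>@<account>.dfs.core.windows.net/warehouse
debezium.sink.iceberg.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO
# per the linked example: pick up Azure credentials from the environment
debezium.sink.iceberg.include-credentials=true
```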
Hi @ismailsimsek, hope you are in good health. I am able to push data and metadata (parquet and json files) to Amazon S3 storage. It is working well and I am able to push data to S3 buckets. Now I want to push data to Azure Blob Storage. Can you please guide me on which catalog and parameters I should use? It will help me a lot.
I have attached application.properties below for your reference.
# postgres
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=root@123
debezium.source.database.dbname=dremio
debezium.source.topic.prefix=tutorial
debezium.source.schema.include.list=public
ENABLE_DEBEZIUM_SCRIPTING=true
debezium.sink.type=iceberg
# icebergevents
# iceberg
# Iceberg sink config
debezium.sink.type=iceberg
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=false
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=default
# enable event schemas - mandatory
debezium.format.value.schemas.enable=true
debezium.format.key.schemas.enable=true
debezium.format.value=json
debezium.format.key=json
# SET LOG LEVELS
quarkus.log.level=INFO
quarkus.log.console.json=false
# hadoop, parquet
quarkus.log.category."org.apache.hadoop".level=WARN
quarkus.log.category."org.apache.parquet".level=WARN
# Ignore messages below warning level from Jetty, because it's a bit verbose
quarkus.log.category."org.eclipse.jetty".level=WARN
# see https://debezium.io/documentation/reference/stable/development/engine.html#advanced-consuming
debezium.source.offset.storage=io.debezium.server.iceberg.offset.IcebergOffsetBackingStore
debezium.source.offset.storage.iceberg.table-name=debezium_offset_storage_custom_table
# see https://debezium.io/documentation/reference/stable/development/engine.html#database-history-properties
debezium.source.schema.history.internal=io.debezium.server.iceberg.history.IcebergSchemaHistory
debezium.source.schema.history.internal.iceberg.table-name=debezium_database_history_storage_test
# enable event schemas
debezium.format.value.schemas.enable=true
debezium.format.value=json
# complex nested data types are not supported, do event flattening. unwrap message!
debezium.transforms=unwrap
debezium.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
debezium.transforms.unwrap.add.fields=op,table,source.ts_ms,db
debezium.transforms.unwrap.delete.handling.mode=rewrite
debezium.transforms.unwrap.drop.tombstones=true
##################
debezium.sink.batch.batch-size-wait=MaxBatchSizeWait
debezium.sink.batch.batch-size-wait.max-wait-ms=180000
debezium.sink.batch.batch-size-wait.wait-interval-ms=120000
debezium.sink.batch.metrics.snapshot-mbean=debezium.postgres:type=connector-metrics,context=snapshot,server=testc
debezium.sink.batch.metrics.streaming-mbean=debezium.postgres:type=connector-metrics,context=streaming,server=testc
# increase max.batch.size to receive large number of events per batch
debezium.source.max.batch.size=15
debezium.source.max.queue.size=45
# S3 config without hadoop catalog. Using GlueCatalog and S3FileIO
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.s3.access-key-id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
debezium.sink.iceberg.s3.secret-access-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
debezium.sink.iceberg.warehouse=s3://dremio-califonia/iceberg_warehouse
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
Please give us a solution or some way I can accomplish pushing data to Azure Blob Storage.
Regards,
Govinda Ramtekkar