Closed prakash-42 closed 2 weeks ago
@prakash-42 if you use org.apache.iceberg.aws.s3.S3FileIO, you don't need the AWS bundle. That's the recommended FileIO to use for AWS/S3.
Example setup below: https://github.com/memiiso/debezium-server-iceberg/blob/f417423d1c338322fc599986b57bc999b81e6083/examples/conf/application.properties#L18-L22
Further details are in the Iceberg documentation.
Thanks for your response, @ismailsimsek. The error went away after I switched to using S3FileIO instead of org.apache.hadoop.fs.s3a.S3AFileSystem. I have, however, run into a different problem after this.
I am trying to set up this project with the catalog-impl set to org.apache.iceberg.aws.glue.GlueCatalog. Here are my configuration properties:
# Iceberg sink config
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=true
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=mycatalog
# S3 config using Glue catalog And S3FileIO
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.warehouse=s3://poc_bucket/icebergcatalog
# debezium.sink.iceberg.type=iceberg # Gives error
debezium.sink.iceberg.catalog-type=hadoop
debezium.sink.iceberg.format-version=2
When I try to run the application, it fails on startup with the following error:
Caused by: org.apache.iceberg.exceptions.ValidationException: Invalid S3 URI, cannot determine scheme: file:/home/glue_user/workspace/spark-warehouse/debezium_offset_storage_custom_table/metadata/00000-2a2503fc-a6db-47f2-9ac9-ce21a29322cb.metadata.json
at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
at org.apache.iceberg.aws.s3.S3URI.<init>(S3URI.java:72)
I'm not sure which property I should set so that it creates paths starting with s3:// instead of file:/. (I thought that debezium.sink.iceberg.warehouse should control this, but now I'm not sure.) Can you suggest any tips for debugging this? Sorry for pestering you. I think this tool can greatly simplify our data lake's CDC process, which is why I want to set it up.
@prakash-42 you don't need the second line below. Both of these lines set the catalog:
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.catalog-type=hadoop
Outside of that, the config looks correct to me.
Leaving the documentation for the AWS Iceberg integration here: https://iceberg.apache.org/docs/1.5.0/aws/#glue-catalog
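For reference, here is a sketch of the catalog section with the suggestion above applied. Every key and value is copied from the configuration posted earlier in this thread. The only change is that the catalog-type line is dropped so that catalog-impl alone selects the catalog. The thread does not confirm whether this change by itself resolves the file:/ path error, so treat it as a starting point rather than a verified fix.

```properties
# Iceberg sink config (unchanged from the configuration posted above)
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=true
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=mycatalog

# Glue catalog with S3FileIO. The catalog-type=hadoop line is removed,
# as suggested above, because catalog-impl already selects the catalog.
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.warehouse=s3://poc_bucket/icebergcatalog
debezium.sink.iceberg.format-version=2
```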
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Hi! I wasn't sure about the correct forum for asking my question. I hope this is the right place.
When I tried to package and run the application (following the steps in the README), I got the following error:
I think the AWS SDK isn't bundled with the application by default. Do I need to add this dependency myself (by modifying the project's pom.xml), or is there a different recommended way to get the AWS SDK libraries at runtime? I did notice that the PR for issue #201 explicitly removes the AWS SDK, but I couldn't understand the motivation behind that. Please guide me on this, thank you!