Open prakash-42 opened 5 months ago
@prakash-42 if you use org.apache.iceberg.aws.s3.S3FileIO you don't need the AWS bundle. That's the recommended FileIO to use for AWS/S3.
Example setup below: https://github.com/memiiso/debezium-server-iceberg/blob/f417423d1c338322fc599986b57bc999b81e6083/examples/conf/application.properties#L18-L22
Further details are in the Iceberg documentation.
Thanks for your response @ismailsimsek. The error went away after I switched to using the S3FileIO instead of org.apache.hadoop.fs.s3a.S3AFileSystem. I have, however, run into a different problem after this.
I am trying to set up this project with the catalog-impl as org.apache.iceberg.aws.glue.GlueCatalog. Here are my configuration properties for the same:
# Iceberg sink config
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=true
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=mycatalog
# S3 config using Glue catalog And S3FileIO
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.warehouse=s3://poc_bucket/icebergcatalog
# debezium.sink.iceberg.type=iceberg # Gives error
debezium.sink.iceberg.catalog-type=hadoop
debezium.sink.iceberg.format-version=2
When I try to run the application, it fails on startup with the following error:
Caused by: org.apache.iceberg.exceptions.ValidationException: Invalid S3 URI, cannot determine scheme: file:/home/glue_user/workspace/spark-warehouse/debezium_offset_storage_custom_table/metadata/00000-2a2503fc-a6db-47f2-9ac9-ce21a29322cb.metadata.json
at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
at org.apache.iceberg.aws.s3.S3URI.<init>(S3URI.java:72)
I'm not sure what property I should set so that it creates paths like s3:// instead of file:/. (I thought that debezium.sink.iceberg.warehouse should control this part, but now I'm not sure.) Can you suggest any tips for debugging this? Sorry for pestering you; I think this tool can greatly simplify our data lake's CDC process and hence wanted to set it up.
@prakash-42 you don't need the second line below; these two do the same thing, both set the catalog:
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.catalog-type=hadoop
Outside of that, the config looks correct to me.
Leaving here the documentation for the AWS Iceberg integration: https://iceberg.apache.org/docs/1.5.0/aws/#glue-catalog
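Applying that suggestion to the configuration posted earlier in this thread, the catalog section might look like the sketch below. It only drops the catalog-type line; every value is copied from the earlier comment, and the thread does not confirm whether this change alone resolves the file:/ path error.

```properties
# Sketch only: values copied from the configuration posted earlier in this thread.
# The catalog is selected through catalog-impl alone (no catalog-type line).
debezium.sink.iceberg.catalog-name=mycatalog
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.warehouse=s3://poc_bucket/icebergcatalog
```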
Hi! I wasn't sure about the correct forum for asking my question, hope this is the right place.
When I tried to package and run the application (following the steps in the README), I got the following error:
I think the AWS SDK isn't bundled by default with the application. Do I need to add this dependency myself (by modifying the project's pom.xml), or is there a different recommended way of getting the AWS SDK libraries at runtime? I did notice that the PR for issue #201 explicitly removes the AWS SDK, but I couldn't understand the motivation behind that. Please guide me on this, thank you!