tabular-io / iceberg-kafka-connect

Apache License 2.0

AWS Glue - EntityNotFoundException: Entity Not Found (Service: Glue, Status Code: 400 ...) #184

Open bochenekmartin opened 8 months ago

bochenekmartin commented 8 months ago

Hi! This is our Iceberg connector config:

{
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "errors.deadletterqueue.context.headers.enable": "true",
    "errors.deadletterqueue.topic.name": "the-dlq",
    "errors.deadletterqueue.topic.replication.factor": "3",
    "header.converter": "org.apache.kafka.connect.storage.SimpleHeaderConverter",
    "iceberg.catalog": "raw_catalog",
    "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "iceberg.catalog.client.assume-role.arn": "arn:aws:iam::xxxxxxxxxx:role/xxxxxxxxxxxxx-kafka-s3",
    "iceberg.catalog.client.assume-role.external-id": "xxxxxxxxxxxxxxxxxxxxxx",
    "iceberg.catalog.client.assume-role.region": "af-south-1",
    "iceberg.catalog.client.assume-role.tags.LakeFormationAuthorizedCaller": "my-iceberg-connector",
    "iceberg.catalog.client.region": "af-south-1",
    "iceberg.catalog.glue.account-id": "............",
    "iceberg.catalog.glue.lakeformation-enabled": "true",
    "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "iceberg.catalog.s3.endpoint": "https://s3.af-south-1.amazonaws.com/",
    "iceberg.catalog.s3.path-style-access": "true",
    "iceberg.catalog.s3.region": "af-south-1",
    "iceberg.catalog.s3.sse.key": "arn:aws:kms:af-south-1:xxxxxxxxxxxx:key/...-.........",
    "iceberg.catalog.s3.sse.type": "kms",
    "iceberg.catalog.warehouse": "s3://xxxxxxxxxxxx-af-south-1-data-raw/warehouse/10x",
    "iceberg.control.topic": "icc-10x-client-general-ledger-posting-event-v002",
    "iceberg.tables": "raw_redpanda.10x_client_general_ledger_posting_event_v002",
    "iceberg.tables.auto-create-enabled": "true",
    "iceberg.tables.dynamic-enabled": "false",
    "iceberg.tables.evolve-schema-enabled": "true",
    "iceberg.tables.upsert-mode-enabled": "false",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter.schemas.enable": "false",
    "name": "iceberg-connector-name",
    "tasks.max": "1",
    "topics": "the-topic-name",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schemas.enable": "true"
}
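For orientation, the Glue GetTable call in the trace below looks up one specific database/table pair, and that pair comes from the `iceberg.tables` value above. The sketch below assumes that each entry has the form `<glue_database>.<table>`, with the namespace mapping one-to-one to a Glue database (the Glue catalog expects a single-level namespace). `GlueTargetSketch` and `parseTables` are hypothetical helpers for illustration, not part of the connector or Iceberg.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not connector or Iceberg code): splits an iceberg.tables
// value into the Glue database and table names a GetTable lookup would use,
// assuming each entry is "<glue_database>.<table>" with a single-level namespace.
class GlueTargetSketch {

    // Value holder for one Glue lookup target.
    record GlueTarget(String database, String table) {}

    // Parses every comma-separated entry; rejects entries that are not exactly "db.table".
    static List<GlueTarget> parseTables(String icebergTables) {
        List<GlueTarget> targets = new ArrayList<>();
        for (String raw : icebergTables.split(",")) {
            String entry = raw.trim();
            // limit -1 keeps trailing empty parts, so "db." is rejected too
            String[] parts = entry.split("\\.", -1);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Expected <database>.<table>, got: '" + entry + "'");
            }
            targets.add(new GlueTarget(parts[0], parts[1]));
        }
        return targets;
    }

    public static void main(String[] args) {
        String tables = "raw_redpanda.10x_client_general_ledger_posting_event_v002";
        for (GlueTarget t : parseTables(tables)) {
            System.out.println("database=" + t.database() + " table=" + t.table());
        }
    }
}
```

Under that assumption, both the Glue database `raw_redpanda` and the table name must resolve in the account and region the catalog is configured for (`af-south-1` here); Glue's GetTable can report `EntityNotFoundException` when the database itself is missing, not only the table.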

We've got the following exception:

org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:632)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:350)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:250)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:219)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)
    at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:236)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: software.amazon.awssdk.services.glue.model.EntityNotFoundException: Entity Not Found (Service: Glue, Status Code: 400, Request ID: 0a8ff12d-d26c-429a-8032-81664a81f9ac)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:196)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
    at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:7575)
    at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation(LakeFormationAwsClientFactory.java:115)
    at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.s3(LakeFormationAwsClientFactory.java:79)
    at org.apache.iceberg.aws.s3.S3FileIO.client(S3FileIO.java:327)
    at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:375)
    at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:325)
    at org.apache.iceberg.aws.glue.GlueTableOperations.initializeFileIO(GlueTableOperations.java:223)
    at org.apache.iceberg.aws.glue.GlueTableOperations.io(GlueTableOperations.java:115)
    at org.apache.iceberg.aws.glue.GlueCatalog.newTableOps(GlueCatalog.java:246)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:46)
    at io.tabular.iceberg.connect.data.IcebergWriterFactory.createWriter(IcebergWriterFactory.java:54)
    at io.tabular.iceberg.connect.channel.Worker.lambda$writerForTable$8(Worker.java:242)
    at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220)
    at io.tabular.iceberg.connect.channel.Worker.writerForTable(Worker.java:241)
    at io.tabular.iceberg.connect.channel.Worker.lambda$routeRecordStatically$5(Worker.java:197)
    at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4204)
    at io.tabular.iceberg.connect.channel.Worker.routeRecordStatically(Worker.java:195)
    at io.tabular.iceberg.connect.channel.Worker.save(Worker.java:184)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at io.tabular.iceberg.connect.channel.Worker.save(Worker.java:171)
    at io.tabular.iceberg.connect.IcebergSinkTask.put(IcebergSinkTask.java:150)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
    ... 11 more

Could you explain what this error means and how to resolve it?

danielcweeks commented 8 months ago

Based on the stack trace, the issue is that Glue could not find the table (the failing call is DefaultGlueClient.getTable), which probably means that either the table doesn't exist or Glue had an issue loading it.

jason-da-redpanda commented 3 months ago

The connector config had:

"iceberg.tables.auto-create-enabled": "true"

bochenekmartin commented 1 month ago

Hey @danielcweeks, could there be another reason for the exception, given that the connector config sets "iceberg.tables.auto-create-enabled" to "true"?
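One hypothesis consistent with the stack trace: the failing getTable call is issued by `LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation` while `S3FileIO` is being initialized inside `GlueCatalog.newTableOps`, that is, before `BaseMetastoreCatalog.loadTable` has loaded anything, and the exception that reaches `IcebergSinkTask.put` is the raw AWS `EntityNotFoundException` rather than Iceberg's `NoSuchTableException`. Assuming the connector's auto-create path is triggered only by `NoSuchTableException` (an assumption here, not confirmed in this thread), the auto-create flag would never get a chance to act when `iceberg.catalog.glue.lakeformation-enabled` is `"true"` and the table is missing. The sketch below is a deliberately simplified, hypothetical model of that control flow; none of its classes or methods are the real Iceberg, AWS SDK, or connector code.

```java
// Hypothetical model of the call chain in the stack trace above.
// These classes are stand-ins, NOT the real Iceberg / AWS SDK / connector types.
class AutoCreateSketch {

    // Stand-in for Iceberg's NoSuchTableException (the "table is missing" signal).
    static class NoSuchTableException extends RuntimeException {
        NoSuchTableException(String msg) { super(msg); }
    }

    // Stand-in for the AWS SDK's Glue EntityNotFoundException.
    static class EntityNotFoundException extends RuntimeException {
        EntityNotFoundException(String msg) { super(msg); }
    }

    // Models loadTable -> newTableOps -> FileIO setup. With the Lake Formation
    // check enabled, Glue GetTable runs during FileIO setup and its error escapes
    // untranslated; without it, this model reports a missing table as NoSuchTableException.
    static String loadTable(String name, boolean tableExists, boolean lakeFormationEnabled) {
        if (!tableExists) {
            if (lakeFormationEnabled) {
                throw new EntityNotFoundException("Entity Not Found (Service: Glue, Status Code: 400)");
            }
            throw new NoSuchTableException("Table does not exist: " + name);
        }
        return "loaded " + name;
    }

    // Models a writer factory whose auto-create branch catches only NoSuchTableException.
    static String createWriter(String name, boolean tableExists, boolean lakeFormationEnabled, boolean autoCreate) {
        try {
            return loadTable(name, tableExists, lakeFormationEnabled);
        } catch (NoSuchTableException e) {
            if (autoCreate) {
                return "created " + name;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        String table = "raw_redpanda.10x_client_general_ledger_posting_event_v002";
        // Missing table, no Lake Formation check: the auto-create branch runs.
        System.out.println(createWriter(table, false, false, true));
        // Missing table, Lake Formation check on: the Glue error bypasses auto-create.
        try {
            createWriter(table, false, true, true);
        } catch (EntityNotFoundException e) {
            System.out.println("auto-create skipped: " + e.getMessage());
        }
    }
}
```

The model keeps the two exception types distinct on purpose: if the real code behaves like this, the outcome depends on which exception type escapes `loadTable`, not on the value of `iceberg.tables.auto-create-enabled`.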