trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0

Files get created on S3 while creating an Iceberg table even if schema does not exist #15779

Closed krvikash closed 1 year ago

krvikash commented 1 year ago

I saw strange behavior while creating a table in the Iceberg Glue catalog when the schema does not exist.

Data and metadata files get created on S3 while creating an Iceberg table even if the schema does not exist. However, CREATE TABLE eventually fails with:

Query 20230119_105317_00254_sdqd6 failed: Database non_existing_schema not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: b338614c-0cd2-45ae-9ec1-0e5bff3f9acd; Proxy: null)

Stack trace:

com.amazonaws.services.glue.model.EntityNotFoundException: Database non_existing_schema not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: b338614c-0cd2-45ae-9ec1-0e5bff3f9acd; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:12473)
    at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:12440)
    at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:12429)
    at com.amazonaws.services.glue.AWSGlueClient.executeCreateTable(AWSGlueClient.java:2556)
    at com.amazonaws.services.glue.AWSGlueClient.createTable(AWSGlueClient.java:2525)
    at io.trino.plugin.iceberg.catalog.glue.GlueIcebergTableOperations.lambda$commitNewTable$0(GlueIcebergTableOperations.java:111)
    at io.trino.plugin.hive.aws.AwsApiCallStats.call(AwsApiCallStats.java:37)
    at io.trino.plugin.iceberg.catalog.glue.GlueIcebergTableOperations.commitNewTable(GlueIcebergTableOperations.java:111)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.commit(AbstractIcebergTableOperations.java:146)
    at org.apache.iceberg.BaseTransaction.commitCreateTransaction(BaseTransaction.java:286)
    at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:265)
    at io.trino.plugin.iceberg.IcebergMetadata.finishInsert(IcebergMetadata.java:843)
    at io.trino.plugin.iceberg.IcebergMetadata.finishCreateTable(IcebergMetadata.java:730)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.finishCreateTable(ClassLoaderSafeConnectorMetadata.java:479)
    at io.trino.metadata.MetadataManager.finishCreateTable(MetadataManager.java:880)
    at io.trino.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$4(LocalExecutionPlanner.java:4038)
    at io.trino.operator.TableFinishOperator.getOutput(TableFinishOperator.java:319)
    at io.trino.operator.Driver.processInternal(Driver.java:394)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:297)
    at io.trino.operator.Driver.tryWithLock(Driver.java:689)
    at io.trino.operator.Driver.process(Driver.java:289)
    at io.trino.operator.Driver.processForDuration(Driver.java:260)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:752)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:519)
    at io.trino.$gen.Trino_testversion____20230119_092503_71.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Repro SQL (make sure the schema does not exist in the catalog):

CREATE TABLE iceberg.non_existing_schema.test WITH (location = 's3://krvikash-test/non_existing_schema/test') AS SELECT 1 AS id, 'trino' AS name;
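
For illustration, the stack trace above shows the Glue createTable call being made from finishCreateTable, that is, after the write stage has already run. The following is a minimal, self-contained sketch of why that ordering leaves files behind and how an up-front schema check would avoid it. It is not Trino code: the class, the method names, the in-memory "catalog" and "storage", and the file names are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the ordering problem; none of these names exist in Trino.
class FailFastCreateTable {
    // In-memory stand-in for the catalog's set of schemas (databases).
    static final Set<String> schemas = new HashSet<>();
    // In-memory stand-in for object storage; records every path that was written.
    static final List<String> writtenFiles = new ArrayList<>();

    // Ordering described in this issue: files are written first,
    // and the missing schema is only detected at commit time.
    static void createTableLateCheck(String schema, String table) {
        writeFiles(schema, table);
        requireSchema(schema);
    }

    // Assumed alternative: validate the schema before anything is written.
    static void createTableEarlyCheck(String schema, String table) {
        requireSchema(schema);
        writeFiles(schema, table);
    }

    static void requireSchema(String schema) {
        if (!schemas.contains(schema)) {
            throw new IllegalStateException("Database " + schema + " not found");
        }
    }

    static void writeFiles(String schema, String table) {
        // Placeholder paths, not real Iceberg file names.
        writtenFiles.add(schema + "/" + table + "/data/data-file");
        writtenFiles.add(schema + "/" + table + "/metadata/metadata-file");
    }

    public static void main(String[] args) {
        try {
            createTableLateCheck("non_existing_schema", "test");
        } catch (IllegalStateException e) {
            System.out.println("late check: " + e.getMessage()
                    + ", files left behind: " + writtenFiles.size());
        }
        writtenFiles.clear();
        try {
            createTableEarlyCheck("non_existing_schema", "test");
        } catch (IllegalStateException e) {
            System.out.println("early check: " + e.getMessage()
                    + ", files left behind: " + writtenFiles.size());
        }
    }
}
```

In the late-check variant the exception still surfaces, but the written paths remain in storage. An actual fix could also clean up on failure instead of, or in addition to, checking up front.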
krvikash commented 1 year ago

Not sure if https://github.com/trinodb/trino/pull/14869 will be sufficient to fix this, since data files are also getting created in this case.