trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.4k stars 2.99k forks source link

CASCADE is not respected when droping an existing and non-empty Iceberg schema #23132

Closed szisiu closed 3 weeks ago

szisiu commented 2 months ago

Ran on Trino v454 (server and JDBC Driver) with Hive Metastore Standalone as metastore and MinIO as object storage (all on docker).

To reproduce:

trino-docker-test.zip

-- bucket will be created when executing docker compose
create schema iceberg.testschema 
with (location = 's3a://testbucket/testschema'); 

create table iceberg.testschema.testtable (id)
with (
    location = 's3a://testbucket/testschema/testtable'
)
as values
    (100),
    (200);

select
    *
from
    iceberg.testschema.testtable;

-- below works
-- drop table iceberg.testschema.testtable;
-- drop schema iceberg.testschema cascade;

-- below does not work
drop schema iceberg.testschema cascade;

Error Type: EXTERNAL

Error Code: HIVE_METASTORE_ERROR (16777216)

Stacktrace:

io.trino.spi.TrinoException: Database testschema is not empty. One or more tables exist.
    at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.dropDatabase(ThriftHiveMetastore.java:801)
    at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.dropDatabase(BridgingHiveMetastore.java:181)
    at io.trino.metastore.tracing.TracingHiveMetastore.lambda$dropDatabase$10(TracingHiveMetastore.java:182)
    at io.trino.metastore.tracing.Tracing.lambda$withTracing$0(Tracing.java:31)
    at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:39)
    at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:30)
    at io.trino.metastore.tracing.TracingHiveMetastore.dropDatabase(TracingHiveMetastore.java:182)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.dropDatabase(CachingHiveMetastore.java:583)
    at io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog.dropNamespace(TrinoHiveCatalog.java:267)
    at io.trino.plugin.iceberg.IcebergMetadata.dropSchema(IcebergMetadata.java:892)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.dropSchema(ClassLoaderSafeConnectorMetadata.java:418)
    at io.trino.tracing.TracingConnectorMetadata.dropSchema(TracingConnectorMetadata.java:350)
    at io.trino.metadata.MetadataManager.dropSchema(MetadataManager.java:799)
    at io.trino.tracing.TracingMetadata.dropSchema(TracingMetadata.java:383)
    at io.trino.execution.DropSchemaTask.execute(DropSchemaTask.java:79)
    at io.trino.execution.DropSchemaTask.execute(DropSchemaTask.java:37)
    at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:146)
    at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:272)
    at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:150)
    at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:134)
    at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:570)
    at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
    at io.trino.$gen.Trino_454____20240826_100404_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: io.trino.hive.thrift.metastore.InvalidOperationException: Database testschema is not empty. One or more tables exist.
    at io.trino.hive.thrift.metastore.ThriftHiveMetastore$drop_database_result$drop_database_resultStandardScheme.read(ThriftHiveMetastore.java:47821)
    at io.trino.hive.thrift.metastore.ThriftHiveMetastore$drop_database_result$drop_database_resultStandardScheme.read(ThriftHiveMetastore.java:47797)
    at io.trino.hive.thrift.metastore.ThriftHiveMetastore$drop_database_result.read(ThriftHiveMetastore.java:47730)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:93)
    at io.trino.hive.thrift.metastore.ThriftHiveMetastore$Client.recvDropDatabase(ThriftHiveMetastore.java:1166)
    at io.trino.hive.thrift.metastore.ThriftHiveMetastore$Client.dropDatabase(ThriftHiveMetastore.java:1151)
    at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.dropDatabase(ThriftHiveMetastoreClient.java:226)
    at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.lambda$dropDatabase$3(FailureAwareThriftMetastoreClient.java:106)
    at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.runWithHandle(FailureAwareThriftMetastoreClient.java:476)
    at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.dropDatabase(FailureAwareThriftMetastoreClient.java:106)
    at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$dropDatabase$29(ThriftHiveMetastore.java:792)
    at io.trino.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:41)
    at io.trino.plugin.hive.util.RetryDriver.run(RetryDriver.java:116)
    at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.dropDatabase(ThriftHiveMetastore.java:790)
    ... 26 more

Also in the docs:

image

Refs: https://trino.io/docs/current/sql/drop-schema.html

ebyhr commented 2 months ago

I can't reproduce the issue with IcebergMinioHiveMetastoreQueryRunnerMain. Can you investigate which config causes the failure?

szisiu commented 2 months ago

I'm sorry, I didn't catch what you said. Any clue what config params could be investigated? I have a simple setup (no caching, etc.)

I could not find the IcebergMinioHiveMetastoreQueryRunnerMain file in the trino repo.

Nonetheless I use docker containers on my test and production environments. Is it possible to run a test on the atatched docker compose project (against the latest trino public release)?

ebyhr commented 2 months ago

The class exists at https://github.com/trinodb/trino/blob/aea907955d3e6025f84dd91f7cc8ff544ca46c10/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/IcebergQueryRunner.java#L253

You can simply start a Trino cluster using the main method on IDE.

ebyhr commented 2 months ago

The following cache in metastore-site.xml is the cause as far as I tested locally:

    <property>
        <name>metastore.rawstore.impl</name>
        <value>org.apache.hadoop.hive.metastore.cache.CachedStore</value>
    </property>
szisiu commented 2 months ago

@ebyhr thanks! I overlooked it!

It looks like if you remove the org.apache.hadoop.hive.metastore.cache.CachedStore from metastore-site.xml HMS config file then the drop schema with cascade works as expected.

On the other side there is a perfomance impact on queries.

AFAIK, org.apache.hadoop.hive.metastore.cache.CachedStore is a component of the Hive Metastore that can significantly improve performance by caching frequently accessed metadata in memory.

If possible maybe some drino docs could be updated. I dont know. Leaving that to the trino team to decide.

Refs: