trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.26k stars 2.95k forks source link

Predicates on time travel uses the latest schema on Iceberg #23601

Open ebyhr opened 2 days ago

ebyhr commented 2 days ago

A reproducible code snippet using TestIcebergV2:

    @Test
    void testTimeTravelAfterSchemaEvolution()
    {
        String tableName = "test_time_travel_after_schema_evolution" + randomNameSuffix();

        assertUpdate("CREATE TABLE " + tableName + " AS SELECT 1 a, 2 b", 1);
        Table icebergTable = loadTable(tableName);
        long snapshotId = icebergTable.currentSnapshot().snapshotId();

        assertUpdate("ALTER TABLE " + tableName + " DROP COLUMN b");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId, "VALUES (1, 2)");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId + " WHERE a = 1", "VALUES (1, 2)");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId + " WHERE b = 1", "VALUES (1, 2)");
    }

The bottom SELECT statement using a dropped b column in predicates throws an exception:

java.lang.AssertionError: Execution of 'actual' query 20240930_012955_00008_x9hnt failed: SELECT * FROM test_time_travel_after_schema_evolutionbhf2fcwom0 FOR VERSION AS OF 5192999285497503973 WHERE b = 1

    at io.trino.testing.QueryAssertions.assertDistributedQuery(QueryAssertions.java:299)
    at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:187)
    at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:160)
    at io.trino.testing.AbstractTestQueryFramework.assertQuery(AbstractTestQueryFramework.java:350)
    at io.trino.plugin.iceberg.TestIcebergV2.testTimeTravelAfterSchemaEvolution(TestIcebergV2.java:1228)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1491)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:2073)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2035)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
Caused by: io.trino.testing.QueryFailedException: Invalid metadata file for table tpch.test_time_travel_after_schema_evolutionbhf2fcwom0
    at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:134)
    at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565)
    at io.trino.testing.DistributedQueryRunner.executeWithPlan(DistributedQueryRunner.java:554)
    at io.trino.testing.QueryAssertions.assertDistributedQuery(QueryAssertions.java:290)
    ... 10 more
    Suppressed: java.lang.Exception: SQL: SELECT * FROM test_time_travel_after_schema_evolutionbhf2fcwom0 FOR VERSION AS OF 5192999285497503973 WHERE b = 1
        at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:572)
        ... 12 more
Caused by: io.trino.spi.TrinoException: Invalid metadata file for table tpch.test_time_travel_after_schema_evolutionbhf2fcwom0
    at io.trino.plugin.iceberg.IcebergExceptions.translateMetadataException(IcebergExceptions.java:51)
    at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:206)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.getNextBatch(ClassLoaderSafeConnectorSplitSource.java:43)
    at io.trino.split.ConnectorAwareSplitSource.getNextBatch(ConnectorAwareSplitSource.java:73)
    at io.trino.split.TracingSplitSource.getNextBatch(TracingSplitSource.java:64)
    at io.trino.split.BufferingSplitSource$GetNextBatch.fetchSplits(BufferingSplitSource.java:130)
    at io.trino.split.BufferingSplitSource$GetNextBatch.fetchNextBatchAsync(BufferingSplitSource.java:112)
    at io.trino.split.BufferingSplitSource.getNextBatch(BufferingSplitSource.java:61)
    at io.trino.split.TracingSplitSource.getNextBatch(TracingSplitSource.java:64)
    at io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:247)
    at io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:172)
    at io.trino.execution.scheduler.PipelinedQueryScheduler$DistributedStagesScheduler.schedule(PipelinedQueryScheduler.java:1285)
    at io.trino.$gen.Trino_testversion____20240930_012950_71.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'b' in struct: struct<1: a: optional int>
    at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
    at org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:45)
    at org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:26)
    at org.apache.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:111)
    at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:181)
    at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:135)
    at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:347)
    at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:151)
    at org.apache.iceberg.ManifestGroup.lambda$entries$8(ManifestGroup.java:254)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$3(LocalLoadingCache.java:183)
    at com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:297)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1710)
    at com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:293)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:112)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:58)
    at org.apache.iceberg.ManifestGroup.lambda$entries$9(ManifestGroup.java:274)
    at org.apache.iceberg.io.CloseableIterable$5.shouldKeep(CloseableIterable.java:139)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:64)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.io.CloseableIterator$3.hasNext(CloseableIterator.java:91)
    at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
    at org.apache.iceberg.io.CloseableIterable$ConcatCloseableIterable$ConcatCloseableIterator.hasNext(CloseableIterable.java:247)
    at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatchInternal(IcebergSplitSource.java:268)
    at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:203)
    ... 18 more

Similar to #23295, but filed a new issue because SELECT without specific predicates works in this case.

mayankvadariya commented 20 hours ago

Issue is same as the one reported in https://github.com/apache/iceberg/issues/11162