trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Fails to insert into iceberg table after dropping a partition column that was used by older partition specs #15729

Open krvikash opened 1 year ago

krvikash commented 1 year ago

Repro SQL:

CREATE TABLE test_1 (id INTEGER, name VARCHAR) WITH (partitioning = ARRAY['id', 'truncate(name, 5)']);
INSERT INTO test_1 values(1, 'A');
ALTER TABLE test_1 SET PROPERTIES partitioning = ARRAY['id'];
ALTER TABLE test_1 DROP COLUMN name;
INSERT INTO test_1 values(2);
-- Query 20230116_130052_00022_epnen failed: Type cannot be null

Stack trace:

java.lang.NullPointerException: Type cannot be null
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:907)
    at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:446)
    at org.apache.iceberg.types.Types$NestedField.optional(Types.java:415)
    at org.apache.iceberg.PartitionSpec.partitionType(PartitionSpec.java:132)
    at org.apache.iceberg.util.PartitionSet.lambda$new$0(PartitionSet.java:45)
    at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:292)
    at org.apache.iceberg.util.PartitionSet.<init>(PartitionSet.java:45)
    at org.apache.iceberg.util.PartitionSet.create(PartitionSet.java:37)
    at org.apache.iceberg.ManifestFilterManager.<init>(ManifestFilterManager.java:94)
    at org.apache.iceberg.MergingSnapshotProducer$DataFileFilterManager.<init>(MergingSnapshotProducer.java:971)
    at org.apache.iceberg.MergingSnapshotProducer$DataFileFilterManager.<init>(MergingSnapshotProducer.java:969)
    at org.apache.iceberg.MergingSnapshotProducer.<init>(MergingSnapshotProducer.java:122)
    at org.apache.iceberg.MergeAppend.<init>(MergeAppend.java:32)
    at org.apache.iceberg.BaseTransaction.newAppend(BaseTransaction.java:151)
    at io.trino.plugin.iceberg.IcebergMetadata.finishInsert(IcebergMetadata.java:835)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:519)
    at io.trino.metadata.MetadataManager.finishInsert(MetadataManager.java:911)
    at io.trino.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$4(LocalExecutionPlanner.java:4090)
    at io.trino.operator.TableFinishOperator.getOutput(TableFinishOperator.java:319)
    at io.trino.operator.Driver.processInternal(Driver.java:411)
    at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
    at io.trino.operator.Driver.tryWithLock(Driver.java:706)
    at io.trino.operator.Driver.process(Driver.java:306)
    at io.trino.operator.Driver.processForDuration(Driver.java:277)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:752)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:519)
    at io.trino.$gen.Trino_403_620_g2d27ac8____20230116_102702_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

krvikash commented 1 year ago

The same failure also occurs with SELECT and UPDATE:

select * from test_1;
-- Query 20230116_132246_00023_epnen failed: Type cannot be null
update test_1 set id = 2;
-- Query 20230116_132344_00024_epnen failed: Type cannot be null

alexjo2144 commented 1 year ago

Is this the same as https://github.com/apache/iceberg/issues/4563?

krvikash commented 1 year ago

No, this is not the same. Trino allows dropping a partition column that is part of an older partition spec. Once the column is dropped, INSERT, SELECT, and UPDATE all fail on the table.

https://github.com/apache/iceberg/issues/4563, by contrast, is a request to allow dropping a partition column that is part of an older partition spec. https://github.com/apache/iceberg/pull/4602 addresses that issue.

The failure reported here matches the one described in this comment: https://github.com/apache/iceberg/pull/4602#discussion_r987088357.

alexjo2144 commented 1 year ago

> In Trino we allow dropping partition column which is part of older partition spec.

Can we block people from doing this until that PR is merged?

alexjo2144 commented 1 year ago

@krvikash this is done, right?

krvikash commented 1 year ago

For now, we block users from dropping a void partition column or a partition column used in older specs. We still have to add proper support for dropping void partition columns and partition columns used in older specs.
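
For readers following along, the kind of restriction described above could be sketched roughly as follows. This is a self-contained illustration, not the code in Trino: the `PartitionField` and `PartitionSpec` records are hypothetical stand-ins for the Iceberg classes of the same name, and `firstSpecReferencing`, `checkCanDrop`, the column IDs, and the error message are all assumptions made for the example. The idea is simply to look through every partition spec the table has ever had, not just the current one, before allowing the drop.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DropColumnGuard {
    // Hypothetical stand-ins for Iceberg's PartitionField and PartitionSpec;
    // only the pieces needed for the check are modeled.
    record PartitionField(int sourceId, String transform) {}
    record PartitionSpec(int specId, List<PartitionField> fields) {}

    // Returns the lowest spec ID (current or historical) that has a partition
    // field sourced from the given column ID, if any.
    static Optional<Integer> firstSpecReferencing(int columnId, Map<Integer, PartitionSpec> specs) {
        return specs.values().stream()
                .filter(spec -> spec.fields().stream().anyMatch(field -> field.sourceId() == columnId))
                .map(PartitionSpec::specId)
                .sorted()
                .findFirst();
    }

    // Rejects the drop when any spec still references the column.
    // The exception type and message are illustrative assumptions.
    static void checkCanDrop(String columnName, int columnId, Map<Integer, PartitionSpec> specs) {
        firstSpecReferencing(columnId, specs).ifPresent(specId -> {
            throw new IllegalStateException(
                    "Cannot drop column '" + columnName + "': used by partition spec " + specId);
        });
    }

    public static void main(String[] args) {
        // Mirrors the repro with assumed column IDs: id = 1, name = 2.
        // Spec 0 partitions by id and truncate(name, 5); spec 1 only by id.
        Map<Integer, PartitionSpec> specs = Map.of(
                0, new PartitionSpec(0, List.of(
                        new PartitionField(1, "identity"),
                        new PartitionField(2, "truncate[5]"))),
                1, new PartitionSpec(1, List.of(
                        new PartitionField(1, "identity"))));

        try {
            checkCanDrop("name", 2, specs);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the repro, `name` is no longer in the current spec after `SET PROPERTIES partitioning = ARRAY['id']`, so a check against only the current spec would let the `DROP COLUMN` through; scanning all specs catches it.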