[Iceberg] Support `ADD Column` at a particular index

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

https://trino.io

Apache License 2.0


[Iceberg] Support `ADD Column` at a particular index #20091

Open osscm opened 11 months ago

osscm commented 11 months ago

ref: https://trinodb.slack.com/archives/CJ6UC075E/p1701887715253719

Iceberg already supports column reordering, and Spark has this feature.

cc @electrum @vgankidi

findinpath commented 11 months ago

Are you hinting at the following Spark commands?

ALTER TABLE prod.db.sample
ADD COLUMN new_column bigint AFTER other_column;

ALTER TABLE prod.db.sample
ADD COLUMN nested.new_column bigint FIRST;

Why do you think this would be needed? What are the scenarios you are trying to solve?

cc @martint we've already discussed this topic previously (via Zoom).

Let's wait for requests from the community before actually starting any work in this area.

davigust commented 11 months ago

Would like to see this for the Hive connector as well, either as an ADD COLUMN or ALTER COLUMN option. Here's an example in Hive:

ALTER TABLE test_change CHANGE old_name new_name STRING AFTER other_col;

Regarding my scenario, there are times when I want to add a column but place it near an associated column in a table, not at the end, for improved usability. This works well with Parquet files and the Hive connector - it's a metadata-only change, and the existing files work fine with the new table definition.
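
To illustrate why placing a column mid-schema can be a metadata-only change, here is a minimal Python sketch. It assumes columns in existing files are matched to the table schema by name rather than by position; if they were matched by position, inserting a column in the middle would misalign the old data. The helpers `add_column_after` and `read_by_name` and the sample rows are hypothetical names for illustration only, not Trino, Hive, or Iceberg APIs.

```python
from typing import Any


def add_column_after(schema: list[str], new_col: str, after: str) -> list[str]:
    """Return a new column order with new_col placed right after `after`."""
    idx = schema.index(after)  # raises ValueError if `after` is not in the schema
    return schema[: idx + 1] + [new_col] + schema[idx + 1 :]


def read_by_name(row: dict[str, Any], schema: list[str]) -> list[Any]:
    """Project a stored row onto the table schema, matching columns by name.

    Columns missing from the stored row (added after it was written) become None.
    """
    return [row.get(col) for col in schema]


# Rows as written before the schema change (a stand-in for an existing data file).
old_rows = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]

# Metadata-only change: only the table's column list is updated.
table_schema = ["id", "city"]
table_schema = add_column_after(table_schema, "country", after="id")

for row in old_rows:
    print(read_by_name(row, table_schema))
```

The old rows are never rewritten; they simply read back with an empty value in the new position.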