trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Cannot create Iceberg table with sort order having nested column #19620

Open yeunghl-shoalter opened 1 year ago

yeunghl-shoalter commented 1 year ago

Are nested columns not supported for sort order and partitioning in the Trino Iceberg connector?

I successfully created an Iceberg table via Spark with a sort order that includes a nested column. I then ran "SHOW CREATE TABLE" to retrieve the statement and executed that statement in Trino (with the table name changed). Trino raises the following exception:

io.trino.spi.TrinoException: Column not found: primary_key.code
    at io.trino.plugin.iceberg.SortFieldUtils.parseSortFields(SortFieldUtils.java:68)
    at io.trino.plugin.iceberg.IcebergUtil.newCreateTableTransaction(IcebergUtil.java:619)
    at io.trino.plugin.iceberg.IcebergMetadata.beginCreateTable(IcebergMetadata.java:786)
    at io.trino.plugin.iceberg.IcebergMetadata.createTable(IcebergMetadata.java:742)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.createTable(ClassLoaderSafeConnectorMetadata.java:428)
    at io.trino.tracing.TracingConnectorMetadata.createTable(TracingConnectorMetadata.java:381)
    at io.trino.metadata.MetadataManager.createTable(MetadataManager.java:771)
    at io.trino.tracing.TracingMetadata.createTable(TracingMetadata.java:388)
    at io.trino.execution.CreateTableTask.internalExecute(CreateTableTask.java:296)
    at io.trino.execution.CreateTableTask.execute(CreateTableTask.java:126)
    at io.trino.execution.CreateTableTask.execute(CreateTableTask.java:92)
    at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
    at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:256)
    at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:145)
    at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:129)
    at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:568)
    at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:543)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
    at io.trino.$gen.Trino_423____20231024_081532_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Query

CREATE TABLE iceberg.test1.test1_test_1 (
  primary_key ROW(code varchar),
  tid bigint
) WITH (
  format = 'ORC',
  format_version = 2,
  sorted_by = ARRAY ['"primary_key.code" DESC NULLS LAST','tid DESC NULLS LAST']
)

Trino version

trino> select version();
 _col0 
-------
 423   
(1 row)

Heltman commented 1 year ago

Trino's support for nested fields is currently limited. For example, deleting the primary key of a file does not support nested fields.
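One plausible reading of the stack trace, offered as an illustration only and not as Trino's actual code: the quoted sort field "primary_key.code" is treated as a single column name and looked up among the table's top-level columns, where only primary_key and tid exist. The Python sketch below models that behaviour under this assumption. The names COLUMNS, parse_sort_field, resolve_top_level and resolve_nested are hypothetical, and resolve_nested only shows what walking the ROW fields could look like.

```python
import re

# Hypothetical model of the table from the issue: top-level columns,
# with the fields of the ROW type kept as a nested dict.
COLUMNS = {
    "primary_key": {"code": "varchar"},
    "tid": "bigint",
}

# Simplified pattern for entries such as: "primary_key.code" DESC NULLS LAST
SORT_FIELD = re.compile(
    r'^\s*(?:"(?P<quoted>[^"]+)"|(?P<bare>\w+))'
    r'(?:\s+(?P<dir>ASC|DESC))?'
    r'(?:\s+NULLS\s+(?P<nulls>FIRST|LAST))?\s*$',
    re.IGNORECASE,
)


def parse_sort_field(entry):
    """Split one sorted_by entry into (column name, sort direction)."""
    match = SORT_FIELD.match(entry)
    if match is None:
        raise ValueError(f"Unable to parse sort field: {entry}")
    name = match.group("quoted") or match.group("bare")
    return name, (match.group("dir") or "ASC").upper()


def resolve_top_level(entry, columns=COLUMNS):
    """Assumed behaviour: the whole name must match a top-level column."""
    name, direction = parse_sort_field(entry)
    if name not in columns:
        raise LookupError(f"Column not found: {name}")
    return name, direction


def resolve_nested(entry, columns=COLUMNS):
    """Hypothetical alternative: walk the dotted path through ROW fields."""
    name, direction = parse_sort_field(entry)
    node = columns
    for part in name.split("."):
        if not isinstance(node, dict) or part not in node:
            raise LookupError(f"Column not found: {name}")
        node = node[part]
    return name, direction
```

In this model, resolve_top_level accepts 'tid DESC NULLS LAST' but rejects '"primary_key.code" DESC NULLS LAST' with the same message as in the report, while resolve_nested accepts both.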

yeunghl-shoalter commented 1 year ago

Hopefully there will be more support for nested fields. Otherwise it will be difficult to manage Iceberg tables via Trino, especially when the tables are also managed by other tools such as Spark, Flink, and Hive.