trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.25k stars 2.95k forks source link

Creating Hive/parquet table with spaces in column names should return better error message #10828

Open wrb2 opened 2 years ago

wrb2 commented 2 years ago

When I try to create a table with Hive connector using Parquet,

create table test as select 'this' as "wont work";

I get the following error message:

SQL Error [65536]: Query failed (#20220127_134040_85077_yasks): field ended by ';': expected ';' but got 'work' at line 1:   optional binary wont work

This is because while Hive does allow column names with space, Parquet format doesn't.

The error message is pretty obscure and doesn't at all explain what is going on. If it would be improved (by validating before writing I guess?) it would save Trino users some inevitable Googling and explaining.

NaveenNitturu commented 9 months ago

@wrb2 I believe the issue is no longer replicating. If there's a PR related to this issue, could you please share it with me?

wrb2 commented 9 months ago

Which version? I can reproduce it in 413.

create table test with (format = 'parquet') as select 'this' as "wont work";

returns

SQL Error [65536]: Query failed (#20231215_115533_74292_g3sv3): field ended by ';': expected ';' but got 'work' at line 1:   optional binary wont work

One part that might be tricky is that it works with ORC, but not with Parquet, so if you have ORC as default format, you might have to add the format section to the create table to get this behavior.

No idea what it does with Iceberg or Delta but it might be worth investigating as well.

NaveenNitturu commented 9 months ago

@wrb2 I'm checking the latest version, 436, and I'm unable to replicate it. I've shared a screenshot below.

Iceberg response:

Screenshot 2023-12-18 at 2 18 55 PM

Hive response:

Screenshot 2023-12-18 at 3 07 18 PM

I'm encountering an error with the Hive catalog. The specific error message is different from the one you mentioned.

Query 20231218_093520_00009_mxmqb failed: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) io.trino.spi.TrinoException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTableColumnStatistics(ThriftHiveMetastore.java:372) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTableStatistics(ThriftHiveMetastore.java:352) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.updateTableStatistics(ThriftHiveMetastore.java:517) at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.updateTableStatistics(BridgingHiveMetastore.java:137) at io.trino.plugin.hive.metastore.tracing.TracingHiveMetastore.lambda$updateTableStatistics$5(TracingHiveMetastore.java:148) at io.trino.plugin.hive.metastore.tracing.Tracing.lambda$withTracing$0(Tracing.java:30) at io.trino.plugin.hive.metastore.tracing.Tracing.withTracing(Tracing.java:39) at io.trino.plugin.hive.metastore.tracing.Tracing.withTracing(Tracing.java:29) at io.trino.plugin.hive.metastore.tracing.TracingHiveMetastore.updateTableStatistics(TracingHiveMetastore.java:148) at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.updateTableStatistics(CachingHiveMetastore.java:551) at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.updateTableStatistics(CachingHiveMetastore.java:551) at io.trino.plugin.hive.HiveMetastoreClosure.updateTableStatistics(HiveMetastoreClosure.java:140) at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore$CreateTableOperation.run(SemiTransactionalHiveMetastore.java:3227) at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore$Committer.executeAddTableOperations(SemiTransactionalHiveMetastore.java:2154) at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:1599) at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:1300) at io.trino.plugin.hive.HiveMetadata.commit(HiveMetadata.java:3843) at io.trino.plugin.hive.HiveTransactionManager.lambda$commit$0(HiveTransactionManager.java:58) at java.base/java.util.Optional.ifPresent(Optional.java:178) at io.trino.plugin.hive.HiveTransactionManager.commit(HiveTransactionManager.java:56) at io.trino.plugin.hive.HiveConnector.commit(HiveConnector.java:228) at io.trino.metadata.CatalogTransaction.commit(CatalogTransaction.java:85) at io.trino.metadata.CatalogMetadata.commit(CatalogMetadata.java:154) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79) at io.trino.$gen.Trino_dev____20231218_092740_2.run(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:842) Caused by: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:81) at io.trino.hive.thrift.metastore.ThriftHiveMetastore$Client.recvGetTableStatisticsReq(ThriftHiveMetastore.java:4243) at io.trino.hive.thrift.metastore.ThriftHiveMetastore$Client.getTableStatisticsReq(ThriftHiveMetastore.java:4230) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.getTableColumnStatistics(ThriftHiveMetastoreClient.java:335) at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.lambda$getTableColumnStatistics$14(FailureAwareThriftMetastoreClient.java:184) at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.runWithHandle(FailureAwareThriftMetastoreClient.java:485) at io.trino.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.getTableColumnStatistics(FailureAwareThriftMetastoreClient.java:184) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$getTableColumnStatistics$5(ThriftHiveMetastore.java:364) at io.trino.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:41) at io.trino.plugin.hive.util.RetryDriver.run(RetryDriver.java:116) at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTableColumnStatistics(ThriftHiveMetastore.java:362) ... 31 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more Suppressed: org.apache.thrift.TApplicationException: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:cases, tblName:test, colNames:[wont work], engine:null) ... 42 more

snowsky commented 9 months ago

Got the same error as @naveennitturu45, and this may be the fix in hive-apache jar file, https://github.com/trinodb/trino-hive-apache/pull/43