trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.88k stars 2.86k forks source link

Can't ANALYZE stats against Hive v3 #8548

Open tooptoop4 opened 2 years ago

tooptoop4 commented 2 years ago

I have a catalog connecting against hive/s3/parquet (kerberos protected HMS). SELECT * from any of the tables works. But ANALYZE statement (or CREATE TABLE AS ie a managed table) fails with error like the below. How to increase the timeout value? I wonder if the kerberos details aren't being supplied properly to the 'write' path of the HMS.

io.prestosql.spi.PrestoException: redactedhost:9083: java.net.SocketTimeoutException: Read timed out
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:648)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:625)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.updatePartitionStatistics(ThriftHiveMetastore.java:605)
    at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.updatePartitionStatistics(BridgingHiveMetastore.java:128)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.updatePartitionStatistics(CachingHiveMetastore.java:393)
    at io.prestosql.plugin.hive.HiveMetastoreClosure.updatePartitionStatistics(HiveMetastoreClosure.java:99)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.lambda$setPartitionStatistics$9(SemiTransactionalHiveMetastore.java:358)
    at com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:185)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.lambda$setPartitionStatistics$10(SemiTransactionalHiveMetastore.java:357)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:969)
    at io.prestosql.plugin.hive.HiveMetadata.commit(HiveMetadata.java:2614)
    at io.prestosql.plugin.hive.HiveConnector.commit(HiveConnector.java:212)
    at io.prestosql.transaction.InMemoryTransactionManager$TransactionMetadata$ConnectorTransactionMetadata.commit(InMemoryTransactionManager.java:594)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at io.prestosql.$gen.Presto_20f5dc9____20210712_234035_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: redactedhost:9083: java.net.SocketTimeoutException: Read timed out
    at io.prestosql.plugin.hive.metastore.thrift.Transport.rewriteException(Transport.java:96)
    at io.prestosql.plugin.hive.metastore.thrift.Transport$TTransportWrapper.readAll(Transport.java:173)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_update_partition_column_statistics(ThriftHiveMetastore.java:3874)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.update_partition_column_statistics(ThriftHiveMetastore.java:3861)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.setPartitionColumnStatistics(ThriftHiveMetastoreClient.java:240)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.lambda$setPartitionColumnStatistics$17(FailureAwareThriftMetastoreClient.java:199)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.runWithHandle(FailureAwareThriftMetastoreClient.java:422)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.setPartitionColumnStatistics(FailureAwareThriftMetastoreClient.java:199)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$setPartitionColumnStatistics$28(ThriftHiveMetastore.java:640)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setColumnStatistics(ThriftHiveMetastore.java:695)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$setPartitionColumnStatistics$29(ThriftHiveMetastore.java:636)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:42)
    at io.prestosql.plugin.hive.util.RetryDriver.run(RetryDriver.java:130)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:635)
    ... 21 more
    Suppressed: org.apache.thrift.transport.TTransportException: redactedhost:9083: java.net.SocketTimeoutException: Read timed out
        ... 39 more
    Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
        at io.prestosql.plugin.hive.metastore.thrift.Transport$TTransportWrapper.readAll(Transport.java:170)
        ... 37 more
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.io.BufferedInputStream.fill(Unknown Source)
        at java.base/java.io.BufferedInputStream.read1(Unknown Source)
        at java.base/java.io.BufferedInputStream.read(Unknown Source)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 45 more
    Suppressed: org.apache.thrift.transport.TTransportException: redactedhost:9083: java.net.SocketTimeoutException: Read timed out
        ... 39 more
    Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
        at io.prestosql.plugin.hive.metastore.thrift.Transport$TTransportWrapper.readAll(Transport.java:170)
        ... 37 more
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.io.BufferedInputStream.fill(Unknown Source)
        at java.base/java.io.BufferedInputStream.read1(Unknown Source)
        at java.base/java.io.BufferedInputStream.read(Unknown Source)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 45 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
    at io.prestosql.plugin.hive.metastore.thrift.Transport$TTransportWrapper.readAll(Transport.java:170)
    ... 37 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.base/java.net.SocketInputStream.socketRead0(Native Method)
    at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/java.io.BufferedInputStream.fill(Unknown Source)
    at java.base/java.io.BufferedInputStream.read1(Unknown Source)
    at java.base/java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 45 more

my hive catalog properties have:

hive.metastore.authentication.type=KERBEROS hive.metastore.thrift.impersonation.enabled=false hive.metastore.service.principal=hive/_HOST@THEDOMAINHERE hive.metastore.client.principal=myuser@THEDOMAINHERE hive.metastore.client.keytab=/etc/hive.keytab

looks related to https://github.com/trinodb/trino/issues/3624 I am using prestosql 336

ebyhr commented 2 years ago

Can you try hive.metastore-timeout? https://trino.io/docs/current/connector/hive.html#metastore-configuration-properties

tooptoop4 commented 2 years ago

@ebyhr after changing hive.metastore-timeout to 120s i get different error below:

io.prestosql.spi.PrestoException: Insert of object "org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@3637c229" using statement "INSERT INTO `PART_COL_STATS` (`CS_ID`,`AVG_COL_LEN`,`BIT_VECTOR`,`CAT_NAME`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIMAL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`ENGINE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`NUM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`PART_ID`,`PARTITION_NAME`,`TABLE_NAME`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'ENGINE' cannot be null
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:648)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:625)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.updatePartitionStatistics(ThriftHiveMetastore.java:605)
    at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.updatePartitionStatistics(BridgingHiveMetastore.java:128)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.updatePartitionStatistics(CachingHiveMetastore.java:393)
    at io.prestosql.plugin.hive.HiveMetastoreClosure.updatePartitionStatistics(HiveMetastoreClosure.java:99)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.lambda$setPartitionStatistics$9(SemiTransactionalHiveMetastore.java:358)
    at com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:185)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.lambda$setPartitionStatistics$10(SemiTransactionalHiveMetastore.java:357)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:969)
    at io.prestosql.plugin.hive.HiveMetadata.commit(HiveMetadata.java:2614)
    at io.prestosql.plugin.hive.HiveConnector.commit(HiveConnector.java:212)
    at io.prestosql.transaction.InMemoryTransactionManager$TransactionMetadata$ConnectorTransactionMetadata.commit(InMemoryTransactionManager.java:594)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at io.prestosql.$gen.Presto_20f5dc9____20210714_120418_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Insert of object "org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@3637c229" using statement "INSERT INTO `PART_COL_STATS` (`CS_ID`,`AVG_COL_LEN`,`BIT_VECTOR`,`CAT_NAME`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIMAL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`ENGINE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`NUM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`PART_ID`,`PARTITION_NAME`,`TABLE_NAME`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'ENGINE' cannot be null
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_partition_column_statistics_result$update_partition_column_statistics_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_partition_column_statistics_result$update_partition_column_statistics_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_partition_column_statistics_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_update_partition_column_statistics(ThriftHiveMetastore.java:3874)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.update_partition_column_statistics(ThriftHiveMetastore.java:3861)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.setPartitionColumnStatistics(ThriftHiveMetastoreClient.java:240)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.lambda$setPartitionColumnStatistics$17(FailureAwareThriftMetastoreClient.java:199)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.runWithHandle(FailureAwareThriftMetastoreClient.java:422)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.setPartitionColumnStatistics(FailureAwareThriftMetastoreClient.java:199)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$setPartitionColumnStatistics$28(ThriftHiveMetastore.java:640)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setColumnStatistics(ThriftHiveMetastore.java:695)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$setPartitionColumnStatistics$29(ThriftHiveMetastore.java:636)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:42)
    at io.prestosql.plugin.hive.util.RetryDriver.run(RetryDriver.java:130)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.setPartitionColumnStatistics(ThriftHiveMetastore.java:635)
    ... 21 more

my hive version is 3.1.3000.7.1.4.35-1 this looks related to https://github.com/trinodb/trino/issues/5155

aakashnand commented 2 years ago

@tooptoop4 I am also facing this issue with hive 3.1.3000.7.1.7. Were you able to resolve it

failed : Column 'ENGINE' cannot be null
tooptoop4 commented 2 years ago

still an issue

KevinAppelBofa commented 1 year ago

@tooptoop4 thanks for posting this led me to the timeout to get more info, then i found to turn off the column statistics

Added into the catalog file for the hive setup hive.metastore-timeout=60s hive.collect-column-statistics-on-write=false

This is now working: create table hivecatalog.demo.k1 as select * from hivecatalog.demo.test4a;

When the statistics are turned on, which is the default, i get this error message: Insert of object "org.apache.hadoop.hive.metastore.model.MTableColumnStatistics@1592aa30" using statement "INSERT INTO TAB_COL_STATS (CS_ID,AVG_COL_LEN,BIT_VECTOR,CAT_NAME,COLUMN_NAME,COLUMN_TYPE,DB_NAME,BIG_DECIMAL_HIGH_VALUE,BIG_DECIMAL_LOW_VALUE,DOUBLE_HIGH_VALUE,DOUBLE_LOW_VALUE,ENGINE,LAST_ANALYZED,LONG_HIGH_VALUE,LONG_LOW_VALUE,MAX_COL_LEN,NUM_DISTINCTS,NUM_FALSES,NUM_NULLS,NUM_TRUES,TBL_ID,TABLE_NAME) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'ENGINE' cannot be null

After the table is created, if i try to manually run the analyze table_name command, then I get the same error as what the create table produces

I'm not sure if this is a bug or working as expected?

aakashnand commented 1 year ago

@KevinAppelBofa I have found a workaround for this https://github.com/trinodb/trino-hive-apache/pull/43#issuecomment-1190060212

hashhar commented 3 months ago

This should actually be possible to resolve now if Hive vendors cherry-pick https://github.com/apache/hive/pull/4984.

Thanks @zhangbutao for following up on Hive side.