trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Cannot read hive text/csv table (between 314->328/329) #2678

Closed 360d-io-labs closed 4 years ago

360d-io-labs commented 4 years ago

After upgrading the cluster from 314 to 328/329, it is impossible to query a CSV table stored on HDFS with Hive 3.1.0.

Here is the exception raised:

presto:ks> select * from elli_csv_presto limit 10;
Query 20200130_102416_00052_s3b7w failed: java.lang.UnsupportedOperationException: Storage schema reading not supported
io.prestosql.spi.PrestoException: java.lang.UnsupportedOperationException: Storage schema reading not supported
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getFields(ThriftHiveMetastore.java:415)
        at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.lambda$getTable$0(BridgingHiveMetastore.java:89)
        at java.util.Optional.map(Optional.java:215)
        at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.getTable(BridgingHiveMetastore.java:87)
        at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.loadTable(CachingHiveMetastore.java:307)
        at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
        at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
        at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
        at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.get(CachingHiveMetastore.java:244)
        at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.getTable(CachingHiveMetastore.java:296)
        at io.prestosql.plugin.hive.HiveMetastoreClosure.getTable(HiveMetastoreClosure.java:71)
        at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.getTable(SemiTransactionalHiveMetastore.java:184)
    at io.prestosql.plugin.hive.HiveMetadata.getView(HiveMetadata.java:1714)
    at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getView(ClassLoaderSafeConnectorMetadata.java:442)
    at io.prestosql.metadata.MetadataManager.getView(MetadataManager.java:955)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:949)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:287)
    at io.prestosql.sql.tree.Table.accept(Table.java:53)
    at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:302)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:2204)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1161)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:287)
    at io.prestosql.sql.tree.QuerySpecification.accept(QuerySpecification.java:144)
    at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:302)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:312)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:853)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:287)
    at io.prestosql.sql.tree.Query.accept(Query.java:107)
    at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:302)
    at io.prestosql.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:279)
    at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:83)
    at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:75)
    at io.prestosql.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:219)
    at io.prestosql.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:178)
    at io.prestosql.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:95)
    at io.prestosql.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:712)
    at io.prestosql.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:119)
    at io.prestosql.$gen.Presto_328____20200129_142811_2.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: java.lang.UnsupportedOperationException: Storage schema reading not supported
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_fields_result$get_fields_resultStandardScheme.read(ThriftHiveMetastore.java:47792)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_fields_result$get_fields_resultStandardScheme.read(ThriftHiveMetastore.java:47759)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_fields_result.read(ThriftHiveMetastore.java:47685)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_fields(ThriftHiveMetastore.java:1357)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_fields(ThriftHiveMetastore.java:1343)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at io.prestosql.plugin.base.util.LoggingInvocationHandler.handleInvocation(LoggingInvocationHandler.java:60)
        at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:86)
        at com.sun.proxy.$Proxy197.get_fields(Unknown Source)
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.getFields(ThriftHiveMetastoreClient.java:188)
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$getFields$12(ThriftHiveMetastore.java:407)
        at io.prestosql.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:42)
        at io.prestosql.plugin.hive.util.RetryDriver.run(RetryDriver.java:130)
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getFields(ThriftHiveMetastore.java:405)
        ... 53 more

I have no problem running the same query with Presto 314 (the previous version installed on this cluster).

I tried to create an external table with Presto on the same data, and the table is created successfully in Hive. However, when I query this Presto-created table I get the same exception.

presto:ks> CREATE TABLE hive.ks.elli_presto (
            ->        value varchar, key varchar)
            -> WITH (
            ->     csv_escape = '\',
            ->     csv_quote = '"',
            ->     csv_separator = ',',
            ->     external_location = 'hdfs://cluster/projects/ks/elli_c',
            ->     format = 'CSV');

From Hive (SHOW CREATE TABLE):

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE elli_presto(       |
|   value string COMMENT 'from deserializer',      |
|   key string COMMENT 'from deserializer')        |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.OpenCSVSerde'     |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://cluster/projects/ks/elli_c' |
| TBLPROPERTIES (                                    |
|   'STATS_GENERATED_VIA_STATS_TASK'='workaround for potential lack of HIVE-12730',  |
|   'escapeChar'='\',                               |
|   'presto_query_id'='20200130_102400_00050_s3b7w',  |
|   'presto_version'='328',                          |
|   'quoteChar'='"',                                 |
|   'separatorChar'=',',                             |
|   'transient_lastDdlTime'='1580379840')            |
+----------------------------------------------------+

I have no problem accessing any of the ORC tables.

ebyhr commented 4 years ago

Could you share the SHOW CREATE TABLE result for elli_csv_presto on both Hive and Presto?

360d-io-labs commented 4 years ago

I cannot run SHOW CREATE TABLE from Presto for any text/CSV table (whether created with Hive or Presto); it raises the same exception, which is why I included the output from Hive instead. I can try it from Presto 314 if you want.

findepi commented 4 years ago

@360d-io-labs this is a backwards-incompatibility issue on the metastore side.

The reason it broke in your environment is that we added first-class (and correct) support for CSV tables in 316. Before that version, you could run into all sorts of problems when a CSV table was declared with types other than varchar (Hive allows that).
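
For illustration, a minimal sketch of the kind of Hive DDL that could cause those problems before 316 (the table name, columns, and location are made up for this example; OpenCSVSerde reads every column back as a string regardless of the declared type):

CREATE EXTERNAL TABLE example_csv (
  id INT,          -- declared as INT, but the SerDe only produces strings
  amount DOUBLE    -- same mismatch here
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://cluster/projects/ks/example_csv';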

Can you add this to your Hive metastore configuration?

<property>
    <!-- https://community.hortonworks.com/content/supportkb/247055/errorjavalangunsupportedoperationexception-storage.html -->
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>

https://github.com/prestosql/docker-images/blob/4aadfb19cf5a3d7a2576f4cd670c9ced21debf53/prestodev/hdp3.1-hive/files/etc/hive/conf/hive-site.xml#L67-L71
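
Once the property is in place (the Hive Metastore typically needs a restart to pick it up), a quick sanity check (hypothetical, assuming the catalog is named hive as in the CREATE TABLE above) is to re-run the statements that were failing from Presto:

SHOW CREATE TABLE hive.ks.elli_csv_presto;
SELECT * FROM hive.ks.elli_csv_presto LIMIT 10;

Both should now return normally instead of raising "Storage schema reading not supported".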

findepi commented 4 years ago

(I added a note to https://github.com/prestosql/presto/issues/1841 too)

360d-io-labs commented 4 years ago

@findepi Thanks, it works! I think this should be added to the Hive connector documentation to make it clear, since this is not the default configuration in HDP and other people may face the same problem (same for the default ACID setting, which lets Presto write to the table but not read from it).

findepi commented 4 years ago

For anyone finding this issue, the solution is documented here:

https://prestosql.io/docs/current/connector/hive.html#metastore-configuration-for-avro-and-csv

noodlemind commented 3 years ago

I am having a similar issue accessing and describing tables created from a CSV file that is accessed using s3a.

Even though the table is created successfully, I get the localhost:9083 null error. I do not want to create it using Hive.

Is there a solution or workaround for this issue?

findepi commented 3 years ago

@noodlemind you can reach out for help on the #troubleshooting channel on our Slack.