trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Errors while reading from Hive ACID Tables #5569


srinikvv commented 3 years ago

Issue on Presto (prestosql) 343 with Hive ACID tables on Hive 3.1.0

Presto intermittently throws the errors below while reading from Hive ORC ACID tables ('transactional'='true', 'transactional_properties'='default', 'orc.compress'='ZLIB').

User Error:

SQL Error [13]: Query failed (#20201016_065200_00104_hi9gk): Hive transactional tables are supported with Hive 3.0 and only after a major compaction has been run
io.prestosql.spi.PrestoException: Hive transactional tables are supported with Hive 3.0 and only after a major compaction has been run
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:475)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:320)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:249)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_343____20201015_183710_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
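The first message asks for a major compaction. A minimal sketch for requesting one from the shell, assuming HiveServer2 is reachable with `beeline`. `compaction_sql` is a hypothetical helper name, and the table name `base.dashboard_test` is inferred from the warehouse path in the ORC error (`base.db/dashboard_test`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HiveQL that queues a major compaction for a
# table, optionally restricted to one partition spec such as "ds='2020-10-16'".
compaction_sql() {
  local table="$1" partition="${2:-}"
  if [ -n "$partition" ]; then
    printf "ALTER TABLE %s PARTITION (%s) COMPACT 'major';\n" "$table" "$partition"
  else
    printf "ALTER TABLE %s COMPACT 'major';\n" "$table"
  fi
}
```

The statement only enqueues the request; the Hive compactor runs it asynchronously, and `SHOW COMPACTIONS;` lists its state. For example (the JDBC URL is a placeholder): `beeline -u "jdbc:hive2://<hiveserver2-host>:10000/" -e "$(compaction_sql base.dashboard_test)"`.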

Internal Error:

SQL Error [16777223]: Query failed (#20201016_063152_00100_hi9gk): ORC ACID file should have 6 columns: hdfs://XXXX:8020/warehouse/tablespace/managed/hive/base.db/dashboard_test/base_0000001/bucket_00011
io.prestosql.spi.PrestoException: ORC ACID file should have 6 columns: hdfs://XXXX:8020/warehouse/tablespace/managed/hive/base.db/dashboard_test/base_0000001/bucket_00011
    at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.verifyAcidSchema(OrcPageSourceFactory.java:423)
    at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:250)
    at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:162)
    at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:171)
    at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:101)
    at io.prestosql.spi.connector.ConnectorPageSourceProvider.createPageSource(ConnectorPageSourceProvider.java:68)
    at io.prestosql.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:66)
    at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:64)
    at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:298)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_343____20201015_183807_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
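The second error means the ORC file's top-level schema lacks the six ACID event columns (operation, originalTransaction, bucket, rowId, currentTransaction, row). A sketch for checking a suspect bucket file by hand, assuming `hive --orcfiledump` prints a `Type: struct<...>` line for the file; the helper names are hypothetical, and the prefix match only approximates the connector's check rather than reproducing Presto's code.

```shell
#!/usr/bin/env bash
# Leading fields of a Hive ACID ORC file, in order (Hive 3 event schema);
# the table's own columns are nested inside the final "row" struct.
ACID_PREFIX='struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:'

# Print the "Type:" line of an ORC file (assumes the hive CLI is on PATH).
orc_type_line() {
  hive --orcfiledump "$1" 2>/dev/null | grep -m1 '^Type:'
}

# Return 0 if an ORC type string has the six-column ACID layout, 1 otherwise.
# Accepts the raw "Type: struct<...>" line or just the struct string.
is_acid_orc_type() {
  local type="${1#Type: }"
  case "$type" in
    "$ACID_PREFIX"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_acid_orc_type "$(orc_type_line hdfs://XXXX:8020/warehouse/tablespace/managed/hive/base.db/dashboard_test/base_0000001/bucket_00011)"`. A flat schema such as `struct<id:int,name:string>` fails the check, which corresponds to the "should have 6 columns" error.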
findepi commented 3 years ago

Can you share output of hive --version or equivalent?

srinikvv commented 3 years ago
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.0.3.1.4.0-315
Git git://ctr-e139-1542663976389-113618-01-000003.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive -r e8d79f440455fa4400daf79974666b3055f1730f
Compiled by jenkins on Fri Aug 23 05:16:38 UTC 2019
From source with checksum 0321d07fd607c216351462c714d08b6a
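Most of that output is SLF4J binding warnings. A small sketch for extracting just the version, assuming the HDP numbering convention in which the Apache Hive release (3.1.0) is followed by the HDP stack build (3.1.4.0-315, matching the `/usr/hdp/3.1.4.0-315` paths above). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only the "Hive <version>" line from `hive --version` output,
# dropping SLF4J warnings and the Git/build metadata lines.
hive_version_line() {
  grep -m1 '^Hive '
}

# Split an HDP-style version string into the Apache Hive release
# (first three dot-separated fields) and the HDP build (the rest).
# Assumption: the HDP numbering convention described above.
split_hdp_version() {
  local v="${1#Hive }"
  printf 'hive=%s hdp=%s\n' \
    "$(printf '%s\n' "$v" | cut -d. -f1-3)" \
    "$(printf '%s\n' "$v" | cut -d. -f4-)"
}
```

For the output above, `split_hdp_version "$(hive --version 2>&1 | hive_version_line)"` prints `hive=3.1.0 hdp=3.1.4.0-315`.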
findepi commented 3 years ago

Can you try with https://github.com/prestosql/presto/pull/5570 ?

srinikvv commented 3 years ago

@findepi We are building a Docker image of Presto (prestosql) and running it on Kubernetes (K8s). We currently use the Presto binaries from https://repo1.maven.org/maven2/io/prestosql/presto-server/343/presto-server-343.tar.gz (Maven Central). Could you generate a similar tar file with the above changes for me to test?

findepi commented 3 years ago

@srinikvv Just to be clear: I am not trying to fix the issue; I am trying to get more information about the problem.

To get the server tarball, you would need to clone my repository, check out the PR branch (the branch name is shown at the top of the PR page), and run the Maven build:

# just this
./mvnw clean install -DskipTests

# OR this, if you run into problems, e.g. when building the docs:
./mvnw -pl '!presto-server-rpm,!presto-docs,!presto-proxy,!presto-verifier,!presto-benchto-benchmarks' clean install -DskipTests -Dair.check.skip-all=true -Dmaven.javadoc.skip=true
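The steps above can be sketched as one script. Assumptions: instead of cloning the PR author's fork, it fetches the PR through GitHub's `pull/<N>/head` ref from the upstream repository linked in this thread; the local branch name `pr-<N>` and both function names are made up; and the tarball is expected under `presto-server/target/`.

```shell
#!/usr/bin/env bash
# Print the refspec that fetches GitHub PR <N> into a local branch pr-<N>.
pr_refspec() {
  printf 'pull/%s/head:pr-%s\n' "$1" "$1"
}

# Clone the upstream repository, check out PR <N>, build it with the slimmer
# Maven invocation quoted above, and list the resulting server tarball.
build_pr_tarball() {
  local pr="$1" dir="${2:-presto-pr-$1}"
  (
    set -euo pipefail
    git clone https://github.com/prestosql/presto.git "$dir"
    cd "$dir"
    git fetch origin "$(pr_refspec "$pr")"
    git checkout "pr-$pr"
    ./mvnw -pl '!presto-server-rpm,!presto-docs,!presto-proxy,!presto-verifier,!presto-benchto-benchmarks' \
      clean install -DskipTests -Dair.check.skip-all=true -Dmaven.javadoc.skip=true
    ls presto-server/target/presto-server-*.tar.gz
  )
}
```

Usage: `build_pr_tarball 5570`. The subshell keeps `set -e` and the `cd` from leaking into the caller's shell; the listed tarball should be usable in place of `presto-server-343.tar.gz` in a Docker build.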
daun4168 commented 2 years ago

I encountered the same problem. (The Hive table is a flat table that was created with Hive 2.x.)