trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.36k stars 2.98k forks source link

Support Hive managed Transactional/ACID tables #576

Closed anirbanch closed 4 years ago

anirbanch commented 5 years ago

Presto fails to run queries against Hive managed tables. In Hive 3.x these tables are defaulted to transactional/insert-only as per https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html . Given that this is the default table type in Hive 3.x for all hive managed table, can Presto be enhanced to handle them?

Any one upgrading to Hive 3.x will have diminished use for Presto without this support, as Presto can only handle ‘external’ tables in Hive 3.x

Error Message, when trying to access a ‘transactions/insert-only’ table:

Query 20190402_173707_00215_qkyph failed: Your client does not appear to support insert-only tables.
               To skip capability checks, please set metastore.client.capability.check to false. This setting can be
               set globally, or on the client for the current metastore session. Note that this may lead to incorrect results,
               data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client
               capabilities via get_table_req API.
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTable(ThriftHiveMetastore.java:252)
        at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.getTable(BridgingHiveMetastore.java:81)
        at io.prestosql.plugin.hive.metastore.CachingHiveMetastore.loadTable(CachingHiveMetastore.java:274)
        at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
        at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
shubhamtagra commented 5 years ago

Proposal for supporting Insert Only Transactional Hive tables: https://docs.google.com/document/d/1GwSNc_6jSP8N1SyJIHnaMeWvh7JzhGPZ7QKm1DHJqEw/edit?usp=sharing

@electrum @martint @dain please review. I have a wip implementation of this which is able to read Insert Only Transactional tables from Hive3

dain commented 5 years ago

@electrum what do you think about create a high level issue to track this stuff? I would say the proposal above is part of the Hive 3.0 support project, and a sub item of supporting new table organizations. The proposal only covers reading of insert only tables and not write or full acid tables.

electrum commented 5 years ago

I read the proposal and it seems reasonable. One question is that it mentions deltas, which to my understanding are only for full ACID, not insert-only.

From an implementation perspective, table layouts are deprecated. I’m working on a PR to remove them from the Hive connector. I suggest waiting for that, which should be complete and merged sometime next week, then send a PR on top. I’m happy to review and work with you on it.

shubhamtagra commented 5 years ago

@electrum I remembered there were discussion around removing table layouts but saw them in master hence went ahead with it, we can wait till your changes land and replace table layout dependency with whatever gets used to fetch partitions. About the deltas for insert-only tables, they are created for every insert. Difference with full ACID tables is that for deletes in full ACID tables delete_deltas are additionally created.

Jedda1314Jessie commented 5 years ago

@electrum I have a question whether presto will provide the support for ACID of Hive in the future.Because when I used Hive to develop the data warehouse of my company, I found that some dim tables(scd) and some fact table(accumulator snapshot table) need to update.For that, I opened the transaction support of Hive but it made presto work wrong for the reason that Presto can't read HIVE's tables which support row update or delete.

shubhamtagra commented 5 years ago

@Jedda1314Jessie we are adding the read support for ACID tables. Review has been opened for the first part to support reading INSERT ONLY ACID tables. I will be adding docs and reviews for full ACID support too soon.

shubhamtagra commented 5 years ago

Proposal for Full ACID table reads: https://docs.google.com/document/d/1VrF48kqr_paTtF5iSwryRhZvMwL_ovtbjcAedQd7hyk/edit?usp=sharing

I have this implemented and it works fine. There are some performance optimizations possible over this which can be picked once we finalize this solution.

@dain @electrum @martint please have a look.

heocaf commented 5 years ago

@stagraqubole are there any latest update or schedule? thanks!

shubhamtagra commented 5 years ago

@heocaf This is in progress. The PRs are out and under review. I cant comment on the timelines but atleast the PR for Insert Only Tables can go in fast once @electrum does the final review. Full ACID Tables PR is just started.

heocaf commented 5 years ago

@stagraqubole got it, thanks for your reply! Looks like I need to roll back to an earlier version of HDP :(

findepi commented 4 years ago

Presto 331 (pending release) can read from Hive transactional/ORC ACID tables.

The outstanding things are:

All the above is tracked by the Hive 3 umbrella issue https://github.com/prestosql/presto/issues/1218

MohammedLayeeq commented 4 years ago

I am using Presto version 0.232, and Hive version 3.1.0. Is this integration supports managed tables of Hive3.1.0 Transactional/ACID tables. I am also facing same error Query 20200320_145603_00018_rsdg9 failed: Your client does not appear to support insert-only tables. To skip capability checks, please set metastore.client.capa bility.check to false. This setting can be set globally, or on the client for th e current metastore session. Note that this may lead to incorrect results, data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client capabilities via get_table_req API.

findepi commented 4 years ago

@MohammedLayeeq please try with Presto 331 -- https://prestosql.io/download.html