Closed anirbanch closed 4 years ago
Proposal for supporting Insert Only Transactional Hive tables: https://docs.google.com/document/d/1GwSNc_6jSP8N1SyJIHnaMeWvh7JzhGPZ7QKm1DHJqEw/edit?usp=sharing
@electrum @martint @dain please review. I have a wip implementation of this which is able to read Insert Only Transactional tables from Hive3
@electrum what do you think about create a high level issue to track this stuff? I would say the proposal above is part of the Hive 3.0 support project, and a sub item of supporting new table organizations. The proposal only covers reading of insert only tables and not write or full acid tables.
I read the proposal and it seems reasonable. One question is that it mentions deltas, which to my understanding are only for full ACID, not insert-only.
From an implementation perspective, table layouts are deprecated. I’m working on a PR to remove them from the Hive connector. I suggest waiting for that, which should be complete and merged sometime next week, then send a PR on top. I’m happy to review and work with you on it.
@electrum I remembered there were discussion around removing table layouts but saw them in master hence went ahead with it, we can wait till your changes land and replace table layout dependency with whatever gets used to fetch partitions. About the deltas for insert-only tables, they are created for every insert. Difference with full ACID tables is that for deletes in full ACID tables delete_deltas are additionally created.
@electrum I have a question whether presto will provide the support for ACID of Hive in the future.Because when I used Hive to develop the data warehouse of my company, I found that some dim tables(scd) and some fact table(accumulator snapshot table) need to update.For that, I opened the transaction support of Hive but it made presto work wrong for the reason that Presto can't read HIVE's tables which support row update or delete.
@Jedda1314Jessie we are adding the read support for ACID tables. Review has been opened for the first part to support reading INSERT ONLY ACID tables. I will be adding docs and reviews for full ACID support too soon.
Proposal for Full ACID table reads: https://docs.google.com/document/d/1VrF48kqr_paTtF5iSwryRhZvMwL_ovtbjcAedQd7hyk/edit?usp=sharing
I have this implemented and it works fine. There are some performance optimizations possible over this which can be picked once we finalize this solution.
@dain @electrum @martint please have a look.
@stagraqubole are there any latest update or schedule? thanks!
@heocaf This is in progress. The PRs are out and under review. I cant comment on the timelines but atleast the PR for Insert Only Tables can go in fast once @electrum does the final review. Full ACID Tables PR is just started.
@stagraqubole got it, thanks for your reply! Looks like I need to roll back to an earlier version of HDP :(
Presto 331 (pending release) can read from Hive transactional/ORC ACID tables.
The outstanding things are:
All the above is tracked by the Hive 3 umbrella issue https://github.com/prestosql/presto/issues/1218
I am using Presto version 0.232, and Hive version 3.1.0. Is this integration supports managed tables of Hive3.1.0 Transactional/ACID tables. I am also facing same error Query 20200320_145603_00018_rsdg9 failed: Your client does not appear to support insert-only tables. To skip capability checks, please set metastore.client.capa bility.check to false. This setting can be set globally, or on the client for th e current metastore session. Note that this may lead to incorrect results, data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client capabilities via get_table_req API.
@MohammedLayeeq please try with Presto 331 -- https://prestosql.io/download.html
Presto fails to run queries against Hive managed tables. In Hive 3.x these tables are defaulted to transactional/insert-only as per https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html . Given that this is the default table type in Hive 3.x for all hive managed table, can Presto be enhanced to handle them?
Any one upgrading to Hive 3.x will have diminished use for Presto without this support, as Presto can only handle ‘external’ tables in Hive 3.x
Error Message, when trying to access a ‘transactions/insert-only’ table: