h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Test Hive 2.1.0 JDBC Driver #12687

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We should add basic tests for Hive 2.1.0 JDBC Driver.

A [stream|https://groups.google.com/forum/#!topic/h2ostream/SMxZPlLdRmI] user reported a parsing issue (apparently related to OFFSET) when using HDP 2.6.5, Hive 2.1.0, and Hive JDBC 2.1.0 to import a Hive table from Flow.

Note: Hive did not support OFFSET in versions [<2.0.0|https://issues.apache.org/jira/browse/HIVE-11531].

For convenience, here is the original stream question:

{code}
When I try to import hive table in h2o flow, I got some error about EOF.

- https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/hive_jdbc_driver/Hive.md
- h2o version : h2o_3.20.0.5
- hive jdbc: 2.1.0
- hadoop core : 1.2.1
- OS : centos

version is Apache Hive 2.1.0

DistributedException from localhost/127.0.0.1:54321: 'SQLException: Error while compiling statement: FAILED: ParseException line 1:38 missing EOF at 'OFFSET' near '2692' Failed to read SQL data', caused by java.lang.RuntimeException: SQLException: Error while compiling statement: FAILED: ParseException line 1:38 missing EOF at 'OFFSET' near '2692' Failed to read SQL data
    at water.MRTask.getResult(MRTask.java:478)
    at water.MRTask.getResult(MRTask.java:486)
    at water.MRTask.doAll(MRTask.java:390)
    at water.MRTask.doAll(MRTask.java:377)
    at water.jdbc.SQLManager$1.compute2(SQLManager.java:196)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1267)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.RuntimeException: SQLException: Error while compiling statement: FAILED: ParseException line 1:38 missing EOF at 'OFFSET' near '2692' Failed to read SQL data
    at water.jdbc.SQLManager$SqlTableToH2OFrame.map(SQLManager.java:345)
    at water.MRTask.compute2(MRTask.java:657)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1270)
    at water.jdbc.SQLManager$SqlTableToH2OFrame$Icer.compute1(SQLManager$SqlTableToH2OFrame$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1266)
    ... 5 more

more details

Can you use beeline to successfully run the following SQL statement `SELECT * FROM [xyz_table] LIMIT [A] offset [B]`?
-> sure, it works well in beeline

What is the underlying storage of your table of interest in hive (i.e. is it a parquet file or ORC file)
-> Both files are used

Note: if the underlying storage is either Parquet or ORC you can import these files via HDFS - you don’t need to use the Hive v2 JDBC driver.
-> It works well and can be used temporarily. But there is no way to import part of table. If you have any idea, please let me know. It is really helpful to me :)
{code}
exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: Some users are keen on using SQL import with Hive 2.1.0, which doesn't support the {{LIMIT max OFFSET start}} syntax but supports {{LIMIT start, max}} instead. [~accountid:557058:5bcbac08-75cf-4c6b-b4d2-294f7c0fe9b8], [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6], is this something we're willing to add/support?
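
To make the two syntaxes concrete, here is a minimal Python sketch of a helper that builds the same row-window query in either form. It is only an illustration: the function name `paginated_select`, the `dialect` values, and the example table name are hypothetical, and this is not how H2O's `SQLManager` is actually written.

```python
def paginated_select(table, start, max_rows, dialect="offset"):
    """Build a SELECT that reads `max_rows` rows beginning at row index `start`.

    dialect="offset" -> LIMIT max OFFSET start   (the form that fails in the report)
    dialect="comma"  -> LIMIT start, max         (the form mentioned for Hive 2.1.0)

    Hypothetical helper for illustration only; the table name is interpolated
    as-is, so it must come from a trusted source.
    """
    if start < 0 or max_rows < 0:
        raise ValueError("start and max_rows must be non-negative")
    if dialect == "offset":
        return f"SELECT * FROM {table} LIMIT {max_rows} OFFSET {start}"
    if dialect == "comma":
        return f"SELECT * FROM {table} LIMIT {start}, {max_rows}"
    raise ValueError(f"unknown dialect: {dialect!r}")


if __name__ == "__main__":
    # Same window of rows, expressed in both syntaxes.
    print(paginated_select("xyz_table", 0, 2692, dialect="offset"))
    print(paginated_select("xyz_table", 0, 2692, dialect="comma"))
```

Keeping the syntax choice behind a single parameter would let the caller pick the form per database or driver version without touching the rest of the import logic.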

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:5b153fb1b0d76456f36daced], [~accountid:557058:eac185dd-5a5c-46e9-bb5a-13217ee9c218], please clarify - the ticket talks about Hive 1.2.1 and Hive JDBC version 2.1.0 - is the ticket about Hive1 or Hive2 or both?

To answer - we are not targeting support for Hive 1.x right now. We should, however, support Hive2 and any version of Hive 2.x.

exalate-issue-sync[bot] commented 1 year ago

Lauren DiPerna commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] sorry, that was a typo; it's supposed to be 2.1.0. This ticket is only about Hive2. I've updated the ticket with the correct version.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:557058:eac185dd-5a5c-46e9-bb5a-13217ee9c218], thanks for clarifying. Looks like a bug then; we should support Hive 2.1 as well.

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] I guess I can mark this one as resolved by https://0xdata.atlassian.net/browse/PUBDEV-5927, right?

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:5b153fb1b0d76456f36daced] I wouldn't resolve this by PUBDEV-5927 - Hive should be supported in a distributed ingest mode, not just in streaming ingest.
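
As a rough illustration of why distributed ingest depends on a range-limiting clause like the OFFSET seen in the stack trace: each worker would presumably read its own disjoint window of rows. The Python sketch below splits a row count into such windows. The function name `row_windows` and the even-split policy are assumptions made for this example, not H2O's actual chunking logic.

```python
def row_windows(total_rows, num_chunks):
    """Split rows [0, total_rows) into at most `num_chunks` contiguous windows.

    Returns a list of (start, count) pairs that cover every row exactly once.
    Hypothetical helper: the even split is an assumption for illustration.
    """
    if total_rows < 0 or num_chunks <= 0:
        raise ValueError("total_rows must be >= 0 and num_chunks must be > 0")
    base, extra = divmod(total_rows, num_chunks)
    windows = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` windows take one additional row each.
        count = base + (1 if i < extra else 0)
        if count == 0:
            break  # fewer rows than chunks: stop once all rows are assigned
        windows.append((start, count))
        start += count
    return windows


if __name__ == "__main__":
    for start, count in row_windows(10, 3):
        print(f"rows starting at {start}, {count} rows")
```

Each (start, count) pair would then be turned into one range-limited query per worker, whereas a streaming ingest reads the whole result set through a single query and so needs no such clause.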

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-5835
Assignee: UNASSIGNED
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A