Metabase generated SQL with ORDER BY clause is not supported by Apache Hive. #10973

Open novicecpp opened 5 years ago

novicecpp commented 5 years ago

Describe the bug

For example, given a table user with a single column id, the following SQL is not supported by Hive:

SELECT `t1`.`id` AS `id` FROM `user` `t1` ORDER BY `t1`.`id` ASC;

Hive only supports ORDER BY on fields that appear in the select list. This query works:

SELECT `t1`.`id` AS `id` FROM `user` `t1` ORDER BY `id` ASC;

The difference is that the table alias t1 must not appear in the field reference in the ORDER BY clause. Hive does not allow a table alias in the ORDER BY clause when a column alias is used in the SELECT clause.

This query is generated when sorting the table by any field in the UI.
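
Note that the error below says "possible column names are: id", i.e. Hive appears to resolve the ORDER BY against the output columns of the SELECT rather than the underlying table. A hedged illustration of the same rule (the user_id alias is made up for this example, not taken from the issue):

SELECT `t1`.`id` AS `user_id` FROM `user` `t1` ORDER BY `user_id` ASC;   -- should be accepted: resolves against the output alias
SELECT `t1`.`id` AS `user_id` FROM `user` `t1` ORDER BY `t1`.`id` ASC;   -- rejected with the same "Invalid table alias or column reference 't1'" SemanticException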

Logs

09-24 10:20:31 WARN middleware.process-userland-query :: Query failure {:status :failed,
 :class org.apache.hive.service.cli.HiveSQLException,
 :error "Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 2:51 Invalid table alias or column reference 't1': (possible column names are: id)",
 :stacktrace
 ("org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)"
  "org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)"
  "org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)"
  "org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)"
  "org.apache.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:109)"
  "com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:431)"
  "clojure.java.jdbc$execute_query_with_params.invokeStatic(jdbc.clj:1072)"
  "clojure.java.jdbc$execute_query_with_params.invoke(jdbc.clj:1066)"
  "clojure.java.jdbc$db_query_with_resultset_STAR_.invokeStatic(jdbc.clj:1081)"
  "clojure.java.jdbc$db_query_with_resultset_STAR_.invoke(jdbc.clj:1075)"
  "clojure.java.jdbc$query.invokeStatic(jdbc.clj:1164)"
  "clojure.java.jdbc$query.invoke(jdbc.clj:1126)"
  "--> driver.hive_like$run_query.invokeStatic(hive_like.clj:126)"
  "driver.hive_like$run_query.invoke(hive_like.clj:116)"
  "driver.hive_like$run_query_without_timezone.invokeStatic(hive_like.clj:133)"
  "driver.hive_like$run_query_without_timezone.invoke(hive_like.clj:130)"
  "driver.sparksql$eval1043$fn__1045$fn__1047.invoke(sparksql.clj:140)"
  "driver.sql_jdbc.execute$do_with_try_catch.invokeStatic(execute.clj:217)"
  "driver.sql_jdbc.execute$do_with_try_catch.invoke(execute.clj:211)"
  "driver.sparksql$eval1043$fn__1045.invoke(sparksql.clj:137)"
  "query_processor$fn__43955$execute_query__43960$fn__43961.invoke(query_processor.clj:70)"
  "query_processor$fn__43955$execute_query__43960.invoke(query_processor.clj:64)"
  "query_processor.middleware.mbql_to_native$mbql__GT_native$fn__34096.invoke(mbql_to_native.clj:38)"
  "query_processor.middleware.annotate$result_rows_maps__GT_vectors$fn__36340.invoke(annotate.clj:540)"
  "query_processor.middleware.annotate$add_column_info$fn__36246.invoke(annotate.clj:484)"
  "query_processor.middleware.cumulative_aggregations$handle_cumulative_aggregations$fn__37267.invoke(cumulative_aggregations.clj:57)"
  "query_processor.middleware.resolve_joins$resolve_joins$fn__41065.invoke(resolve_joins.clj:184)"
  "query_processor.middleware.limit$limit$fn__38003.invoke(limit.clj:19)"
  "query_processor.middleware.results_metadata$record_and_return_metadata_BANG_$fn__43824.invoke(results_metadata.clj:87)"
  "query_processor.middleware.format_rows$format_rows$fn__37991.invoke(format_rows.clj:26)"
  "query_processor.middleware.add_dimension_projections$add_remapping$fn__34855.invoke(add_dimension_projections.clj:234)"
  "query_processor.middleware.add_source_metadata$add_source_metadata_for_source_queries$fn__35474.invoke(add_source_metadata.clj:107)"
  "query_processor.middleware.resolve_source_table$resolve_source_tables$fn__41115.invoke(resolve_source_table.clj:46)"
  "query_processor.middleware.add_row_count_and_status$add_row_count_and_status$fn__35334.invoke(add_row_count_and_status.clj:16)"
  "query_processor.middleware.driver_specific$process_query_in_context$fn__37478.invoke(driver_specific.clj:12)"
  "query_processor.middleware.resolve_driver$resolve_driver$fn__40729.invoke(resolve_driver.clj:22)"
  "query_processor.middleware.bind_effective_timezone$bind_effective_timezone$fn__36669$fn__36670.invoke(bind_effective_timezone.clj:9)"
  "util.date$call_with_effective_timezone.invokeStatic(date.clj:88)"
  "util.date$call_with_effective_timezone.invoke(date.clj:77)"
  "query_processor.middleware.bind_effective_timezone$bind_effective_timezone$fn__36669.invoke(bind_effective_timezone.clj:8)"
  "query_processor.middleware.store$initialize_store$fn__43849$fn__43850.invoke(store.clj:11)"
  "query_processor.store$do_with_store.invokeStatic(store.clj:46)"
  "query_processor.store$do_with_store.invoke(store.clj:40)"
  "query_processor.middleware.store$initialize_store$fn__43849.invoke(store.clj:10)"
  "query_processor.middleware.async$async__GT_sync$fn__34007.invoke(async.clj:23)"
  "query_processor.middleware.async_wait$runnable$fn__36397.invoke(async_wait.clj:89)"),
 :query
 {:type "query",
  :query {:source-table 1419, :order-by [["desc" ["field-id" 10260]]]},
  :parameters [],
  :async? true,
  :middleware {:add-default-userland-constraints? true, :userland-query? true},
  :info
  {:executed-by 1,
   :context :ad-hoc,
   :card-id nil,
   :nested? false,
   :query-hash [95, 33, -7, -110, -62, -56, -120, -123, -110, 96, 122, 101, 121, 91, 74, -104, -55, -26, -69, -51, 42, 113, -28, 51, 12, -105, 92, 114, 8, -42, -38, -8]},
  :constraints {:max-results 10000, :max-results-bare-rows 2000}},
 :preprocessed
 {:constraints {:max-results 10000, :max-results-bare-rows 2000},
  :type :query,
  :middleware {:add-default-userland-constraints? true, :userland-query? true},
  :info
  {:executed-by 1,
   :context :ad-hoc,
   :nested? false,
   :query-hash [95, 33, -7, -110, -62, -56, -120, -123, -110, 96, 122, 101, 121, 91, 74, -104, -55, -26, -69, -51, 42, 113, -28, 51, 12, -105, 92, 114, 8, -42, -38, -8]},
  :preprocessing-level 1,
  :database 8,
  :driver :sparksql,
  :query {:source-table 1419, :order-by [[:desc [:field-id 10260]]], :fields [[:field-id 10260]], :limit 2000}},
 :native {:query "SELECT `t1`.`id` AS `id` FROM `test` `t1` ORDER BY `t1`.`id` DESC LIMIT 2000", :params nil},
 :cause
 {:class org.apache.hive.service.cli.HiveSQLException,
  :error "Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 2:51 Invalid table alias or column reference 't1': (possible column names are: id)",
  :cause {:class java.lang.RuntimeException, :error "org.apache.hadoop.hive.ql.parse.SemanticException:Line 2:51 Invalid table alias or column reference 't1': (possible column names are: id)"}}}

To Reproduce

  1. Prepare a Hive table and create a database connection using the "Spark SQL" type
  2. Navigate to the table view of some table
  3. Sort any field in that table ascending or descending
  4. See the error (a direct SQL reproduction is sketched below)
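
For reference, the underlying restriction can be reproduced directly against Hive / Spark SQL (e.g. in beeline), outside of Metabase. A minimal sketch, assuming a scratch table named test shaped like the one in the log above:

CREATE TABLE `test` (`id` INT);
-- the query Metabase generates, rejected with "SemanticException [Error 10004]: Invalid table alias or column reference 't1'":
SELECT `t1`.`id` AS `id` FROM `test` `t1` ORDER BY `t1`.`id` DESC LIMIT 2000;
-- the same query with the table alias dropped from the ORDER BY, accepted:
SELECT `t1`.`id` AS `id` FROM `test` `t1` ORDER BY `id` DESC LIMIT 2000;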

Expected behavior

The generated SQL must not include the table alias as a prefix on fields in the ORDER BY clause.

Information about your Metabase Installation:

salsakran commented 5 years ago

Prioritizing this as p3 since we don't officially support Hive. It would still be good to look into, but we should prioritize having the Spark driver support Spark idiomatically.

sbelak commented 4 years ago

Will probably be fixed by #13345

chillinger commented 3 years ago

This issue still exists in v0.39.2 and was not solved by #13345.

@salsakran could you maybe shed some light on what exactly is covered by the "SparkSQL" entry under "Officially supported Databases" in https://www.metabase.com/docs/latest/administration-guide/01-managing-databases.html?

flamber commented 3 years ago

@chillinger #13345 was never merged. A fix for this would likely require updating the driver #11023, which would require more work than just upgrading the driver.

camsaul commented 2 years ago

@chillinger @salsakran there's a little bit of confusion here. We only support Spark built with Hive and JDBC support. Pre-built Spark binaries come with this built in. Spark implements the Hive SQL API, and we use the Hive JDBC driver for connecting to Spark clusters.

So I'd characterize what we support as "Spark+Hive" or "Spark via Hive" specifically. We don't support Hive on its own or with anything besides Spark.

camsaul commented 2 years ago

After looking at this, I don't think it has anything to do with the JDBC driver version. We need to omit the table aliases for Fields in the ORDER BY clause, which would be reasonably easy to do.

NicolasPA commented 9 months ago

This is still an issue with Metabase v0.48.6 and Hive 3.1.3.

This is a blocker because we can't use date-based aggregations in "Question"-style queries, since they generate an ORDER BY. Questions are what make Metabase attractive in the first place.

As said above, the workaround is simply to turn the question into a SQL query and remove the table alias in the ORDER BY clause. It seems to be a reasonably challenging change, but one that would support the plethora of potential Hive users in big companies still using Hadoop.
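
For anyone hitting this, the hand edit is small (the table and field names below are illustrative, not taken from this issue): open the question as native SQL, then delete the table-alias prefix from the field in the ORDER BY.

-- shape of the generated query (rejected by Hive):
SELECT `t1`.`created_at` AS `created_at` FROM `orders` `t1` ORDER BY `t1`.`created_at` ASC;
-- after removing the alias from the ORDER BY by hand (accepted):
SELECT `t1`.`created_at` AS `created_at` FROM `orders` `t1` ORDER BY `created_at` ASC;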

This issue is seriously compromising the adoption of Metabase Pro that I was pushing for on my team. We'll probably have to fall back to Superset.