hortonworks-spark / spark-llap

Apache License 2.0

Failed to select columns whose name is a reserved keyword #203

Closed killerwhile closed 6 years ago

killerwhile commented 6 years ago

Hi there, Spark LLAP seems to fail to query columns whose names are reserved keywords.

Versions:

Steps to reproduce:

In Beeline, I create a table using a mix of regular and reserved-keyword column names:

create table test_llap_reserved_keywords (`user` string, `values` string, `attr1` string, `attr2` string);
insert into test_llap_reserved_keywords (`user`, `values`, `attr1`, `attr2`) values ('u1', 'v1', 'a11', 'a21'), ('u2', 'v2', 'a12', 'a22');
select * from test_llap_reserved_keywords;

These SQL statements work well in Beeline.

In spark-shell with LLAP enabled, I query this table.

SPARK_MAJOR_VERSION=2 spark-shell --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --conf spark.sql.hive.llap=true

// selecting all columns is not working, see error below
sql("select * from test_llap_reserved_keywords").show()

// selecting a specific reserved-keyword column is not working
sql("select `user` as u from test_llap_reserved_keywords").show()
sql("select `values` as v from test_llap_reserved_keywords").show()

// selecting columns without reserved keywords is working
sql("select `attr1` as a1, `attr2` as a2 from test_llap_reserved_keywords").show()

For each of the failures listed above, I get the following exception (the predicate name varies according to the column selected):

scala> sql("select `user` as u from test_llap_reserved_keywords").show()
java.io.IOException: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 Failed to recognize predicate 'user'. Failed rule: 'identifier' in table or column identifier
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:230)
  at org.apache.hadoop.hive.llap.LlapRowInputFormat.getSplits(LlapRowInputFormat.java:45)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.getPartitions(HadoopRDD.scala:405)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:314)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withNewExecutionId(Dataset.scala:2788)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:606)
  ... 48 elided
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 Failed to recognize predicate 'user'. Failed rule: 'identifier' in table or column identifier
  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:255)
  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:241)
  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:365)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:222)
  ... 90 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 Failed to recognize predicate 'user'. Failed rule: 'identifier' in table or column identifier
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:487)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:311)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:856)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:552)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:715)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
  at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:206)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
  at org.eclipse.jetty.server.Server.handle(Server.java:349)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 Failed to recognize predicate 'user'. Failed rule: 'identifier' in table or column identifier
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:164)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1932)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
  ... 32 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 Failed to recognize predicate 'user'. Failed rule: 'identifier' in table or column identifier
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:231)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:187)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:894)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:136)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:442)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:434)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 34 more

Spark without LLAP works fine, including selecting all columns:

SPARK_MAJOR_VERSION=2 spark-shell
scala> sql("select * from test_llap_reserved_keywords").show()
+----+------+-----+-----+
|user|values|attr1|attr2|
+----+------+-----+-----+
|  u1|    v1|  a11|  a21|
|  u2|    v2|  a12|  a22|
+----+------+-----+-----+
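The `ParseException` comes from Hive's grammar, not Spark's, which suggests the connector forwards the column list to LLAP unquoted, so Hive's parser rejects reserved keywords like `user` as identifiers. As a minimal illustration of the fix idea (this is hypothetical code, not spark-llap's actual implementation), backtick-quoting every identifier when building the pushed-down query makes such names parse:

```python
# Hypothetical sketch of the quoting idea; the function names and the
# query-building logic are illustrative, not taken from spark-llap.

def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, escaping any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def build_select(columns, table):
    """Build a HiveQL SELECT whose identifiers are all backtick-quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return "SELECT {} FROM {}".format(cols, quote_identifier(table))

# Unquoted, Hive's grammar rejects `user` and `values`; quoted, both parse.
print(build_select(["user", "values", "attr1"], "test_llap_reserved_keywords"))
# SELECT `user`, `values`, `attr1` FROM `test_llap_reserved_keywords`
```

Plain Spark is unaffected because its own parser treats these names as non-reserved, so the query never reaches Hive's stricter grammar.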
EricWohlstadter commented 6 years ago

@dongjoon-hyun @jdere

I will fix this for 2.3-3.0 but my patch won't apply correctly to existing code (too much refactoring).

dongjoon-hyun commented 6 years ago

Thank you for reporting, @killerwhile .

@EricWohlstadter . For HDP 2.6.2, SPARK-LLAP is still Technical Preview. Since we will replace the master branch completely, I think it's okay you have a test case and fix on your branch (Spark 2.3 / Hive 3.0 / Hadoop 3.0) for now.

cc @HyukjinKwon .

killerwhile commented 6 years ago

Do you think a backport to HDP 2.6.5 would be an option, since this version will include Spark 2.3 as well?

dongjoon-hyun commented 6 years ago

Unfortunately, for HDP 2.6.5, it's already too late for backporting although I don't know the release date.

killerwhile commented 6 years ago

Ok, fair enough. Thanks

dongjoon-hyun commented 6 years ago

Thank you for understanding, @killerwhile .

EricWohlstadter commented 6 years ago

This is fixed in master, for Spark 2.3 and Hive 3. Please open a separate ticket if you want to request a backport to older versions.