holdenk / spark-testing-base

Base classes to use when writing tests with Spark
Apache License 2.0

Building Spark session fails with Hadoop 3 because of Hive Shims #321

Open mikitakandratsiuk opened 4 years ago

mikitakandratsiuk commented 4 years ago

I'm using version 2.4.5_0.14.0. There is an issue when the SparkSession object is created. As can be seen here, the Spark session is built with enableHiveSupport by default. This pulls in the org.spark-project.hive:hive-exec:1.2.1.spark2 library (specifically Hive Shims), which is not compatible with Hadoop 3 and causes an "Unrecognized Hadoop major version number" error.

This makes spark-testing-base unusable with Hadoop 3 (especially when Hive is not required for the project at all).

The stack trace is below:

An exception or error caused a run to abort. 
java.lang.ExceptionInInitializerError
    at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.newBuilder$1(DataFrameSuiteBase.scala:84)
    at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.sqlBeforeAllTestCases(DataFrameSuiteBase.scala:114)
    at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.com$holdenkarau$spark$testing$DataFrameSuiteBase$$super$sqlBeforeAllTestCases(ApplicationTests.scala:14)
    at com.holdenkarau.spark.testing.DataFrameSuiteBase$class.beforeAll(DataFrameSuiteBase.scala:43)
    at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.beforeAll(ApplicationTests.scala:14)
    at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
    at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.run(ApplicationTests.scala:14)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1320)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1314)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:972)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:971)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
    at org.scalatest.tools.Runner$.run(Runner.scala:798)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:133)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:27)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.2
    at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
    at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
    ... 21 more
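
For context, the check that fails in the trace above can be sketched roughly as follows. This is a paraphrase for illustration, not the actual Hive source: the object name ShimVersionCheck, the method name majorVersionShim, and the returned shim labels are hypothetical. It only assumes, based on the error message, that the shim loader bundled with org.spark-project.hive accepts Hadoop major versions 1 and 2 and rejects anything else.

```scala
// Illustrative sketch only: names and shim labels are assumptions, not Hive's real code.
object ShimVersionCheck {

  // Maps a Hadoop version string to a shim label, accepting only major versions 1 and 2.
  def majorVersionShim(hadoopVersion: String): String = {
    val parts = hadoopVersion.split("\\.")
    require(parts.length >= 2, s"Illegal Hadoop version: $hadoopVersion (expected A.B.* format)")
    parts(0).toInt match {
      case 1 => "0.20S" // hypothetical label for the Hadoop 1.x shim
      case 2 => "0.23"  // hypothetical label for the Hadoop 2.x shim
      case _ =>
        // Any other major version, including 3, is rejected outright.
        throw new IllegalArgumentException(
          s"Unrecognized Hadoop major version number: $hadoopVersion")
    }
  }

  def main(args: Array[String]): Unit = {
    // A Hadoop 2.x version resolves to a shim label.
    println(majorVersionShim("2.7.3"))
    // A Hadoop 3.x version fails; print the error message instead of crashing.
    val result = scala.util.Try(majorVersionShim("3.1.2"))
    println(result.failed.map(_.getMessage).getOrElse("no error"))
  }
}
```

If the real check behaves like this sketch, no Spark or spark-testing-base setting can get past it while the old Hive artifact is on the classpath, because the failure happens in a static initializer (HiveConf$ConfVars.<clinit>) as soon as the class is loaded.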

mavericksid commented 3 years ago

@mikitakandratsiuk were you able to find any solution for this?

mikitakandratsiuk commented 3 years ago

@mavericksid, it's been a long time since I raised this issue, so I don't really remember the details. Here is the comment that I found in my code; I hope it helps:

I've excluded org.spark-project.hive entirely from the com.holdenkarau spark-testing-base dependency and substituted it with org.apache.hive:hive-exec:3.1.2 and org.apache.hive:hive-metastore:3.1.2.

build.sbt - libraryDependencies:

// sparkVersion is defined elsewhere in the build (2.4.5 in my case)
libraryDependencies ++= Seq(
  // exclude Hive (especially Hive Shims) because of "IllegalArgumentException: Unrecognized Hadoop major version number: 3.2.1" error (add real Hive dependency below instead)
  // the reason is that the Hive dependency is added by Spark-Hive 2.4.5, where Hive doesn't support Hadoop 3
  ("com.holdenkarau" %% "spark-testing-base" % s"${sparkVersion}_0.14.0" % Test)
    .excludeAll(ExclusionRule("org.spark-project.hive")),
  "org.apache.hive" % "hive-metastore" % "3.1.2" % Test,
  "org.apache.hive" % "hive-exec" % "3.1.2" % Test
)