holdenk / spark-testing-base

Base classes to use when writing tests with Spark
Apache License 2.0

Hortonworks Hive Warehouse Connector & Hadoop 3 #317

Open · geoHeil opened this issue 4 years ago

geoHeil commented 4 years ago

When adding the Hortonworks Hive Warehouse Connector to a Spark project that uses spark-testing-base for unit tests, every unit test fails with:

Unrecognized Hadoop major version number: 3.1.1.3.1.5.9-1

How can this be fixed?

Note: I am using Spark 2.3.x on a Hadoop 3.1 installation with the latest version (0.14.0) of spark-testing-base.

The Hortonworks Hive Warehouse Connector is added to the Gradle Scala (Spark) project like this:

    repositories {
        maven { url "https://repo.hortonworks.com/content/repositories/releases/" }
    }

    dependencies {
        compile "com.hortonworks.hive:hive-warehouse-connector_2_11:1.0.0.3.1.5.9-1"
    }
geoHeil commented 4 years ago

Here is a full stack trace:

*** RUN ABORTED *** (2 seconds, 108 milliseconds)
  java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at org.scalatest.tools.ConcurrentDistributor.waitUntilDone(ConcurrentDistributor.scala:50)
  at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
  at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1012)
  at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
  at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1509)
  at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1011)
  at org.scalatest.tools.Runner$.main(Runner.scala:827)
  at org.scalatest.tools.Runner.main(Runner.scala)
  ...
  Cause: java.lang.ExceptionInInitializerError:
  at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.newBuilder$1(DataFrameSuiteBase.scala:84)
  at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.sqlBeforeAllTestCases(DataFrameSuiteBase.scala:114)
  at com.corp.MyTest.com$holdenkarau$spark$testing$DataFrameSuiteBase$$super$sqlBeforeAllTestCases(PercentileServiceTest.scala:9)
  at com.holdenkarau.spark.testing.DataFrameSuiteBase$class.beforeAll(DataFrameSuiteBase.scala:43)
  at com.corp.MyTest.beforeAll(PercentileServiceTest.scala:9)
  at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
  at com.corp.MyTest.run(PercentileServiceTest.scala:9)
  at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  ...
  Cause: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.5.9-1
  at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
  at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
  at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.newBuilder$1(DataFrameSuiteBase.scala:84)
  at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.sqlBeforeAllTestCases(DataFrameSuiteBase.scala:114)
  at com.corp.MyTest.com$holdenkarau$spark$testing$DataFrameSuiteBase$$super$sqlBeforeAllTestCases(PercentileServiceTest.scala:9)
  at com.holdenkarau.spark.testing.DataFrameSuiteBase$class.beforeAll(DataFrameSuiteBase.scala:43)
  at com.corp.MyTest.beforeAll(PercentileServiceTest.scala:9)
  at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)

Probably finding the right excludes will get the job done.

geoHeil commented 4 years ago

Potentially simply excluding hive-exec, as suggested in https://stackoverflow.com/questions/53915059/how-can-i-fix-java-lang-illegalargumentexception-unrecognized-hadoop-major-vers, will get the job done.
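For reference, an exclude along those lines would look roughly like this in Gradle; this is a sketch, assuming the clashing artifact is org.apache.hive:hive-exec pulled in transitively by the connector:

    compile("com.hortonworks.hive:hive-warehouse-connector_2_11:1.0.0.3.1.5.9-1") {
        // try to keep the connector's transitive hive-exec off the test classpath
        exclude group: "org.apache.hive", module: "hive-exec"
    }
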

Edit: this trick is not getting the job done.