Closed: dockerhub-publics closed this issue 5 years ago
I believe you are missing the spark-hive jar.
Try adding something like this to your image:
ADD https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.11/2.4.1/spark-hive_2.11-2.4.1.jar /usr/hadoop-3.0.0/share/hadoop/common/lib/
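Before rebuilding the image, it may help to confirm whether a spark-hive jar is already on disk. The following is only a minimal sketch: `find_spark_hive_jars` is a hypothetical helper (not part of pytest-spark or Spark), and `$SPARK_HOME/jars` is an assumed location based on the usual Spark layout. The Hadoop directory is the one from the ADD line above.

```python
import glob
import os


def find_spark_hive_jars(directories):
    """Return sorted paths of spark-hive jars found directly inside the given directories.

    Hypothetical helper for illustration. The pattern matches names such as
    spark-hive_2.11-2.4.1.jar but not spark-hive-thriftserver_*.jar.
    """
    found = []
    for directory in directories:
        pattern = os.path.join(directory, "spark-hive_*.jar")
        found.extend(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME", "")
    candidates = [
        # Assumption: the standard Spark distribution keeps its jars here.
        os.path.join(spark_home, "jars"),
        # Directory taken from the ADD line in the comment above.
        "/usr/hadoop-3.0.0/share/hadoop/common/lib",
    ]
    jars = find_spark_hive_jars(candidates)
    print(jars if jars else "no spark-hive jar found in: %s" % candidates)
```

An empty result would be consistent with the HiveSessionStateBuilder error reported in this issue, although it does not prove that the missing jar is the only cause.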
@dockerhub-publics Closing, because this is not a pytest-spark related issue.
I copied the contents of malexer's test/test_spark_session_fixture.py, as it is in the master branch, into my test_sql_query_automation.py without changes.
To easily reproduce this you may want to use the same Docker Hub image that I use: danimages/spark-pytests
And here is what I get:
$ pytest --spark_home=$SPARK_HOME -s -vv test_sql_query_automation.py
============================= test session starts ==============================
platform linux -- Python 3.5.3, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /usr/bin/python3
cachedir: .pytest_cache
spark version -- Spark 2.4.1 built for Hadoop 2.6.5 | Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pkafka-0-8 -Pflume -Phadoop-provided -DzincPort=3038
rootdir: /builds/ber/Aufbau_BI_Platform, inifile: pytest.ini
plugins: spark-0.5.2
collecting ... collected 2 items
test_sql_query_automation.py::test_spark_session_dataframe 2019-07-09 17:26:16,073 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
ERROR
test_sql_query_automation.py::test_spark_session_sql ERROR
==================================== ERRORS ====================================
____ ERROR at setup of test_spark_session_dataframe ____
a = ('xro49', <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>, 'o47', 'sessionState')
kw = {}
s = "java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
stackTrace = 'org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107...nd.java:79)\n\t at py4j.GatewayConnection.run(GatewayConnection.java:238)\n\t at java.lang.Thread.run(Thread.java:748)'
/usr/spark-2.4.1/python/pyspark/sql/utils.py:63:
answer = 'xro49'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>
target_id = 'o47', name = 'sessionState'
/usr/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py:328: Py4JJavaError
During handling of the above exception, another exception occurred:
/usr/local/lib/python3.5/dist-packages/pytest_spark/fixtures.py:28:
/usr/spark-2.4.1/python/pyspark/sql/session.py:183: in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
/usr/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:1257: in __call__
    answer, self.gateway_client, self.target_id, self.name)
a = ('xro49', <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>, 'o47', 'sessionState')
kw = {}
s = "java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
stackTrace = 'org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107...nd.java:79)\n\t at py4j.GatewayConnection.run(GatewayConnection.java:238)\n\t at java.lang.Thread.run(Thread.java:748)'
/usr/spark-2.4.1/python/pyspark/sql/utils.py:79: IllegalArgumentException
___ ERROR at setup of test_spark_session_sql ___
a = ('xro49', <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>, 'o47', 'sessionState')
kw = {}
s = "java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
stackTrace = 'org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107...nd.java:79)\n\t at py4j.GatewayConnection.run(GatewayConnection.java:238)\n\t at java.lang.Thread.run(Thread.java:748)'
/usr/spark-2.4.1/python/pyspark/sql/utils.py:63:
answer = 'xro49'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>
target_id = 'o47', name = 'sessionState'
/usr/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py:328: Py4JJavaError
During handling of the above exception, another exception occurred:
/usr/local/lib/python3.5/dist-packages/pytest_spark/fixtures.py:28:
/usr/spark-2.4.1/python/pyspark/sql/session.py:183: in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
/usr/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:1257: in __call__
    answer, self.gateway_client, self.target_id, self.name)
a = ('xro49', <py4j.java_gateway.GatewayClient object at 0x7f4e88700ba8>, 'o47', 'sessionState')
kw = {}
s = "java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
stackTrace = 'org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107...nd.java:79)\n\t at py4j.GatewayConnection.run(GatewayConnection.java:238)\n\t at java.lang.Thread.run(Thread.java:748)'
/usr/spark-2.4.1/python/pyspark/sql/utils.py:79: IllegalArgumentException
=============================== warnings summary ===============================
/usr/spark-2.4.1/python/pyspark/cloudpickle.py:47
  /usr/spark-2.4.1/python/pyspark/cloudpickle.py:47: PendingDeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp
-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================== 1 warnings, 2 error in 4.82 seconds ======================