Open ltregan opened 1 year ago
Hi @ltregan, thanks for opening an issue. I'm looking into this today.
@ltregan I'm unable to reproduce. Can you check if you are still running into issues?
Still same issue, even after clearing the cache. Exact sequence is:
$ docker system prune -a -f
$ git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev
$ cd sparkmagic-dev
$ docker compose up
I am on a Mac M1. Something else that looks fishy: CPU usage starts at 20% (visible in the screenshots at the bottom), goes up to 40% after a couple of minutes, and stays there.
Full log then screenshots below.
sh-5.1# ../bin/pyspark
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
23/03/15 18:45:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "Thread-4" java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
... 10 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/
Using Python version 3.7.11 (default, Jul 27 2021 14:32:16)
SparkSession available as 'spark'.
>>>
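For readers skimming the trace: the `Caused by` line suggests that the Hive classes loaded by this Spark 2.4.7 build reject the Hadoop version they find at runtime (3.1.0). Below is a minimal sketch of that kind of check. The function name and the set of accepted major versions (1 and 2) are assumptions inferred from the error message, not copied from the Hive source.

```python
# Hypothetical re-creation of the major-version check suggested by the trace.
# ACCEPTED_MAJOR_VERSIONS is an assumption, not read from the Hive source.
ACCEPTED_MAJOR_VERSIONS = {1, 2}


def check_hadoop_major_version(version: str) -> int:
    """Return the major version if accepted, otherwise raise ValueError."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Illegal Hadoop version: {version} (expected A.B.* format)")
    major = int(parts[0])
    if major not in ACCEPTED_MAJOR_VERSIONS:
        # Mirrors the wording of the message seen in the log above.
        raise ValueError(f"Unrecognized Hadoop major version number: {version}")
    return major


if __name__ == "__main__":
    for candidate in ("2.7.4", "3.1.0"):
        try:
            major = check_hadoop_major_version(candidate)
            print(f"{candidate}: accepted (major {major})")
        except ValueError as err:
            print(f"{candidate}: rejected ({err})")
```

If that reading is right, the failure would come from a Hadoop 3.x library in the image being paired with Hive classes that predate Hadoop 3, which would fit the suspicion that the base image changed.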
@ltregan Thanks for the screenshots. I'm able to reproduce
Describe the bug: I believe datamechanics pushed a new version of the image (about 5 days ago?), and the sparkmagic Docker image no longer works. If I log in to the spark-1 container and run ../bin/pyspark, I get this error:
To Reproduce:
$ git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev
$ cd sparkmagic-dev
$ docker compose up
Then create a new PySpark notebook. Even a simple command does not work, e.g. %data = [(1, 'John', 'Doe')]
Expected behavior: The PySpark kernel should work.
Versions:
Additional context: I believe datamechanics pushed a new version of the image (about 5 days ago?).