itsjafer / jupyterlab-sparkmonitor

JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
https://krishnan-r.github.io/sparkmonitor/
Apache License 2.0

Error creating PySpark Job #19

Open uday1889 opened 4 years ago

uday1889 commented 4 years ago

I'm trying to start a PySpark job on an AWS EMR instance with all other configuration parameters left at their defaults. I've also installed the jupyterlab-sparkmonitor extension. Note: the Spark cluster setup otherwise works fine.

from pyspark import SparkConf
from pyspark.sql import SparkSession

config = {
    'spark.extraListeners': 'sparkmonitor.listener.JupyterSparkMonitorListener',
    'spark.driver.extraClassPath': 'some/path/miniconda3/envs/py_env/lib/python3.7/site-packages/sparkmonitor/listener.jar',
}
conf = SparkConf().setAll(list(config.items()))

spark_session = SparkSession \
    .builder.master("yarn") \
    .config(conf=conf) \
    .appName(appName) \
    .getOrCreate()  # appName defined elsewhere
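Not sure if it's relevant, but one way to sanity-check that the jar actually exists at that path is to resolve it from the installed package instead of hard-coding it. A sketch only; the `sparkmonitor` package name and `listener.jar` filename are assumptions taken from the config above:

```python
import os
import importlib.util

def find_listener_jar(package="sparkmonitor"):
    """Return the path to listener.jar inside an installed package, or None.

    Assumes the jar ships inside the package directory; returns None if the
    package is not installed, is not a package, or lacks the jar file.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    pkg_dir = list(spec.submodule_search_locations)[0]
    jar = os.path.join(pkg_dir, "listener.jar")
    return jar if os.path.exists(jar) else None
```

If this returns None on the EMR driver node, the ClassNotFoundException below would be expected, since `spark.driver.extraClassPath` would point at a file that isn't there.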

This is the error that results:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: sparkmonitor.listener.JupyterSparkMonitorListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)

Should I be referencing the Listener class differently? I'm not entirely sure how to work towards a fix here. Could you please help me investigate this? Thanks in advance!
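One guess I'm considering: since this runs on YARN, a driver classpath entry alone may not be enough if the jar only exists on the local filesystem of the submitting machine; distributing it with the job via `spark.jars` might help. A minimal sketch, assuming the jar path from the config above has been verified locally:

```python
# Sketch: the jar path is the one from the config above (replace with the
# verified path on your system); 'spark.jars' ships the file with the
# application so the YARN-side JVM can load the listener class.
jar = "some/path/miniconda3/envs/py_env/lib/python3.7/site-packages/sparkmonitor/listener.jar"

config = {
    "spark.extraListeners": "sparkmonitor.listener.JupyterSparkMonitorListener",
    "spark.driver.extraClassPath": jar,
    "spark.jars": jar,  # also distribute the jar with the job
}
```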

dfdf commented 4 years ago

Same problem.