krishnan-r / sparkmonitor

Monitor Apache Spark from Jupyter Notebook
https://krishnan-r.github.io/sparkmonitor/
Apache License 2.0

How to solve "[IPKernelApp] ERROR | No such comm target registered: SparkMonitor"? #6

Open · tracegyx opened this issue 6 years ago

tracegyx commented 6 years ago

I installed sparkmonitor following the documentation, but when I start the notebook I get the error "[IPKernelApp] ERROR | No such comm target registered: SparkMonitor". How can I solve it? Thanks very much!

krishnan-r commented 6 years ago

Are you seeing that error in the Jupyter console? It may be because the IPython kernel extension is not configured correctly. The JS frontend is unable to open a comm channel with the kernel extension, possibly because the kernel extension has not started.

There are three components to this extension: a Jupyter frontend extension in JavaScript, an IPython kernel extension, and a Jupyter web server extension.

Please also check your Jupyter version; it has to be 5.x for the Jupyter comm APIs to work.
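
For context, the error means the frontend sent a comm_open for a target name that was never registered on the kernel side. As an illustrative sketch (not the actual sparkmonitor source), the kernel extension has to do something like this for the frontend's comm to succeed:

def comm_opened(comm, open_msg):
    # Called when the frontend opens a comm to the 'SparkMonitor' target.
    @comm.on_msg
    def _on_msg(msg):
        pass  # handle messages arriving from the frontend

# Registering the target is what makes the frontend's comm_open resolve;
# if this never runs, the kernel logs "No such comm target registered".
get_ipython().kernel.comm_manager.register_target('SparkMonitor', comm_opened)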

tracegyx commented 6 years ago

Yeah, I am seeing that error in the Jupyter console. My jupyter version is 4.4.0 and my ipython version is 5.5.0.

I'm a newbie and I don't know how to solve this issue. Can you tell me some details? Thanks!

krishnan-r commented 6 years ago

Can you try pip install --upgrade jupyter, or conda update jupyter if you are using Anaconda? Check the Jupyter version afterwards and let me know if you are still getting the error.
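
For reference, a quick way to check the relevant versions from a notebook cell (a generic check, nothing sparkmonitor-specific):

# Print the versions of the packages involved in the comm machinery.
import notebook, ipykernel, IPython
print("notebook:", notebook.__version__)    # should be 5.x for the comm APIs
print("ipykernel:", ipykernel.__version__)
print("IPython:", IPython.__version__)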

tracegyx commented 6 years ago

I tried pip install --upgrade jupyter; the ipython version changed to 5.6.0, but there is no improvement. (screenshot attached)

AbdealiLoKo commented 6 years ago

I am using:
notebook (5.5.0)
jupyter-core (4.4.0)
jupyter (1.0.0)
jupyter-client (5.2.3)
jupyter-console (5.2.0)

I am getting this same issue. I ran the steps mentioned in https://krishnan-r.github.io/sparkmonitor/install.html

[I 01:32:59.531 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[W 01:33:00.009 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 01:33:00.048 NotebookApp] JupyterLab beta preview extension loaded from /opt/miniconda3/lib/python3.6/site-packages/jupyterlab
[I 01:33:00.048 NotebookApp] JupyterLab application directory is /opt/miniconda3/share/jupyter/lab
[I 01:33:00.429 NotebookApp] sparkmagic extension enabled!
SPARKMONITOR_SERVER: Loading Server Extension
[I 01:33:00.462 NotebookApp] Serving contents
[I 01:33:00.462 NotebookApp] 0 active kernels
[I 01:33:00.462 NotebookApp] The Jupyter Notebook is running at:
[I 01:33:00.462 NotebookApp] http://ip-172-31-32-137:9990/
[I 01:33:00.462 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 01:33:08.733 NotebookApp] Notebook Payment file MOB.ipynb is not trusted
[I 01:33:09.851 NotebookApp] Kernel started: a29ca01a-0a15-40e8-bb23-0272b3546cca
[I 01:33:10.343 NotebookApp] Adapting to protocol v5.1 for kernel a29ca01a-0a15-40e8-bb23-0272b3546cca
[IPKernelApp] ERROR | No such comm target registered: SparkMonitor
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
... spark logs ...

I'm using Amazon AWS.

krishnan-r commented 6 years ago

Yes, this issue seems to be real; I am getting this problem now too. I think it may be a problem on the kernel side, perhaps a breaking change in the Jupyter protocol. The communication mechanism between the frontend and the kernel seems to be broken. I will look into this over the weekend and get back to you.

#7 is also due to the same problem, I think.

krishnan-r commented 6 years ago

I have identified the issue. I am quite embarrassed that this has propagated for this long.

The quick-install instructions provided on the website are outdated; the quick-install instructions in the readme are correct.

This kernel extension line:

ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor')" >> $(ipython profile locate default)/ipython_kernel_config.py

should be changed to:

ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py

Initially the kernel extension lived in the package namespace sparkmonitor. I made a few cleanup changes that moved the kernel extension to sparkmonitor.kernelextension.

However, I only updated the readme and forgot to update the install instructions on the website and in the Dockerfile.

Please confirm whether this solves the problem. If the kernel extension loads successfully, a SparkConf object named conf will be declared in the global namespace.
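
A quick check from a fresh notebook cell (a sketch, assuming the extension injected conf as described):

# If the kernel extension loaded, `conf` should exist in the global namespace.
print(type(conf))            # expected: pyspark.conf.SparkConf
print(conf.toDebugString())  # should include the sparkmonitor listener options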

AbdealiLoKo commented 6 years ago

I changed this, but it still does not show any dashboard to monitor the spark job.

[I 14:22:43.756 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 14:22:44.357 NotebookApp] JupyterLab beta preview extension loaded from /Users/abdealijk/anaconda3/lib/python3.6/site-packages/jupyterlab
[I 14:22:44.357 NotebookApp] JupyterLab application directory is /Users/abdealijk/anaconda3/share/jupyter/lab
[I 14:22:44.963 NotebookApp] [nb_anacondacloud] enabled
[I 14:22:44.966 NotebookApp] [nb_conda] enabled
[I 14:22:45.008 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 14:22:45.008 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
SPARKMONITOR_SERVER: Loading Server Extension
[I 14:22:45.054 NotebookApp] Serving notebooks from local directory: /Users/abdealijk/Documents/notebooks
[I 14:22:45.054 NotebookApp] 0 active kernels
[I 14:22:45.054 NotebookApp] The Jupyter Notebook is running at:
[I 14:22:45.054 NotebookApp] http://localhost:8888/?token=1d964f2e5ef2c3cd77e9bce03de7b3d502b63f216ff9002e
[I 14:22:45.054 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:22:45.055 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=1d964f2e5ef2c3cd77e9bce03de7b3d502b63f216ff9002e
[W 14:22:45.095 NotebookApp] 404 GET /spark/api/v1/applications (::1) 38.17ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[W 14:22:45.100 NotebookApp] 404 GET /spark/api/v1/applications (::1) 2.06ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[I 14:22:45.207 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[W 14:22:46.164 NotebookApp] 404 GET /spark/api/v1/applications (::1) 1.23ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[W 14:22:47.165 NotebookApp] 404 GET /spark/api/v1/applications (::1) 1.24ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[W 14:22:48.164 NotebookApp] 404 GET /spark/api/v1/applications (::1) 1.16ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[W 14:22:49.165 NotebookApp] 404 GET /spark/api/v1/applications (::1) 1.18ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[I 14:22:49.264 NotebookApp] Kernel started: 4b08f08a-5837-4a7d-9f12-4fa6a71b64f6
[W 14:22:49.336 NotebookApp] 404 GET /nbextensions/nbpresent/css/nbpresent.min.css (::1) 2.89ms referer=http://localhost:8888/notebooks/payment_data.ipynb
[I 14:22:50.823 NotebookApp] Adapting to protocol v5.1 for kernel 4b08f08a-5837-4a7d-9f12-4fa6a71b64f6
[W 14:22:51.577 NotebookApp] 404 GET /spark/api/v1/applications (::1) 1.27ms referer=http://localhost:8888/notebooks/payment_data.ipynb
2018-05-27 14:22:56 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 1:>                                                          (0 + 1) / 1]

sparkmonitor_serverextension.log is empty.

sparkmonitor_kernelextension.log has:

INFO:  2018-05-27 14:22:50,353 - sparkmonitorkernel - 20016 - MainProcess -         140735835722624 - MainThread
 Starting Kernel Extension

INFO:  2018-05-27 14:22:50,355 - sparkmonitorkernel - 20016 - MainProcess -         140735835722624 - MainThread
 Socket Listening on port 64179

INFO:  2018-05-27 14:22:50,356 - sparkmonitorkernel - 20016 - MainProcess -         123145498189824 - Thread-4
 Starting socket thread, going to accept

INFO:  2018-05-27 14:22:50,829 - sparkmonitorkernel - 20016 - MainProcess -         140735835722624 - MainThread
 COMM OPENED MESSAGE:
 {'header': {'msg_id': 'a5d96602bae54adfba5c0cabea24e6b6', 'username': 'username', 'session': 'e3ff30ecc21e4f9e8382e6893954efa7', 'msg_type': 'comm_open', 'version': '5.2', 'date': datetime.datetime(2018, 5, 27, 8, 52, 50, 827796, tzinfo=tzutc())}, 'msg_id': 'a5d96602bae54adfba5c0cabea24e6b6', 'msg_type': 'comm_open', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': '6d4db53c53704fa78149951700a2bc87', 'target_name': 'SparkMonitor', 'data': {'msgtype': 'openfromfrontend'}}, 'buffers': []}

krishnan-r commented 6 years ago

Based on your output, I think the kernel extension, server extension, and frontend extension have all loaded.

May I know what kind of notebook you are running? Have you tried running this example?

Can you give me the output of the conf object created by the extension?

print(conf.toDebugString())  # instance of SparkConf with options set by the extension

I am guessing this may be because the custom SparkListener which collects the data has not been registered with Spark.

What version of Spark are you running?

AbdealiLoKo commented 6 years ago

I just did a CSV read and a count():

import findspark
findspark.init()
import pyspark

spark = pyspark.sql.SparkSession.builder \
    .appName("My Pyspark Session") \
    .getOrCreate()

df = spark.read.csv("s3a://bucket/prefix/file.csv")

df.count()

I can see the "Toggle sparkmonitor display" button at the top, but I get NameError: name 'conf' is not defined when I run print(conf.toDebugString()).

When I restarted my kernel, I think I got the following in the Jupyter logs:

[I 16:14:13.467 NotebookApp] Starting buffering for ee4b52a4-080d-4ad1-a970-50a4ad8a5a4d:24dcf4598f074cf39c34d10259058346
[I 16:14:18.578 NotebookApp] Kernel restarted: ee4b52a4-080d-4ad1-a970-50a4ad8a5a4d
[I 16:14:20.352 NotebookApp] Adapting to protocol v5.1 for kernel ee4b52a4-080d-4ad1-a970-50a4ad8a5a4d
[I 16:14:20.353 NotebookApp] Restoring connection for ee4b52a4-080d-4ad1-a970-50a4ad8a5a4d:24dcf4598f074cf39c34d10259058346
[I 16:14:20.353 NotebookApp] Replaying 6 buffered messages
[IPKernelApp] WARNING | No such comm: c38a9cfe40c2476fb4279ae9d90b1ba3

There seems to be a warning from IPKernelApp.

krishnan-r commented 6 years ago

When the SparkMonitor kernel extension starts, it creates a SparkConf object configured with a custom SparkListener class, which instructs Spark to send monitoring data to an instance of that class. This class, written in Scala, forwards the monitoring data to the IPython kernel extension, which in turn forwards it to the JavaScript frontend for rendering.

For this to work, the conf object must be passed to the SparkSession/SparkContext; only then will the data collection happen.
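
As a minimal sketch, passing the extension-provided conf when building a session would look like this (assuming conf was injected by the kernel extension):

import pyspark

# Hand the extension's conf to the builder so the custom SparkListener
# options it carries are applied to the new session.
spark = pyspark.sql.SparkSession.builder \
    .config(conf=conf) \
    .appName("My Pyspark Session") \
    .getOrCreate()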

Now, in your case, you seem to be using the findspark module. Currently, inside the sparkmonitor kernel extension, something like this is happening:

try:
    from pyspark import SparkConf
    spark_imported = True  # pyspark is importable at kernel startup
except ImportError:
    spark_imported = False

and if pyspark was imported, this happens:

if spark_imported:
    # Get conf if the user already has a conf object, for appending
    conf = ipython.user_ns.get("conf")
    if conf:
        logger.info("Conf: " + conf.toDebugString())
        if isinstance(conf, SparkConf):
            configure(conf)
    else:
        conf = SparkConf()  # Create a new conf
        configure(conf)
        ipython.push({"conf": conf})  # Add to the user's namespace

My guess is that the kernel extension is failing because the pyspark import fails; with findspark, pyspark is presumably not importable yet when the kernel extension loads. Can you try with Spark configured manually, instead of using findspark?

Integration with findspark could be a nice-to-have feature.

The warning you are getting when restarting the kernel is not an issue; I can confirm that I see the same warning even with the extension working.

AbdealiLoKo commented 6 years ago

That works. I got a few errors and issues, for which I have made new GitHub issues. I created a PR for findspark support at https://github.com/krishnan-r/sparkmonitor/pull/11

krishnan-r commented 6 years ago

Closing this for now. If anyone else encounters this problem, feel free to open an issue.

JulienPeloton commented 6 years ago

OS: macOS (10.13.5)
sparkmonitor: 0.0.9
python: 3.6.4
jupyter: 4.4.0
ipython: 6.4.0

Hi, thanks for the nice package. I recently installed sparkmonitor via pip, and I run into trouble when launching via pyspark directly. Reading the comments above, I'm not sure this has been addressed.

If I launch my notebooks using:

/path/to/jupyter notebook <args>

and import pyspark inside it, sparkmonitor works. However, if I use pyspark instead:

PYSPARK_DRIVER_PYTHON_OPTS="/path/to/jupyter notebook" pyspark --py-files ...

the sparkmonitor is not displayed and I get the following warning:

[IPKernelApp] WARNING | No such comm: 65c8bde216794811ac29bfa7996796d7

Any ideas? Thanks! Julien

krishnan-r commented 6 years ago

That error occurs because the IPython kernel extension is not loaded. I think that, with the way Jupyter is started here, the kernel configuration is not read. I will try to reproduce this issue.

The sparkmonitor IPython kernel extension is configured to start via ipython_kernel_config.py in the user's home directory. I think pyspark might do something different. This works when you use findspark; I haven't tried the method above.
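
For reference, the corrected install instructions above append this line to the default profile's kernel config (typically ~/.ipython/profile_default/ipython_kernel_config.py):

# ~/.ipython/profile_default/ipython_kernel_config.py
# Loads the sparkmonitor kernel extension whenever an IPython kernel starts.
c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')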

JulienPeloton commented 6 years ago

Thanks for looking into it. So far I haven't found a solution :-( I might go with the solution described above and avoid using PYSPARK_DRIVER_PYTHON_OPTS...

chaoyue729 commented 6 years ago

[IPKernelApp] WARNING | No such comm: 13ab45b693e542368311fca97aa2f7a8

I can print the spark conf, but it is missing

spark.extraListeners=sparkmonitor.listener.JupyterSparkMonitorListener
spark.driver.extraClassPath=/home/admin/miniconda3/lib/python3.6/site-packages/sparkmonitor/listener.jar

so the extension does not show the Spark job.

krishnan-r commented 6 years ago

@chaoyue729 can you explain further?

If you can print the SparkConf object, the IPython kernel extension has started, and the problem is with the frontend extension. Can you check the browser console for errors? (sparkmonitor at the moment only works with the Jupyter Notebook interface.)

Are you extending the same conf object and starting Spark with it (see the sketch below)? Also, please ensure the minimum Jupyter version mentioned here.
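
As a sketch, extending the injected conf and starting Spark with it would look something like this (assuming the kernel extension has put conf in the namespace):

import pyspark

# Reuse the conf the extension created so its extraListeners and
# extraClassPath settings are kept, then start Spark with it.
conf.setAppName("MyApp")
sc = pyspark.SparkContext(conf=conf)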

STHITAPRAJNAS commented 5 years ago

I tried all the above options and I meet all the requirements. I still get "[IPKernelApp] ERROR | No such comm target registered: SparkMonitor".

I tried pyspark invoked via PYSPARK_DRIVER_PYTHON_OPTS as well as plain jupyter notebook. I have also tried with and without findspark.

I didn't get any errors during installation. The instructions look simple, but it is hard to make it work.

randomthought commented 5 years ago

@AbdealiJK are you trying to run the extension using JupyterLab? I got the same error when trying to run it with JupyterLab, but I am able to run it on Jupyter Notebook just fine. There are some plans to support JupyterLab: https://github.com/krishnan-r/sparkmonitor/issues/1

AbdealiLoKo commented 5 years ago

Honestly, I don't remember; it was a long time ago.

The logs from my comments above say: JupyterLab beta preview extension loaded

So I'm assuming I was running JupyterLab.