krishnan-r / sparkmonitor

Monitor Apache Spark from Jupyter Notebook
https://krishnan-r.github.io/sparkmonitor/
Apache License 2.0

Python 3 Kernel Issue #13

Closed: krishnan-r closed this issue 5 years ago

krishnan-r commented 5 years ago

Hello, I have added a Python 3 kernel to my Jupyter Docker image. Is there a way to have sparkmonitor work with both 2.x and 3.x?

It works fine with a Python 2 kernel, but when I switch to the 3.x kernel, the conf test raises an error:


print(conf.toDebugString())

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-0a5e403cf2b8> in <module>
----> 1 print(conf.toDebugString())

NameError: name 'conf' is not defined

Thanks for your help

Originally posted by @Ftagn92 in https://github.com/krishnan-r/sparkmonitor/issues/1#issuecomment-460721425

krishnan-r commented 5 years ago

@Ftagn92 It looks like the kernel extension has not been configured for the Python 3 kernel. conf is a variable injected by sparkmonitor's kernel extension; it configures Spark to route monitoring data to the kernel.

In the installation instructions, this line does that; perhaps your environment is different:

ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py

I think you can also load the extension manually using the %load_ext magic in IPython.
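The echo command above appends the line every time it runs, so re-running the setup can leave duplicate entries in the config file. Below is a minimal Python sketch of an idempotent alternative; it is not part of sparkmonitor, ensure_kernel_extension is a hypothetical helper name, and the target is assumed to be the ipython_kernel_config.py inside the directory printed by ipython profile locate default.

```python
from pathlib import Path

# Line that registers sparkmonitor's kernel extension (copied from the install instructions).
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def ensure_kernel_extension(config_path: Path, line: str = EXTENSION_LINE) -> bool:
    """Append `line` to the IPython kernel config unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    """
    config_path.parent.mkdir(parents=True, exist_ok=True)
    existing = config_path.read_text() if config_path.exists() else ""
    if line in existing.splitlines():
        return False  # already configured; do not append a duplicate
    with config_path.open("a") as f:
        # Make sure the appended line starts on its own line.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

For example, ensure_kernel_extension(Path(profile_dir) / "ipython_kernel_config.py"), where profile_dir is the output of ipython profile locate default. The config line only has an effect in kernels whose interpreter can import sparkmonitor, which is why it is also installed with python3 -m pip in the Dockerfile further down.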

Ftagn92 commented 5 years ago

Hello krishnan, thanks for your answer.

I started from your Dockerfile and added three Python 3 lines:

FROM krishnanr/docker-jupyter-spark
ADD ./extension/ /extension/
ADD ./notebooks/ /notebooks/
RUN pip install sparkmonitor
#my 3 lines
RUN python3 -m pip install ipykernel
RUN python3 -m ipykernel install --user
RUN python3 -m pip install sparkmonitor
#end of my 3 lines
RUN jupyter nbextension install sparkmonitor --py --user --symlink && \
jupyter nbextension enable sparkmonitor --py --user && \
jupyter serverextension enable --py --user sparkmonitor && \
ipython profile create && \
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

WORKDIR /notebooks/

EXPOSE 8888

CMD jupyter notebook --port=8888 --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token=''

It added the Python 3 kernel correctly, but not the extension. Maybe I have to add an instruction in the Dockerfile to enable the extension? I tried %load_ext sparkmonitor in the Python 3 kernel, but it failed with:

%load_ext sparkmonitor

The sparkmonitor module is not an IPython extension.

krishnan-r commented 5 years ago

Well, I am using Python 3 on my desktop now with Anaconda, and it does work with Python 3. Let me try out the Dockerfile you mentioned.

It is actually %load_ext sparkmonitor.kernelextension. I tried it and it works, but there is another bug when using %load_ext: you have to reload the browser page once before starting the Spark application. It is the browser that opens the comm API channel, and it does so when a new kernel is started; this step is missed when %load_ext is called, which I still have to fix.
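IPython's %load_ext imports the named module and calls its load_ipython_extension(ipython) function; if the module has no such function, IPython prints the "is not an IPython extension" message seen above. That is consistent with sparkmonitor failing while sparkmonitor.kernelextension loads. The sketch below mimics that check in plain Python; check_extension is a hypothetical helper, and it simplifies IPython's real extension manager (it does not track already-loaded extensions).

```python
import importlib
from types import ModuleType


def is_ipython_extension(module: ModuleType) -> bool:
    """%load_ext expects the module to define load_ipython_extension(ipython)."""
    return callable(getattr(module, "load_ipython_extension", None))


def check_extension(name: str) -> str:
    """Import `name` and report whether %load_ext would accept it (simplified)."""
    module = importlib.import_module(name)
    if is_ipython_extension(module):
        return f"{name}: OK"
    # Same wording IPython uses when the load function is missing.
    return f"The {name} module is not an IPython extension."
```

In a kernel where sparkmonitor is installed, comparing check_extension("sparkmonitor") with check_extension("sparkmonitor.kernelextension") would show which module carries the hook.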

Ftagn92 commented 5 years ago

Hello krishnan,

I tried this, starting a new notebook with the Python 2 kernel:

print(conf.toDebugString()) 
spark.driver.extraClassPath=/usr/local/lib/python2.7/dist-packages/sparkmonitor/listener.jar
spark.extraListeners=sparkmonitor.listener.JupyterSparkMonitorListener

Then I switched to the Python 3 kernel (menu Kernel / Change kernel / Python 3):

print(conf.toDebugString()) 
NameError: name 'conf' is not defined
%load_ext sparkmonitor.kernelextension
The sparkmonitor.kernelextension extension is already loaded. To reload it, use:
  %reload_ext sparkmonitor.kernelextension
%reload_ext sparkmonitor.kernelextension
(no text output)
print(conf.toDebugString()) 
NameError: name 'conf' is not defined

I tried F5, Ctrl+F5, Close and Halt, reloading, and many combinations; same result :/

Thanks for your help
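The two settings in the Python 2 output above are what the kernel extension adds to conf. Here is a hypothetical helper for checking them in any kernel, assuming toDebugString() returns one key=value pair per line (as in that output) and a Linux-style ":" classpath separator; the function names are illustrative, not sparkmonitor API.

```python
from typing import Dict

LISTENER_CLASS = "sparkmonitor.listener.JupyterSparkMonitorListener"


def parse_debug_string(text: str) -> Dict[str, str]:
    """Turn SparkConf.toDebugString() output (one key=value per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)  # split once: values may contain '='
        settings[key] = value
    return settings


def sparkmonitor_configured(settings: Dict[str, str]) -> bool:
    """True if the listener class is registered and its jar is on the driver classpath."""
    listeners = [name.strip() for name in settings.get("spark.extraListeners", "").split(",")]
    classpath = settings.get("spark.driver.extraClassPath", "").split(":")
    has_jar = any(entry.endswith("sparkmonitor/listener.jar") for entry in classpath)
    return LISTENER_CLASS in listeners and has_jar
```

In a notebook this would be called as sparkmonitor_configured(parse_debug_string(conf.toDebugString())); in the Python 3 kernel above it would fail earlier with the same NameError, since conf was never injected.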

krishnan-r commented 5 years ago

Hmm, that seems weird; there is definitely a bug to fix. I'll try to fix this. I am working on a major rewrite of the JS part right now.

krishnan-r commented 5 years ago

On second thought, can you check the browser logs and the Jupyter terminal logs? The extension is also supposed to write a couple of log files in the same folder as the notebook, including kernel_extension.log.

Are you able to run Spark on Python 3? The kernel extension seems to load, but if Spark is not on the importable path, the extension doesn't add the conf... or it's throwing some other exception.
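Since the extension only adds conf when Spark is importable, a quick diagnostic is to ask each kernel's interpreter whether it can find pyspark. A minimal sketch using the standard library's importlib.util.find_spec (Python 3 only); module_importable is a hypothetical name:

```python
import importlib.util


def module_importable(name: str) -> bool:
    """Return True if `name` can be found by *this* interpreter, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError for a dotted name whose parent
        # package is missing, and ValueError if an already-imported module has
        # no __spec__; treat both as "not importable".
        return False
```

Running module_importable("pyspark") in the Python 3 kernel, together with printing sys.executable, should show whether that interpreter can see Spark at all.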

Ftagn92 commented 5 years ago

Hello Krishnan, thanks for your last message. You are right: the problem was that the Python 3 kernel could not find Spark.

I added a fourth line to my Dockerfile:

RUN python3 -m pip install findspark

Now sparkmonitor works for both kernels, even when I switch kernels inside a notebook.

Thanks for your help
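findspark works by locating the Spark installation (typically via SPARK_HOME) and adding its Python bindings to sys.path at runtime. The sketch below is a simplified illustration of that idea, not findspark's actual code; add_spark_to_path is a hypothetical name, and the python/ plus python/lib/py4j-*.zip layout is assumed to match a standard Spark distribution. In practice you would simply call findspark.init() before importing pyspark.

```python
import glob
import os
import sys
from typing import List, Optional


def add_spark_to_path(spark_home: Optional[str] = None) -> List[str]:
    """Put Spark's python/ dir and its bundled py4j zip at the front of sys.path."""
    # Fall back to the SPARK_HOME environment variable.
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set and no spark_home was given")
    spark_python = os.path.join(spark_home, "python")
    # py4j ships as a zip under python/lib; its version varies by Spark release.
    py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*.zip found under " + os.path.join(spark_python, "lib"))
    new_entries = [spark_python, py4j_zips[0]]
    # Prepend so the Spark copies take priority; skip entries already present.
    sys.path[:0] = [p for p in new_entries if p not in sys.path]
    return new_entries
```

Doing this at runtime, rather than copying pyspark into each interpreter's site-packages, keeps one Spark installation usable from both the Python 2 and Python 3 kernels.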