jupyter / help

Can't start kernel #531

Closed keflavich closed 5 years ago

keflavich commented 5 years ago

When I try to run a jupyter notebook kernel, the whole jupyter notebook server freezes and becomes unresponsive (i.e., I cannot close it via ctrl-c, but must use kill -9 to end the process).

The failure is happening on kernel initialization; this command also freezes:

python -m ipykernel_launcher -f /users/aginsbur/.local/share/jupyter/runtime/kernel-4f062402-0d42-44f6-93d0-e042073c46e9.json

The original command I run is:

jupyter notebook --no-browser --port=8888 --debug

and the last text I see on the terminal before it becomes unresponsive is:

[D 14:21:59.085 NotebookApp] Starting kernel: ['/lustre/naasc/users/aginsbur/anaconda/bin/python', '-m', 'ipykernel_launcher', '-f', '/users/aginsbur/.local/share/jupyter/runtime/kernel-64ad7173-0a73-47e9-ab6d-936732075944.json']
[D 14:21:59.090 NotebookApp] Connecting to: tcp://

Any idea what could be causing this, or how I should debug it?
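One way to narrow down where the startup stalls is to run each stage in a child process with a timeout, so that a hang cannot take the calling shell down with it. The sketch below is only an illustration and was not run by anyone in this thread: the helper name run_with_timeout, the 30-second limit, and the choice of import stages (zmq, IPython, ipykernel) are assumptions.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process.

    Returns (hung, returncode). hung is True if the child was still
    running after timeout_s seconds, in which case it is killed and
    returncode is None.
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=timeout_s)
        return False, returncode
    except subprocess.TimeoutExpired:
        proc.kill()   # same effect as kill -9 on POSIX
        proc.wait()   # reap the child so it does not linger as a zombie
        return True, None


if __name__ == "__main__":
    # Assumed stages: packages the kernel needs at startup.
    for statement in ("import zmq", "import IPython", "import ipykernel"):
        hung, code = run_with_timeout([sys.executable, "-c", statement], 30)
        status = "HUNG" if hung else "exit code {}".format(code)
        print("{:20s} {}".format(statement, status))
```

If one stage hangs, starting the kernel with python -X faulthandler -m ipykernel_launcher -f (connection file) and then sending the process SIGABRT (kill -ABRT) may print the Python traceback of the stuck call, since faulthandler installs a handler for that signal.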

hickst commented 5 years ago

Interesting....I just took the following steps:

conda create -n jl python=3.6
source activate jl
conda install -c conda-forge jupyterlab
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter lab --no-browser --port 8888 --debug

The Lab environment starts up on port 8888, I am able to start a new notebook but not a new console: the console window just hangs. The messages on the startup window look like this:

[D 15:36:29.862 LabApp] Starting kernel: ['/usr/local/src/anaconda_5.2/envs/jl/bin/python', '-m', 'ipykernel_launcher', '-f', '/Users/hickst/Library/Jupyter/runtime/kernel-f2902c0c-0365-451c-8c2a-1df6b61607b6.json']
[D 15:36:29.870 LabApp] Connecting to: tcp://
[D 15:36:29.871 LabApp] Connecting to: tcp://
[I 15:36:29.873 LabApp] Kernel started: f2902c0c-0365-451c-8c2a-1df6b61607b6
[D 15:36:29.874 LabApp] Kernel args: {'kernel_name': 'python3', 'cwd': '/Users/hickst/temp/testjl'}
[D 15:36:29.875 LabApp] 201 POST /api/sessions?1551825389841 (::1) 30.32ms
[D 15:36:30.352 LabApp] Accepting token-authenticated connection from ::1
[D 15:36:30.353 LabApp] 200 GET /api/sessions?1551825390348 (::1) 2.37ms
[D 15:36:30.354 LabApp] Accepting token-authenticated connection from ::1
[D 15:36:30.355 LabApp] 200 GET /api/terminals?1551825390348 (::1) 0.86ms
[D 15:36:30.677 LabApp] Accepting token-authenticated connection from ::1

..... on and on with the 200 GET token stuff until I kill the server.

keflavich commented 5 years ago

Interesting, I didn't actually expect anyone to be able to reproduce this, since I can't reproduce it on another machine with a nearly identical setup. Your error looks somewhat different, since your server keeps printing error messages, but maybe it's the same underlying issue? Someone pointed out that there might be a problem with tornado 6.x, but I'm on 5.1.1, so that wasn't my issue, at least.

This is the conda configuration I have set up: condalist.txt. I have attempted to reinstall most of jupyter's components via pip, and that didn't change anything.

hickst commented 5 years ago

Hmmm....the 200 GET is so regular and persistent that I wonder if it is some "keep-alive heartbeat" thing between front-end and kernel.

Nevertheless, my Python Console still launches and then freezes.

keflavich commented 5 years ago

I tried spinning up a fresh conda install and had the same issues.

On a different machine, the log from a successful startup looks like:

[D 16:33:41.454 NotebookApp] Starting kernel: ['/users/aginsbur/anaconda/bin/python', '-m', 'ipykernel_launcher', '-f', '/users/aginsbur/.local/share/jupyter/runtime/kernel-75abbae6-913c-4230-a89f-dfa041099a6d.json']
[D 16:33:41.458 NotebookApp] Connecting to: tcp://
[D 16:33:41.459 NotebookApp] Connecting to: tcp://
[I 16:33:41.459 NotebookApp] Kernel started: 75abbae6-913c-4230-a89f-dfa041099a6d

(then a whole lot more debug messages)

keflavich commented 5 years ago

@hickst it looks like your startup was at least more successful than mine; that Info print ([I ...) simply never happens on my failing machine.

hickst commented 5 years ago

@keflavich I just wish I could get consistent behavior. With the new install, I'm suddenly getting an error on startup, which appears to be the Tornado bug you mentioned:

[I 16:40:58.587 LabApp] JupyterLab extension loaded from /usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/jupyterlab
[I 16:40:58.587 LabApp] JupyterLab application directory is /usr/local/src/anaconda_5.2/envs/jl/share/jupyter/lab
[W 16:40:58.589 LabApp] JupyterLab server extension not enabled, manually loading...
[I 16:40:58.598 LabApp] JupyterLab extension loaded from /usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/jupyterlab
[I 16:40:58.598 LabApp] JupyterLab application directory is /usr/local/src/anaconda_5.2/envs/jl/share/jupyter/lab
[I 16:40:58.598 LabApp] Serving notebooks from local directory: /Users/hickst/temp/testjl
[I 16:40:58.598 LabApp] The Jupyter Notebook is running at:
[I 16:40:58.599 LabApp] http://localhost:8888/?token=651b306c21919d6b0f9a50d08702a3e7b29b915edf032651
[I 16:40:58.599 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:40:58.599 LabApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
[I 16:41:06.834 LabApp] 302 GET /?token=651b306c21919d6b0f9a50d08702a3e7b29b915edf032651 (::1) 0.91ms
[E 16:41:07.741 LabApp] Uncaught exception GET /api/nbconvert?1551829267676 (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/nbconvert?1551829267676', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/tornado/web.py", line 1697, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/tornado/web.py", line 3174, in wrapper
        return method(self, *args, **kwargs)
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/notebook/services/nbconvert/handlers.py", line 13, in get
        from nbconvert.exporters import base
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/nbconvert/__init__.py", line 7, in <module>
        from . import postprocessors
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/nbconvert/postprocessors/__init__.py", line 5, in <module>
        from .serve import ServePostProcessor
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/nbconvert/postprocessors/serve.py", line 19, in <module>
        class ProxyHandler(web.RequestHandler):
      File "/usr/local/src/anaconda_5.2/envs/jl/lib/python3.6/site-packages/nbconvert/postprocessors/serve.py", line 21, in ProxyHandler
    AttributeError: module 'tornado.web' has no attribute 'asynchronous'
[W 16:41:07.744 LabApp] Unhandled error

hickst commented 5 years ago

@keflavich After seeing this tornado error, I started over with a fresh install. I then whacked the Conda install in the head: I removed tornado 6.0.1 with pip (pip uninstall tornado==6.0.1-py36h1de35cc_0; I know, you're probably not supposed to do that) and then installed Tornado 5.1.1 (pip install tornado==5.1.1). I started Jupyter Lab and voila: a working Console with no error messages during startup. So I think the problems we've been seeing here are all the result of Tornado problems in version 6.0.1 (as you initially posited!).
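The AttributeError in the traceback above arises because tornado.web.asynchronous was removed in Tornado 6.0, while the installed nbconvert still referenced it. A quick way to check whether an environment is affected is to test for that attribute directly. This is a minimal sketch; the helper name module_has_attr is made up for illustration.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True or False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr_name)


if __name__ == "__main__":
    result = module_has_attr("tornado.web", "asynchronous")
    if result is None:
        print("tornado is not installed in this environment")
    elif result:
        print("tornado.web.asynchronous exists (pre-6.0 API)")
    else:
        print("tornado.web.asynchronous is missing (Tornado 6.0 or later)")
```

Run it with the same interpreter that the "Starting kernel" log line reports, so that it inspects the environment the server actually uses.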

keflavich commented 5 years ago

@hickst Unfortunately, my problems were never with tornado; I still can't get the kernel to start (and I've been on tornado 5.1.1 from the beginning). I do suspect some version of something needs to be reverted, but I can't say what.

keflavich commented 5 years ago

Upgrading to notebook v5.7.5 (released a few hours ago) has resolved the problem.

keflavich commented 5 years ago

After the machine was rebooted, the issue started recurring, even with the most recent notebook version.

keflavich commented 5 years ago

aaaand... it went away again. Something very weird is going on with this particular machine. I don't know what it is, but I still suspect something about file I/O.
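If file I/O is the suspect, one rough test is to time a small write, fsync, and read-back in the directory where the kernel connection files live (the logs above show .local/share/jupyter/runtime, and jupyter --runtime-dir prints the path). The sketch below is an assumption-laden probe, not a diagnosis: the function name time_file_roundtrip and the 4 KiB payload are arbitrary, and a slow or hanging result would only suggest that the filesystem deserves a closer look.

```python
import os
import sys
import tempfile
import time


def time_file_roundtrip(directory, payload=b"x" * 4096):
    """Create, fsync, read back, and delete a small file; return elapsed seconds."""
    start = time.perf_counter()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # force the data out to the filesystem
        with open(path, "rb") as handle:
            data = handle.read()
        if data != payload:
            raise IOError("read-back mismatch in {}".format(directory))
    finally:
        os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass the directory to probe as the first argument; defaults to the temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print("{}: {:.4f} s".format(target, time_file_roundtrip(target)))
```

Comparing the timing for the runtime directory against a local directory such as /tmp, on both the failing and the working machine, would show whether the two differ in any obvious way.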