jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Editor sessions keep losing IPython kernel after a while #4413

Open wstomv opened 5 years ago

wstomv commented 5 years ago

I run Jupyter Notebook 5.7.4 and IPython kernel 7.2.0 on macOS Mojave. When I keep an editor session open for a couple of hours, the kernel stops working. Restarting the kernel or reconnecting doesn't resolve this; only reopening the notebook helps (which is annoying). With earlier versions of Jupyter, IPython, and macOS, I never had this issue.

I have tried reinstalling some of the components involved (notebook, jupyter, ipython) as suggested in #1892, but to no avail.

Here is some output on the console reporting an exception that might be helpful in tracking down the issue:

[W 22:34:12.029 NotebookApp] Replacing stale connection: 5284dc32-6566-4142-8da3-d21f5f6fb2f7:e03790929078421d90016915384aa2ed
[E 22:34:19.579 NotebookApp] Exception restarting kernel
    Traceback (most recent call last):
      File "/anaconda3/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 85, in post
        yield gen.maybe_future(km.restart_kernel(kernel_id))
      File "/anaconda3/lib/python3.7/site-packages/notebook/services/kernels/kernelmanager.py", line 285, in restart_kernel
        self._check_kernel_id(kernel_id)
      File "/anaconda3/lib/python3.7/site-packages/notebook/services/kernels/kernelmanager.py", line 364, in _check_kernel_id
        raise web.HTTPError(404, u'Kernel does not exist: %s' % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 5284dc32-6566-4142-8da3-d21f5f6fb2f7)
[E 22:34:19.581 NotebookApp] {
      "Host": "localhost:8888",
      "Connection": "keep-alive",
      "Content-Length": "0",
      "Accept": "application/json, text/javascript, */*; q=0.01",
      "Origin": "http://localhost:8888",
      "X-Requested-With": "XMLHttpRequest",
      "X-Xsrftoken": "2|e3ae5aee|c7eaf473c7301348d3b5cb6d392436a5|1549565849",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
      "Dnt": "1",
      "Referer": "http://localhost:8888/notebooks/Documents/Education/2WH20%20Programming%20and%20Modelling/course-material-2wh20/year-2018-2019/lectures/ThinkPythonLists.ipynb",
      "Accept-Encoding": "gzip, deflate, br",
      "Accept-Language": "en-US,en;q=0.9,nl;q=0.8",
      "Cookie": "_xsrf=2|e3ae5aee|c7eaf473c7301348d3b5cb6d392436a5|1549565849; username-localhost-8888=\"2|1:0|10:1550580580|23:username-localhost-8888|44:OWFjNWJiMDBiODcyNDE1ZmJlNDk1OWMyNmUzYzdmNjc=|3db330293c2485053100d52fa6f55713723441cdbe5dc44bdf687de08d7c9ba9\""
    }
[E 22:34:19.581 NotebookApp] 500 POST /api/kernels/5284dc32-6566-4142-8da3-d21f5f6fb2f7/restart (::1) 3.01ms referer=http://localhost:8888/notebooks/Documents/Education/2WH20%20Programming%20and%20Modelling/course-material-2wh20/year-2018-2019/lectures/ThinkPythonLists.ipynb
[W 22:34:19.612 NotebookApp] 404 DELETE /api/sessions/cc89e26a-4afe-4011-bd27-5d56f021905c (::1): Session not found: session_id='cc89e26a-4afe-4011-bd27-5d56f021905c'
[W 22:34:19.612 NotebookApp] Session not found: session_id='cc89e26a-4afe-4011-bd27-5d56f021905c'
[W 22:34:19.612 NotebookApp] 404 DELETE /api/sessions/cc89e26a-4afe-4011-bd27-5d56f021905c (::1) 1.15ms referer=http://localhost:8888/notebooks/Documents/Education/2WH20%20Programming%20and%20Modelling/course-material-2wh20/year-2018-2019/lectures/ThinkPythonLists.ipynb
[I 22:34:19.632 NotebookApp] Kernel started: e29f69e2-3ab6-404c-ab70-2e832f489a5f
kevin-bates commented 5 years ago

Although culling is off by default, you might want to make sure you don't have it enabled, since your description fits that scenario. Culling is enabled by a positive timeout value:

--MappingKernelManager.cull_idle_timeout=<Int>
    Default: 0
    Timeout (in seconds) after which a kernel is considered idle and ready to be
    culled. Values of 0 or lower disable culling. Very short timeouts may result
    in kernels being culled for users with poor network connections.

either on the command line or in the corresponding config file.
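To make the timeout semantics concrete, here is a minimal sketch of how an idle-culling decision can be modeled. It is a simplified illustration, not the notebook server's own code: the function `would_cull` is hypothetical, and while the parameter names mirror `MappingKernelManager` traits (`cull_idle_timeout`, plus `cull_busy` and `cull_connected`, which control whether busy or connected kernels may be culled), treat the exact server behavior as an assumption to verify against your version.

```python
from datetime import datetime, timedelta, timezone


def would_cull(
    last_activity: datetime,
    now: datetime,
    cull_idle_timeout: int,
    execution_state: str = "idle",
    connections: int = 0,
    cull_busy: bool = False,
    cull_connected: bool = False,
) -> bool:
    """Hypothetical, simplified model of an idle-kernel culling check."""
    # A timeout of 0 or lower disables culling entirely (the default is 0).
    if cull_idle_timeout <= 0:
        return False
    # The kernel must have been inactive for longer than the timeout.
    is_idle_time = (now - last_activity) > timedelta(seconds=cull_idle_timeout)
    # Busy kernels are spared unless cull_busy is set.
    is_idle_execute = cull_busy or execution_state != "busy"
    # Kernels with open connections are spared unless cull_connected is set.
    is_idle_connected = cull_connected or connections == 0
    return is_idle_time and is_idle_execute and is_idle_connected


now = datetime.now(timezone.utc)
last = now - timedelta(seconds=624)
print(would_cull(last, now, 0))    # False: culling disabled
print(would_cull(last, now, 600))  # True: 624 s of inactivity exceeds 600 s
```

Under this model, with the default timeout of 0 no kernel is ever culled, regardless of how long the session stays open.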

If culling is in play, a warning message will be logged at the time the idle kernel is culled, similar to the following (which was configured with a 600-second timeout and a 30-second interval):

[W 22:34:19.612 NotebookApp] Culling 'idle' kernel 'python_delayed' (558677e7-fed2-42fe-b300-0dacd4b3c607) with 2 connections due to 624 seconds of inactivity.
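Checking several days of accumulated console output by eye is tedious, so a small script can scan it for culling warnings. This is a minimal sketch that assumes the warning text always follows the format of the sample line above; the helper name `find_culled_kernels` is hypothetical, and the log must first be saved to a file (the name `notebook.log` in the usage note is a placeholder).

```python
import re

# Pattern derived from the sample culling warning; assumes the wording is stable.
CULL_RE = re.compile(
    r"Culling '(?P<state>[^']*)' kernel '(?P<name>[^']*)' "
    r"\((?P<kernel_id>[0-9a-fA-F-]+)\) with (?P<connections>\d+) connections? "
    r"due to (?P<idle_seconds>\d+) seconds of inactivity"
)


def find_culled_kernels(lines):
    """Return one dict per culling warning found in an iterable of log lines."""
    results = []
    for line in lines:
        match = CULL_RE.search(line)
        if match:
            record = match.groupdict()
            record["connections"] = int(record["connections"])
            record["idle_seconds"] = int(record["idle_seconds"])
            results.append(record)
    return results
```

Usage: `with open("notebook.log") as f: print(find_culled_kernels(f))`. An empty list means no kernel was culled during the logged period.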
wstomv commented 5 years ago

There is no mention of Culling in any of the output for NotebookApp that I have accumulated over several days.

I find the exception raised when restarting the kernel suspicious (see the traceback above), but I cannot interpret it. It involves handlers.py and kernelmanager.py.

Also, I cannot close a notebook whose kernel has disappeared; the close request is simply ignored. (That probably relates to the "Session not found" messages I see in the log.)

kevin-bates commented 5 years ago

Thanks for checking. Yeah, the notebook server believes the session is now stale and closes it.

If earlier log statements aren't "interesting", it might help to enable debug logging (--debug) and reproduce the problem. Based on the code comment, it sounds like there was a lost network connection on the client side, but I'm not familiar with the session-to-kernel relationship.

The kernel restart failure is purely due to the kernel manager no longer having a record of the kernel in its active list, which led me down the culling path. This could also happen if the kernel was shut down, or if you're now targeting a different notebook server (unlikely, especially since it had a record of your session).
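One way to confirm whether the server still tracks a kernel, before attempting a restart, is to ask the notebook server's REST API for its running kernels (`GET /api/kernels`, which returns a JSON list of kernel models with an `id` field). This is a minimal sketch assuming token authentication; the helper names are hypothetical and the token is a placeholder you would replace with your own.

```python
import json
import urllib.request


def list_kernels(base_url, token):
    """Fetch the list of running kernels from a notebook server."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/kernels",
        # Assumes the server accepts token authentication in this header.
        headers={"Authorization": "token " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def kernel_exists(kernels, kernel_id):
    """Return True if kernel_id appears in a GET /api/kernels response."""
    return any(kernel.get("id") == kernel_id for kernel in kernels)
```

Usage: `kernel_exists(list_kernels("http://localhost:8888", "<token>"), "5284dc32-6566-4142-8da3-d21f5f6fb2f7")`. A `False` result corresponds to the "Kernel does not exist" 404 in the traceback above.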

Sorry I can't be more helpful at this point. Perhaps someone else has ideas?

wstomv commented 5 years ago

This issue does not arise when using JupyterLab (0.35.4; server: 0.2.0): an overnight edit session survived.

kevin-bates commented 5 years ago

That's interesting since they both share the same "server" code. And you found this to be 100% reproducible via Notebook 5.7.4?

wstomv commented 5 years ago

Problem keeps recurring with Jupyter Notebook, but not with JupyterLab.

However, I have now noticed that after a while my notebooks in Jupyter Notebook lose trust (they show "Not Trusted"). When I click this indicator to restore trust, the page reloads, and the kernel seems to behave properly again.

That is probably expected behavior of the kernel when a notebook is not (or no longer) trusted. The question, of course, is then: why does this happen?

wstomv commented 5 years ago

Another thing that seems to help is reloading the browser page with the open notebook.