jupyter / help

:sparkles: Need some help or have some questions? Please visit our Discourse page.
https://discourse.jupyter.org

Kernel didn't respond (randomly?) #480

Open PetitLepton opened 5 years ago

PetitLepton commented 5 years ago

I am running into a strange bug when executing Jupyter notebooks (Python 3 kernels) sequentially. The main loop executes a set of notebooks one after another through nbconvert, using the following code:

[...]
from nbconvert.preprocessors import ExecutePreprocessor
[...]
class Report:
    [..]

    def execute_notebook(self, timeout=3600):
        [...]
        notebook = nbformat.read(str(self.notebook_path), as_version=4)
        kernel_name = notebook["metadata"]["kernelspec"]["name"]
        ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
        ep.preprocess(notebook, dict(metadata=dict(path=self.notebook_folder)))

The executions run daily. Some days the loop over the notebooks works smoothly, but on other days it fails when the program tries to execute the second notebook, with the following traceback:

Traceback (most recent call last):
  File "/home/data/ds-metrics/scripts/recurring_reports.py", line 52, in main
    report.execute_notebook()
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/report/__init__.py", line 49, in execute_notebook
    ep.preprocess(notebook, dict(metadata=dict(path=self.notebook_folder)))
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 359, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 304, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(cwd=path)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 258, in start_new_kernel
    kc.wait_for_ready(timeout=self.startup_timeout)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/jupyter_client/blocking/client.py", line 124, in wait_for_ready
    raise RuntimeError("Kernel didn't respond in %d seconds" % timeout)
RuntimeError: Kernel didn't respond in 60 seconds

After a failure, if I run the loop again or execute the notebooks one by one, it works. It looks really random, and I do not know how to reproduce the problem.
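Since a second run succeeds, one stopgap is to retry the failing call a few times. The following is a minimal sketch, not a fix for the underlying cause. The helper name `run_with_retries`, the logger name, and the default values are illustrative assumptions. It retries on any `RuntimeError`, which is the exception type shown in the traceback above.

```python
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("recurring_reports")


def run_with_retries(execute, attempts=3, delay=5.0):
    """Call execute() until it succeeds, at most `attempts` times.

    Only RuntimeError is retried; the last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except RuntimeError as error:
            if attempt == attempts:
                raise
            logger.warning(
                "attempt %d/%d failed (%s); retrying in %.1f s",
                attempt, attempts, error, delay,
            )
            time.sleep(delay)
```

In the loop this could be called as `run_with_retries(report.execute_notebook)`. Because every `RuntimeError` is retried, a narrower check on the error message may be preferable in practice.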

The executions run on a Debian server from a conda environment with Python 3.6.6 and the following relevant packages:

ipykernel                 5.1.0           py36h24bf2e0_1001    conda-forge
ipython                   7.1.1           py36h24bf2e0_1000    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jupyter                   1.0.0                      py_1    conda-forge
jupyter_client            5.2.3                      py_1    conda-forge
jupyter_console           6.0.0                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
nbconvert                 5.4.0                         1    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
notebook                  5.7.2                 py36_1000    conda-forge
pexpect                   4.6.0                 py36_1000    conda-forge
python                    3.6.6                h5001a0f_3    conda-forge
pyzmq                     17.1.2           py36hae99301_1    conda-forge
traitlets                 4.3.2                 py36_1000    conda-forge
zeromq                    4.2.5                hfc679d8_6    conda-forge

Is there a way to log more information during execution so that, the next time it fails, I have some clues about what happened?
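One generic way to capture more context uses only the standard library: enable timestamped debug logging and wrap each notebook run in a small timing context manager. This is a sketch under assumptions. The names `configure_logging`, `log_step`, and the logger name are illustrative, and how much extra detail appears depends on whether the libraries involved emit records through the standard `logging` module.

```python
import contextlib
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("recurring_reports")


def configure_logging(level=logging.DEBUG):
    """Send timestamped records from all loggers to stderr."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


@contextlib.contextmanager
def log_step(name):
    """Log the start, the duration, and any exception of one step."""
    started = time.monotonic()
    logger.info("starting %s", name)
    try:
        yield
    except Exception:
        # logger.exception also records the full traceback.
        logger.exception("%s failed after %.1f s", name, time.monotonic() - started)
        raise
    else:
        logger.info("%s finished in %.1f s", name, time.monotonic() - started)
```

A possible use in the main loop is `with log_step(str(report.notebook_path)): report.execute_notebook()`, after calling `configure_logging()` once at startup.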

Thank you very much for your help!

PetitLepton commented 5 years ago

Hi, it turns out that the problem seems to come from a call to /dev/random on the server I am using.

The following example test.py

from nbconvert.preprocessors import ExecutePreprocessor

ep = ExecutePreprocessor(kernel_name="python3")
km, kc = ep.start_new_kernel()
km.shutdown_kernel()

can get stuck at the following system call (observed with strace python test.py):

open("/dev/random", O_RDONLY)           = 6
poll([{fd=6, events=POLLIN}], 1, -1

whereas a run that succeeds continues as follows:

open("/dev/random", O_RDONLY)           = 6
poll([{fd=6, events=POLLIN}], 1, -1)    = 1 ([{fd=6, revents=POLLIN}])
close(6)                                = 0
open("/dev/urandom", O_RDONLY)          = 6

PetitLepton commented 5 years ago

This is a similar issue: https://github.com/ipython/ipykernel/issues/342
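If the hang is indeed the poll on /dev/random, it may help to record the kernel's entropy estimate before each notebook run. The sketch below reads `/proc/sys/kernel/random/entropy_avail`, which Linux exposes. The function names and the threshold of 200 bits are assumptions made for illustration, and the value may be less informative on newer kernels.

```python
from pathlib import Path

# Linux-specific file holding the kernel's current entropy estimate, in bits.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_AVAIL):
    """Return the entropy estimate in bits, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def entropy_is_low(bits, threshold=200):
    """Return True if a known estimate is below `threshold` (an arbitrary cutoff)."""
    return bits is not None and bits < threshold
```

Logging `read_entropy()` right before each kernel start would show whether the failing days coincide with a low estimate.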