Preheated kernels cause the server to crash if pool_size is too large #1178

Open steve-marmalade opened 2 years ago

steve-marmalade commented 2 years ago


When launching voila with preheated kernels enabled and pool_size=2 in a directory with 4 notebooks, the server crashes with an inscrutable error. The voila server runs successfully when pool_size=1.


Edit voila.json as follows:

{
   "VoilaConfiguration": {
      "preheat_kernel": true
   },
   "VoilaKernelManager": {
      "preheat_blacklist": [],
      "kernel_pools_config": {
         "default": {
            "pool_size": 2
         }
      },
      "fill_delay": 0
   }
}

Create a directory dash/ with 4 notebooks.

Run voila as follows:

voila --port=8080 --no-browser --Voila.ip= --show_tracebacks=True dash/

The server will crash after a few seconds.
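
For anyone trying to reproduce this, the setup steps above can be scripted. The following is only a sketch under a few assumptions: the notebook file names are taken from the log output in this report, but the notebook contents (a single print cell), the empty preheat_blacklist, the helper names write_repro and minimal_notebook, and writing voila.json next to dash/ are all illustrative choices, not details of the original setup.

```python
import json
from pathlib import Path

# Settings mirroring the voila.json fragment in this report.
# The empty preheat_blacklist is an assumption.
VOILA_CONFIG = {
    "VoilaConfiguration": {"preheat_kernel": True},
    "VoilaKernelManager": {
        "preheat_blacklist": [],
        "kernel_pools_config": {"default": {"pool_size": 2}},
        "fill_delay": 0,
    },
}


def minimal_notebook(label: str) -> dict:
    """Return a one-cell nbformat 4 notebook as a plain dict (hypothetical content)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": [f"print({label!r})"],
            }
        ],
        "metadata": {
            "kernelspec": {
                "display_name": "Python 3",
                "language": "python",
                "name": "python3",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_repro(root: Path, names=("abc", "def", "ghi", "jkl")) -> list:
    """Write voila.json and dash/<name>_dash.ipynb files under root."""
    dash = root / "dash"
    dash.mkdir(parents=True, exist_ok=True)
    # Assumption: voila.json sits next to dash/; move it to wherever your
    # voila configuration is actually read from.
    (root / "voila.json").write_text(json.dumps(VOILA_CONFIG, indent=3))
    paths = []
    for name in names:
        path = dash / f"{name}_dash.ipynb"
        path.write_text(json.dumps(minimal_notebook(name), indent=1))
        paths.append(path)
    return paths
```

Calling write_repro(Path(".")) creates the files in the current directory, after which the voila command below can be run as written.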

Troubleshoot Output



    3.9.11 (main, Apr 12 2022, 18:23:35) 
    [GCC 11.2.0]


which -a jupyter:

pip list:
    Package                       Version
    ----------------------------- ---------
    aiohttp                       3.8.1
    aiosignal                     1.2.0
    ansiwrap                      0.8.4
    anyio                         3.4.0
    argon2-cffi                   21.3.0
    argon2-cffi-bindings          21.2.0
    async-timeout                 4.0.2
    attrs                         21.2.0
    Babel                         2.9.1
    backcall                      0.2.0
    black                         22.1.0
    bleach                        4.1.0
    cachetools                    4.2.4
    certifi                       2021.10.8
    cffi                          1.15.0
    charset-normalizer            2.0.9
    click                         8.0.3
    cycler                        0.11.0
    debugpy                       1.5.1
    decorator                     5.1.0
    defusedxml                    0.7.1
    entrypoints                   0.3
    flake8                        4.0.1
    fonttools                     4.28.3
    frozenlist                    1.2.0
    google-api-core               2.3.0
    google-auth                   2.3.3
    google-auth-oauthlib          0.4.6
    google-cloud-bigquery         2.31.0
    google-cloud-bigquery-storage 2.10.1
    google-cloud-core             2.2.1
    google-cloud-storage          1.43.0
    google-crc32c                 1.3.0
    google-resumable-media        2.1.0
    googleapis-common-protos      1.54.0
    grpcio                        1.42.0
    grpcio-status                 1.42.0
    idna                          3.3
    ipykernel                     6.6.0
    ipython                       7.30.1
    ipython-genutils              0.2.0
    ipywidgets                    7.6.5
    jedi                          0.18.1
    Jinja2                        3.0.3
    joblib                        1.1.0
    json5                         0.9.6
    jsonschema                    4.2.1
    jupyter-client                7.1.0
    jupyter-core                  4.9.1
    jupyter-server                1.13.1
    jupyterlab                    3.2.5
    jupyterlab-pygments           0.1.2
    jupyterlab-server             2.9.0
    jupyterlab-widgets            1.0.2
    jupytext                      1.13.4
    kiwisolver                    1.3.2
    libcst                        0.3.23
    markdown-it-py                1.1.0
    MarkupSafe                    2.0.1
    matplotlib                    3.5.1
    matplotlib-inline             0.1.3
    mccabe                        0.6.1
    mdit-py-plugins               0.3.0
    mistune                       0.8.4
    multidict                     5.2.0
    mypy-extensions               0.4.3
    nbclassic                     0.3.4
    nbclient                      0.5.9
    nbconvert                     6.3.0
    nbformat                      5.1.3
    nest-asyncio                  1.5.4
    notebook                      6.4.6
    numpy                         1.21.4
    oauthlib                      3.1.1
    packaging                     21.3
    pandas                        1.3.5
    pandas-gbq                    0.15.0
    pandocfilters                 1.5.0
    papermill                     2.3.3
    parso                         0.8.3
    pathspec                      0.9.0
    pexpect                       4.8.0
    pickleshare                   0.7.5
    Pillow                        8.4.0
    pip                           22.0.3
    platformdirs                  2.4.0
    prometheus-client             0.12.0
    prompt-toolkit                3.0.24
    proto-plus                    1.19.8
    protobuf                      3.19.1
    ptyprocess                    0.7.0
    pyarrow                       5.0.0
    pyasn1                        0.4.8
    pyasn1-modules                0.2.8
    pycodestyle                   2.8.0
    pycparser                     2.21
    pydata-google-auth            1.3.0
    pyflakes                      2.4.0
    Pygments                      2.10.0
    pyparsing                     3.0.6
    pyrsistent                    0.18.0
    python-dateutil               2.8.2
    pytz                          2021.3
    PyYAML                        6.0
    pyzmq                         22.3.0
    requests                      2.26.0
    requests-oauthlib             1.3.0
    rsa                           4.8
    scikit-learn                  1.0.1
    scipy                         1.7.3
    seaborn                       0.11.2
    Send2Trash                    1.8.0
    setuptools                    60.6.0
    setuptools-scm                6.3.2
    six                           1.16.0
    sniffio                       1.2.0
    tenacity                      8.0.1
    terminado                     0.12.1
    testpath                      0.5.0
    textwrap3                     0.9.2
    threadpoolctl                 3.0.0
    toml                          0.10.2
    tomli                         1.2.2
    tornado                       6.1
    tqdm                          4.62.3
    traitlets                     5.1.1
    typing_extensions             4.0.1
    typing-inspect                0.7.1
    urllib3                       1.26.7
    my-voila-project              0.1.0
    voila                         0.3.0
    wcwidth                       0.2.5
    webencodings                  0.5.1
    websocket-client              1.2.3
    websockets                    10.1
    wheel                         0.37.1
    widgetsnbextension            3.5.2
    yarl                          1.7.2

Command Line Output
[Voila] Using /tmp to store connection files
[Voila] Storing connection files in /tmp/voila_7ryoha1v.
[Voila] Serving static files from /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/static.
[Voila] Voilà is running at:
[Voila] Kernel started: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel started: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel started: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel started: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel started: e958e8e4-5938-4ca9-b6c7-41373f660eec
[Voila] Kernel started: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel started: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel started: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel pool of abc_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of def_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of ghi_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of jkl_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel shutdown: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel shutdown: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel shutdown: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel shutdown: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel shutdown: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel shutdown: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel shutdown: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel shutdown: e958e8e4-5938-4ca9-b6c7-41373f660eec
Traceback (most recent call last):
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/voila", line 8, in <module>
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/traitlets/config/application.py", line 846, in launch_instance
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 548, in start
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 596, in listen
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 199, in start
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 601, in run_forever 
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once 
    handle = self._ready.popleft()
IndexError: pop from an empty deque

trungleduc commented 2 years ago

Hi, I cannot reproduce it on my machine. Can you track the memory usage when you start Voila? Could it be related to the specs of the machine?

steve-marmalade commented 2 years ago

Hi @trungleduc, thank you for the quick response!

I am happy to provide any debugging info that you think would be useful. I just did a rudimentary test by watching htop while I started the voila server, and I did not see any spike in memory usage that could explain the issue. My laptop also has 8 cores, so it doesn't seem like an n_kernels > n_cores issue either. For reference, the CLI crashes within about 5 seconds of starting, so it's not obviously a resource utilization issue.

Can you think of any other info I can provide that would be useful?

vidartf commented 2 years ago

Your error is IndexError: pop from an empty deque. It is likely related to this issue: https://github.com/jupyterlab/jupyterlab/issues/11934. The proper fix is likely somewhere else, but the underlying issue is that nest_asyncio has a race condition if it gets patched in while there are events queued for execution on the asyncio loop.
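
To make the failure mode concrete, here is a toy model of it. This is only a sketch and not the actual asyncio or nest_asyncio code: the function names are hypothetical, and a plain deque stands in for the loop's ready queue. It shows how a loop iteration that counts the queue once and then pops that many items breaks when one callback re-enters the loop and drains the queue, which matches the `handle = self._ready.popleft()` line in the traceback above.

```python
from collections import deque


def run_once_snapshot(ready: deque) -> None:
    """Count the queue once, then pop that many callbacks (no re-check)."""
    ntodo = len(ready)
    for _ in range(ntodo):
        callback = ready.popleft()  # IndexError if something drained the queue
        callback()


def run_once_guarded(ready: deque) -> None:
    """Same loop, but stop as soon as the queue turns out to be empty."""
    for _ in range(len(ready)):
        if not ready:
            break
        callback = ready.popleft()
        callback()


def make_queue(log: list, runner) -> deque:
    """Build a queue whose first callback re-enters the loop via runner."""
    ready = deque()

    def reentrant():
        log.append("reentrant")
        runner(ready)  # nested iteration consumes the remaining callbacks

    ready.append(reentrant)
    ready.append(lambda: log.append("a"))
    ready.append(lambda: log.append("b"))
    return ready


if __name__ == "__main__":
    log = []
    try:
        run_once_snapshot(make_queue(log, run_once_snapshot))
    except IndexError as exc:
        print("snapshot loop failed:", exc)

    log = []
    run_once_guarded(make_queue(log, run_once_guarded))
    print("guarded loop ran:", log)
```

In this model the outer iteration still expects two more callbacks after the nested call has already run them, so the next popleft() fails. Re-checking the queue before each pop avoids the crash in the toy version; whether that is the right fix for the real event loop is a separate question.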

steve-marmalade commented 2 years ago

Thank you so much for commenting @vidartf!

I was able to apply the patch you proposed here to the voila source code here, and I am no longer seeing this crash. I greatly appreciate the workaround.

Is there a downside to merging this fix into voila?

trungleduc commented 2 years ago

@steve-marmalade I want to try the fix but cannot reproduce the issue. Can you provide a minimal notebook to reproduce it?

vidartf commented 2 years ago

@trungleduc Since this is a race condition, it can be pretty tricky to reproduce. I haven't been able to see any direct correlation between notebook content and this behavior, but here are some things that I suspect make it easier to reproduce, based on my struggles in the lab issue: