Open ogrisel opened 3 months ago
I think I also triggered the same problem by chaining a manual restart (e.g. by using the 0-0 keyboard shortcut) followed by a cell execution with the same import statements (import sklearn or import six or any other built-in package of pyodide), for instance via the shift-enter keyboard shortcut.
It seems that this is caused by a race condition: if you wait long enough after a manual restart and the first cell execution, then there is no problem (I think).
Thanks @ogrisel for opening the issue :+1:
> It seems that this is caused by a race condition: if you wait long enough after a manual restart and the first cell execution, then there is no problem (I think).
Right, it looks like a race condition that would need to be fixed at some point.
I'm seeing a similar issue, and not finding it random -- it's reproducible very consistently across browsers and deployments, including, as @ogrisel noted, your demo notebook.
What I'm noticing is not a kernel crash though. It seems like the kernel successfully runs, but without correctly talking to the UI -- the notebook runs successfully but (mostly) silently. So, for instance, if you have a block of code like:
    import time
    i = 1
    while True:
        with open(f"{i}.txt", "w") as stream:
            stream.write('hi')
        time.sleep(1)
        i += 1
then 'restart and run all' will cause the kernel to write files forever without opportunity for interaction from the user.
Interestingly, if the user provides input while the kernel is initializing, jupyterlite will sometimes start a new kernel, so the user will have a functioning notebook environment running on a different kernel, but the kernel started by the "restart and run all" operation will continue silently writing files.
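For anyone trying to reproduce this without filling up storage, here is a minimal sketch of a bounded variant of the loop above. It is not taken from any of the notebooks in this thread: the `RUN_ID` tag, the `write_markers` helper, and the `<run_id>-<i>.txt` file naming are all hypothetical, chosen so that files written by an old, detached kernel can be told apart from files written by the kernel the UI is actually connected to.

```python
import time
import uuid
from pathlib import Path

# Hypothetical tag: regenerated every time this cell runs in a fresh kernel,
# so each kernel instance leaves files with a distinct prefix.
RUN_ID = uuid.uuid4().hex[:8]


def write_markers(directory, run_id=RUN_ID, count=5, delay=1.0):
    """Write `count` files named '<run_id>-<i>.txt', sleeping `delay` seconds after each."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(1, count + 1):
        path = directory / f"{run_id}-{i}.txt"
        path.write_text("hi")
        written.append(path.name)
        time.sleep(delay)
    return written
```

Calling something like `write_markers(".", count=60)` in a cell and then triggering "restart and run all" should, if the behavior described above occurs, leave files with more than one prefix that keep growing at the same time.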
A similar behavior, though not as severe, and not as consistent, happens with "restart kernel" if executed while the kernel is performing a blocking operation -- the kernel will sometimes crash and restart again if it receives input from the user while it's initializing.
Similar but not identical behaviors occur with the xeus kernel -- it never runs silently after "restart and run all", but rather consistently crashes on its first restart attempt, then successfully restarts a second time -- unless it receives input from the user while initializing, in which case it will crash and attempt another restart, and so on, and so on.
Is anything more known about a cause, and are there any known workarounds? Also, I have no expertise with your codebase, but I would be happy to help investigate or test if there's anything in particular you'd like to point me towards.
It appears that I can sometimes trigger this problem even with a regular "Restart" when trying to interrupt a long-running execution.
Then I am in a bad state where nothing works anymore: after creating a new empty notebook and inserting a cell with a print statement at the beginning of the notebook, executing that cell, either directly or after a restart, never completes and never outputs anything.
The past (restarted) kernels still show up with "No session connected" in the side panel. I can shut them all down, but that does not fix the problem.
I tried to delete all the cache / local storage / cookies for that page in the Firefox dev tools and reload the page, but executing the cell with the print statement is still stuck.
The only way to recover is to close the browser tab and reopen a new one.
I confirm I can semi-randomly reproduce the behaviors described in https://github.com/jupyterlite/jupyterlite/issues/1464#issuecomment-2378194811 using this notebook uploaded to https://jupyter.org/try-jupyter/lab/:
https://gist.github.com/ogrisel/50f2a29b14b9ebea503bab8a42ddbb9a
Apparently, it's important to use a heavy enough import statement (such as pandas) in the second cell to reproduce the problem by clicking "restart and run all" a few times.
The log file that the notebook writes to the browser's local storage shows that kernels often continue executing even after they become detached from any session.
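As a rough illustration of how such a log can be checked, here is a small sketch that finds kernel instances whose activity overlaps in time. The log format is an assumption, not the format used by the gist: each line is taken to be `<unix_timestamp> <run_id>`, where `run_id` is some identifier the notebook generates once per kernel start. Both helper names are hypothetical.

```python
def summarize_heartbeats(lines):
    """Map each run ID to its (first, last) timestamp, given 'timestamp run_id' lines."""
    spans = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, run_id = line.split(maxsplit=1)
        t = float(stamp)
        first, last = spans.get(run_id, (t, t))
        spans[run_id] = (min(first, t), max(last, t))
    return spans


def overlapping_runs(spans):
    """Return pairs of run IDs whose active intervals overlap in time."""
    ids = sorted(spans)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Two closed intervals overlap when each starts before the other ends.
            if spans[a][0] <= spans[b][1] and spans[b][0] <= spans[a][1]:
                pairs.append((a, b))
    return pairs
```

Any pair returned by `overlapping_runs` would indicate two kernel instances that were writing during the same period, which is what one would expect to see if a restarted kernel was never actually stopped.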
I reproduce similar problems both with Chrome and Firefox.
Note that I could even randomly crash the full Chrome tab using a more complex variant of this notebook that would further import sklearn and read a CSV file of a few MB using pandas from the local storage, but I chose to just link to the simpler variant of the notebook to reproduce the first race condition.
EDIT: I found more minimal reproducers for Firefox and Chrome in:
I tried to see if I could reproduce the "No sessions connected" state by clicking the "restart and run all cells" or "restart" buttons in a regular JupyterLab setup, and I can never trigger this.
So this is really a problem with pyodide kernels started in web workers by JupyterLite. JupyterLite needs to make sure that those workers are properly shut down; otherwise this rapidly triggers the memory usage problem and crashes described above in practice.
I also tried with the xeus-python kernel using https://jupyterlite.github.io/xeus-python-demo/lab/index.html, and I can reproduce the same problem: I can also leak many "No sessions connected" kernels by repeatedly clicking the "restart and run all cells" button and make Chrome crash as a result.
So this is not a pyodide-specific problem, but rather a generic bug in JupyterLite itself.
Description
Pressing "Restart and run all" randomly causes silent pyodide kernel crashes as seen on the following screen recording:
jupyterlite_pyodide_crash.webm
Reproduce
six
Expected behavior
Context
Browser Output
EDIT: making the restart button work is important, especially since it's not possible to interrupt a long-running cell in JupyterLite for now (see #459).