ipython / ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
https://ipython.readthedocs.org
BSD 3-Clause "New" or "Revised" License
16.3k stars 4.45k forks

Kernel/Interrupt Kernel does not terminate stuck subprocesses in the notebook #3400

Closed pfmoore closed 4 years ago

pfmoore commented 11 years ago

When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel.

This occurred for me on Windows - I do not know if it also happens on Unix.

To demonstrate, start a notebook and enter !python in a cell. The process will lock as it is waiting for interactive input. As there is no way to provide that input, the kernel must be restarted to continue.
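A hedged workaround sketch (not part of the original report): instead of `!python`, launching the child via the stdlib `subprocess` module lets you enforce a timeout, so a stuck process is killed from Python rather than by restarting the kernel. The command below is purely illustrative.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd; on timeout the child is killed and False is returned."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it does not finish within timeout_s seconds.
        subprocess.run(cmd, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False

# A child that would hang for 60s is killed after ~1s:
finished = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(60)"], timeout_s=1
)
```

This does not make `!`-style commands interruptible, but it avoids ever handing the kernel an unbounded wait.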

minrk commented 11 years ago

duplicate of #514

pfmoore commented 11 years ago

Thanks, I hadn't spotted the duplicate. Having said that, #514 is discussing a much more complex scenario, involving actually interacting with subprocesses (and it seems to be Unix-based, as it's about pty-style interaction). For my requirements, a simple means of killing a rogue subprocess would do. Consider something as simple as !sleep 50000, where just being able to kill the sleep is all you want. (Maybe Ctrl-C works for this on Unix, but it doesn't on Windows.)

minrk commented 11 years ago

Sorry, I see what you mean now. Reopening as a separate issue - interrupt not interrupting subprocesses on Windows.

arijun commented 10 years ago

I'm not sure this is limited to subprocesses. Try executing input() or raw_input() and then clicking the interrupt button--the kernel hangs and has to be restarted.

minrk commented 10 years ago

@arijun On what OS? Interrupting input and raw_input raises KeyboardInterrupt here (OS X).

arijun commented 10 years ago

Sorry, Windows. That's why I thought it was likely the same issue @pfmoore had, since that also happened on Windows.

minrk commented 10 years ago

Ah, crap. I know what that bug is. I think it's a libzmq (or pyzmq) bug that prevents it from handling interrupts properly while polling on zmq sockets. It's nothing in IPython. sigh
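For readers curious about the pattern involved (my own illustrative sketch, not IPython's or pyzmq's actual code): the usual mitigation for blocking polls that swallow interrupts is to poll with a short finite timeout in a loop, so Python-level signal handling gets a chance to run between polls. Shown here with the stdlib `selectors` module rather than zmq sockets:

```python
import selectors
import socket

def wait_readable(sock, poll_ms=100):
    # Poll in short slices instead of blocking indefinitely. A
    # KeyboardInterrupt raised by the signal handler can propagate
    # between select() calls instead of being lost in one long block.
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        while True:
            if sel.select(timeout=poll_ms / 1000.0):
                return True
    finally:
        sel.close()

# Demo: one end of a socket pair becomes readable immediately.
a, b = socket.socketpair()
a.sendall(b"x")
readable = wait_readable(b)
a.close()
b.close()
```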

wmayner commented 8 years ago

I think I just got bitten by this and I'll need to restart the kernel, meaning I've just lost a lot of data…

I was using pdb to debug a function. I re-ran the cell without first quitting pdb, and now I can't interrupt anything.

Here's a minimal example that reproduces this:

def test():
    import pdb; pdb.set_trace()  # XXX BREAKPOINT
    return 0

test()

Run this cell twice in a row.

lancekrogers commented 8 years ago

The same issue happens for me on Unix as well, word for word:

"When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel."

nealmcb commented 7 years ago

Thanks for the nice example of a pdb hang, wmayner. But since pdb doesn't run in a subprocess, I opened a separate issue for pdb: #10516

JulesGM commented 6 years ago

Printing too much data, let's say accidentally printing a gigantic numpy array, can make the kernel completely unresponsive and impossible to terminate.

rajulah commented 6 years ago

Has a solution been found for this issue yet? I just ran a machine learning model that took 14 hours to complete and now my kernel is stuck and doesn't execute cells. If I restart, I have to run the model again for 14 hours. So is there any solution?

JulesGM commented 6 years ago

haven't tried it, but this seems like it could help: http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/limit_output/readme.html

takluyver commented 6 years ago

If a specific subprocess has got stuck, you can probably find it in the task manager and forcibly kill it that way. Hopefully that lets the kernel continue.
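The same idea works from code if you launched the process yourself (an illustrative sketch, not an IPython mechanism): starting the child with `Popen` keeps a handle you can kill later, instead of hunting the process down in Task Manager.

```python
import subprocess
import sys

# Launching via Popen (rather than `!cmd`) keeps a handle, so a hung
# child can be killed from another cell instead of via Task Manager.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(600)"]
)

# ... later, when it appears stuck:
proc.kill()               # forcibly terminate the child
returncode = proc.wait()  # reap it; nonzero indicates it was killed
```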

JulesGM commented 6 years ago

no, the issue is that the kernel spams the webserver to death or something. killing the webserver kills the kernel afaik

patricktokeeffe commented 6 years ago

I'm dealing with a stuck notebook too: interrupt, restart, reconnect - none of them do anything. The [*] indicators remain next to cells as if they are queued to run but no cells get executed.

The behavior began after running a cell containing:

filedir = "20161214_rooftest"

!ls -RC $filedir

Which is strange because I have analogous cells elsewhere that run successfully. I'm not sure how/if ls could get stuck but otherwise my situation seems to match this issue.

ashishanand7 commented 6 years ago

Is there any solution to this? The kernel cannot be interrupted. For me it's happening with GridSearchCV in sklearn.

ahmedrao commented 6 years ago

There was a process named conda.exe in Task manager. I killed that process and I was successfully able to interrupt the kernel

IMBurbank commented 5 years ago

Interrupt is still broken. I have to restart and reload my imports every time.

metya commented 5 years ago

same problem in jupyter lab on python 3.7 kernel

CathyQian commented 5 years ago

same problem in Jupyter Notebook and I can't find the process named conda.exe in Task manager. Any updates on the solution yet?

esha-sg commented 5 years ago

Not a solution, but sometimes trying to reconnect to the kernel helps in this case.

ambareeshsrja16 commented 5 years ago

Observing the same, in Windows 10

arianccbasile commented 5 years ago

Did anyone succeed with that? It's driving me crazy.

completelyboofyblitzed commented 5 years ago

There was a process named conda.exe in Task manager. I killed that process and I was successfully able to interrupt the kernel

@ahmedrao How????

rudaoshi commented 5 years ago

This problem has existed for six years and still no solution.

HamdiTarek commented 5 years ago

This problem has existed for six years and still no solution.

six years without any solution, just restart the kernel

Stuj79 commented 5 years ago

Having the same problem increasingly frequently, almost to the point where the notebooks are becoming unusable, which is a real shame. On Anaconda 3.7, the cells just hang with the asterisk, and I am unable to interrupt the kernel.

vinklibrary commented 5 years ago

Mark Same Issue

ChrisPalmerNZ commented 5 years ago

Have always had this problem, especially with pdb and input. Windows 10; Notebook server 5.7.8; Python 3.6.6; Conda 4.7.5. Have learned that I basically cannot reliably debug notebooks :(

SangamSwadiK commented 4 years ago

Yep, the problem still exists. Is there any way to overcome this? I don't want to run my notebook all over again, because it takes too long to get to where I am!

louismartin commented 4 years ago

Up! This problem has been a pain for me for years now every time I use pdb and forget to quit before I re-run the cell.

louismartin commented 4 years ago

I created a bounty on BountySource. Maybe this will finally be fixed if we can gather enough money. https://www.bountysource.com/issues/44958889-hang-after-running-pdb-in-a-cell-kernel-interrupt-doesn-t-help

itamarst commented 4 years ago

For the process issue specifically, on Windows specifically, here's a theory (still untested):

  1. Process is run via IPython.utils._process_win32.system, which calls _system_body, which calls p.wait() on the subprocess.Popen object.
  2. Windows subprocess.Popen.wait() has a known issue where it is not interruptible: https://bugs.python.org/issue28168

If that's the cause, switching to busy looping every 100ms or so would probably make it interruptible, or if not then taking the approach in the patch.
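A minimal sketch of the busy-loop idea (my illustration of the suggestion above, assuming the blocking `p.wait()` can be replaced): poll the child every 100ms so control returns to Python between checks, where a pending `KeyboardInterrupt` can be raised.

```python
import subprocess
import sys
import time

def interruptible_wait(p, poll_s=0.1):
    # Unlike a blocking p.wait() on Windows, this loop returns to
    # Python every poll_s seconds, so a pending KeyboardInterrupt
    # can be delivered between iterations.
    while p.poll() is None:
        time.sleep(poll_s)
    return p.returncode

p = subprocess.Popen([sys.executable, "-c", "print('done')"])
rc = interruptible_wait(p)
```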

nealmcb commented 4 years ago

Thank you @Carreau!

ChrisPalmerNZ commented 4 years ago

Thanks @Carreau! When will this find its way into a general release, and does it mean that we will then be able to use the Interrupt Kernel button successfully?

Carreau commented 4 years ago

I'll likely do a 7.13 tomorrow. It might fix the interrupt button.

Arpit-Gole commented 4 years ago

Hey @Carreau, I am facing this issue when trying to interrupt an ongoing cell execution: the interrupt goes on forever and in the end I have to restart.

To demonstrate, I followed the steps @wmayner suggested to replicate the issue; screenshots attached.

Jupyter versions on my machine (screenshot attached).

itamarst commented 4 years ago

@Arpit-Gole pdb is its own specific issue; I'm hoping to get that fixed soon too: https://github.com/ipython/ipython/issues/10516

Arpit-Gole commented 4 years ago

@itamarst I am training a model as follows :

forest_clf = RandomForestClassifier()
cross_val_score(forest_clf, X_train, y_train, cv=3, scoring='accuracy', verbose=10, n_jobs=-1)

Now I know it is bound to take time based on my dataset. But say, for whatever reason, I choose to stop the processing halfway by pressing Kernel > Interrupt Kernel. Ideally it should interrupt, but it takes forever to stop. Now I don't want to restart because all my progress will be gone.

Please Help!

Carreau commented 4 years ago

If what you are trying to interrupt is implemented in C, then there is nothing we can do. It's up to the library you use to handle SIGINT.
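To illustrate the point (a sketch of cooperative cancellation under my own assumptions, not something IPython does for you): long-running work written in Python can check a flag set by a SIGINT handler between chunks, whereas work that spends its whole time inside one C call never gives the interpreter a chance to act on the signal.

```python
import signal

stop_requested = False

def on_sigint(signum, frame):
    # Record the interrupt instead of raising, so the loop below
    # can stop at a safe point.
    global stop_requested
    stop_requested = True

def run_in_chunks(n_steps, chunk=100):
    # Splitting work into chunks returns control to the interpreter,
    # which is where Python-level signal handlers actually run.
    done = 0
    while done < n_steps and not stop_requested:
        done += min(chunk, n_steps - done)  # stand-in for real work
    return done

try:
    # Signal handlers can only be installed from the main thread.
    signal.signal(signal.SIGINT, on_sigint)
except ValueError:
    pass

steps_done = run_in_chunks(1000)
```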

jvschoen commented 4 years ago

I run into this sometimes too... Here is a reproducible example from JupyterLab:

LOAD DATA

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv'
r = requests.get(url, allow_redirects=True)
with open('data/nyc_taxi.csv', 'wb') as f:
    f.write(r.content)

df_taxi = (
    pd.read_csv('data/nyc_taxi.csv')
    .assign(timestamp=lambda x: pd.to_datetime(x.timestamp))
)

df_train = df_taxi.iloc[:5000]
temp_train = df_train.set_index('timestamp')

Run Grid Search: THIS CANNOT BE INTERRUPTED

import itertools
import statsmodels.api as sm

# set parameter ranges
p = range(0, 3)
q = range(1, 3)
d = range(1, 2)
s = [24, 48]

# list of all parameter combos
pdq = list(itertools.product(p, d, q))
seasonal_pdq = list(itertools.product(p, d, q, s))

# SARIMA model pipeline
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(temp_train[:240],
                                            order=param,
                                            seasonal_order=param_seasonal)

            results = mod.fit(max_iter=50, method='powell')

            print('SARIMA{},{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception as e:
            print(e)
            continue

Is there any advice?

dawnset commented 4 years ago

Ran into this problem three times this afternoon. It reminds me of the old days when I was still using urllib; I thought it was urllib's fault, since there was no response to my request. I was supposed to be working, not just debugging, so I needed a workaround rather than an answer: I now store every variable to a local file. I really don't want to see this happen again and again.
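For the save-progress-to-a-file idea mentioned above, here is a minimal sketch using the stdlib `pickle` module (the checkpoint contents and file name are my own illustration), so a kernel restart does not cost hours of recomputation:

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    # Persist intermediate results so a kernel restart doesn't lose them.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Demo with a throwaway file:
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint({"epoch": 3, "loss": 0.42}, path)
restored = load_checkpoint(path)
```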

Crispy13 commented 4 years ago

I am facing the same issue when using tensorflow and gpu for training deep learning model.

matija2209 commented 3 years ago

Run into this with time.sleep and requests

mbrad092 commented 3 years ago

Also having this issue with time.sleep and requests on Windows, but it runs fine on Mac OS X.

not-Ian commented 3 years ago

Having this issue with ThreadPoolExecutor... Something like this:

import concurrent.futures

numberOfImageGatherers = 2

with concurrent.futures.ThreadPoolExecutor(max_workers=numberOfImageGatherers + 1) as executor:
    futures = []

    for imageGatherer in range(numberOfImageGatherers):
        imageDataGatherer = ImageDataGatherer(batch_size)
        futures.append(executor.submit(imageDataGatherer.gatherImageData, pipeline))

    modelTrainingConsumer = ModelTrainingConsumer(vae, plot_losses)

    futures.append(executor.submit(modelTrainingConsumer.trainModel, pipeline))

    concurrent.futures.wait(futures)

Only way to interrupt is to restart kernel... very frustrating

TV4Fun commented 3 years ago

This is still happening. I would suggest re-opening this issue. Seeing it in a NumPy-heavy tight neural network training loop on Windows 10.

jupyter core     : 4.7.1
jupyter-notebook : 6.2.0
qtconsole        : 4.7.7
ipython          : 7.20.0
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.6
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.1.2
traitlets        : 5.0.5

Is there anything I need to upgrade?

Esesna commented 3 years ago

I have the same problem, and most interestingly it occurs in random sections of code when using the B0 remote API for CoppeliaSim, specifically when I use Publisher and Subscriber.

hamitaksln commented 2 years ago

I found this workaround to stop a cell when working with time.sleep or requests: catch the keyboard interrupt.

import time

for i in range(20):
    try:
        print(i)
        time.sleep(1)
    except KeyboardInterrupt:
        print("Stopping...")
        break