jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

jupyter notebook slows down after multiple executions and needs a kernel restart #3390

Open cod3licious opened 6 years ago

cod3licious commented 6 years ago

Unfortunately, this problem is not easy to reproduce reliably, and I'm not sure whether it's a Jupyter issue, an IPython kernel issue, or maybe even caused by the other libraries I'm using... But since it can be "fixed" by a kernel restart, I'm assuming it comes from Jupyter or the IPython kernel.

I'm training neural networks using the Keras library on an Nvidia GPU in a local desktop computer running Peppermint 8 Linux (a derivative of Ubuntu 17.04). I have Jupyter Notebook version 5.0.0 for Python 2.7, installed via Anaconda.

My problem is that at the beginning everything in the notebook runs really fast and the neural networks train in less than 100 ms per epoch. After I execute some cells multiple times, e.g. to test different parameters, training starts to slow down, taking at least twice as long, and it gets worse the more often I run the cells. If I then restart the kernel and execute all the cells again, everything is fast again.

I'm not really sure how this can happen. I assume it has something to do with garbage collection (or whatever is performed on a restart), but as I said, I'm mostly re-executing the same cells, not adding new ones, so the existing variables should simply be overwritten rather than new objects piling up.
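
(Not part of the original report, just a sketch of how one could check this: printing the kernel's resident memory after each run would show whether ordinary Python memory is actually growing between executions. This assumes the psutil package is installed.)

```python
# Sketch: print the notebook kernel's resident memory, e.g. after each
# training run, to see whether host RAM really grows between executions.
import os
import psutil  # assumed available: pip install psutil

def print_kernel_memory():
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024.0 ** 2
    print("kernel resident memory: %.1f MB" % rss_mb)
```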

takluyver commented 6 years ago

Maybe it's the GPU's memory getting full, rather than Python variables (which would all be in normal memory), and restarting the process causes it to be cleared up? I don't know much about how GPU memory is managed.
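
(Editorial aside, not from the thread: with the TF 1.x / Keras 2.x versions mentioned here, TensorFlow by default grabs most of the GPU memory up front. A hedged sketch of constraining that via the session config follows; this is a TensorFlow setting, not something Jupyter controls.)

```python
# Sketch for a TF 1.x / Keras 2.x setup: allocate GPU memory on demand
# instead of claiming (nearly) all of it at startup.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap the fraction
K.set_session(tf.Session(config=config))
```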

gabefair commented 6 years ago

I've been having the same issue for the past month on Windows 10 Pro x64, on a 16 GB laptop with a dedicated GPU (GTX 960M), using the latest version of Jupyter Notebook. I'm not having the issue with JupyterLab. I would provide more info, but the issue isn't consistent and I haven't pinned down a controlled test yet. I mention the dedicated GPU because I notice the problem is worse when I have multiple monitors plugged in. So maybe it's the GPU VRAM getting full?
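
(A possible way to test that hypothesis, not mentioned in the thread: poll nvidia-smi from a notebook cell and watch whether used GPU memory climbs as cells are re-executed.)

```python
# Sketch: query the GPU's used/total memory from inside the notebook.
import subprocess

print(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"]
).decode())
```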

cod3licious commented 6 years ago

Yeah, it's probably the memory. If I train a neural network with a lot of different settings, it first gets slower and at some point the program even crashes with a ResourceExhaustedError. This is really annoying, especially when you let something run overnight, and I think it really shouldn't happen, since I'm not using any extra memory but just overwriting variables.

I don't know how garbage collection works for GPU memory, but apparently something that happens when the kernel is restarted fixes the issue, so maybe it would help if that same cleanup were also run more often while the notebook is still running normally?

@gabefair I haven't tested it yet, but I just read that keras.backend.clear_session() might help clear the GPU memory.
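
(A minimal sketch of that suggestion, with an illustrative model that is not from the original post: call clear_session() before building each new model so the old TensorFlow graph and session are dropped instead of accumulating.)

```python
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

for units in (32, 64, 128):        # e.g. trying different parameters
    K.clear_session()              # drop the previous graph/session first
    model = Sequential([Dense(units, activation="relu", input_shape=(10,)),
                        Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(x_train, y_train, epochs=10)  # training data omitted here
```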

takluyver commented 6 years ago

I think you'll have to talk to the TensorFlow folks about that. We (Jupyter) don't do anything specific to the GPU when you restart the kernel.

anch2150 commented 6 years ago

I had similar issues. I tried keras.backend.get_session().close(), but it didn't help.
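
(A fuller cleanup pattern sometimes used with the TF 1.x backend, offered here only as a sketch: close the old session, clear Keras' global state, then attach a fresh session.)

```python
import tensorflow as tf
from keras import backend as K

K.get_session().close()        # release the old session's resources
K.clear_session()              # reset Keras' global graph/state
K.set_session(tf.Session())    # give Keras a fresh session to work with
```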

goldflower commented 5 years ago

Similar issue here. The speed of training and saving models becomes extremely slow after several experiments. I'm using the TensorFlow CPU version as the backend, so this issue may not come from GPU usage. After I restart Jupyter itself (restarting the kernel from the notebook didn't work), everything works normally again, but the same issue eventually comes back. In my opinion this issue is caused by Jupyter, but I don't know how to solve it.

Environment: Windows 10, Python 3.6, Keras 2.2.2, TensorFlow 1.10, Jupyter 4.4
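
(Editorial note, not from the thread: with these versions the slowdown can happen even without a GPU, because every re-executed cell that builds a model keeps adding nodes to the same TF 1.x default graph. A small sketch that makes the growth visible and shows clear_session() shrinking it again:)

```python
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

for _ in range(3):                           # simulate re-running a model-building cell
    Sequential([Dense(8, input_shape=(4,))])
    print("graph ops:", len(K.get_session().graph.get_operations()))

K.clear_session()                            # start from an empty graph again
print("after clear_session:", len(K.get_session().graph.get_operations()))
```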

ashishgupta1350 commented 5 years ago

At least the mouse works. My PC won't even let me use the mouse once it hangs.

Zhen-Lu-Thras commented 4 years ago

The keras.backend.clear_session() function suggested above helps!

lizard24 commented 3 years ago

keras.backend.clear_session() worked for me too! I was computing SSIM indices in TensorFlow over a NumPy array in a for loop, and I could see in the Task Manager that memory kept adding up, so much that it slowed down within a few loop iterations. Deleting arrays and running the garbage collector did not help, but clearing the session solved the problem. Thanks!
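
(A hedged reconstruction of that case, with made-up shapes and data: in TF 1.x graph mode, calling tf.image.ssim inside the loop adds new ops to the graph on every iteration, which is what makes memory grow. Besides clearing the session, another fix is to build the op once and feed each image pair through it.)

```python
import numpy as np
import tensorflow as tf

# Build the SSIM op once, outside the loop, so the graph stops growing.
a_ph = tf.placeholder(tf.float32, shape=(1, 64, 64, 1))
b_ph = tf.placeholder(tf.float32, shape=(1, 64, 64, 1))
ssim_op = tf.image.ssim(a_ph, b_ph, max_val=1.0)

images_a = np.random.rand(100, 64, 64, 1).astype("float32")  # placeholder data
images_b = np.random.rand(100, 64, 64, 1).astype("float32")

with tf.Session() as sess:
    scores = [sess.run(ssim_op, {a_ph: a[None], b_ph: b[None]})[0]
              for a, b in zip(images_a, images_b)]
```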