jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Trying to understand latency/slowdown of cell execution in subsequent cells. #3224

Open jmhsi opened 6 years ago

jmhsi commented 6 years ago

I access my Jupyter notebook remotely and have consistently noticed that my cells seem to take longer and longer to execute as more cells are run. I'm trying to understand what is going on and whether anything can be done to alleviate it.

As an example:

screen shot 2018-01-17 at 2 51 11 am

In the first screenshot, I run the three cells after a kernel restart and it's nice and snappy.

screen shot 2018-01-17 at 2 50 39 am

In the second screenshot, I've loaded 6 small pandas dataframes (60k rows, ~150 columns each), trained a small random forest regressor (10 estimators) in scikit-learn, and then run the same three cells. Not only has execution time increased, but the printed times are an additional 3 or 4 seconds apart.

As my later cells get more and more complex, this slowdown seems to increase substantially. Any idea what might be going on?

OrangeBot commented 6 years ago

This is a quite straightforward hypothesis, but still: maybe the remote server becomes overloaded and cannot handle the load? Some garbage collector issues? Did you check resource usage on the remote server (you can try to check the notebook's memory consumption as well)? Do you use heavy nbextensions? I've experienced memory problems in some such cases. Another potential issue is that the connection somehow becomes overloaded, but I have little knowledge of that area.
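
A minimal sketch of checking the kernel's memory consumption from inside a cell (this assumes the third-party psutil package is installed on the remote server):

import os
import psutil

# Report the resident memory of the current kernel process
proc = psutil.Process(os.getpid())
print(f"kernel RSS: {proc.memory_info().rss / 1024**2:.1f} MiB")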

takluyver commented 6 years ago

Is the kernel busy before you run those cells when they're slow? There should be a little circle in the top right of the interface. It's filled when the kernel is busy, and open when it's idle:

screenshot from 2018-01-17 17-48-14

jmhsi commented 6 years ago

@OrangeBot I'll try and check those out. @takluyver In both cases the kernel is not busy before I run those 3 cells. In the first case it's a fresh kernel, and in the second case I wait for all previous cells to finish and then run the 3 datetime cells.

takluyver commented 6 years ago

#3234 just made me think of garbage collection. Is it possible that Python's garbage collector is being triggered and having to do a lot of work to clear up objects? You can temporarily turn the garbage collector off by running gc.disable().
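
A minimal sketch of that experiment (illustrative only; remember to restore collection afterwards):

import gc

gc.disable()   # turn off automatic garbage collection for the test

# ... re-run the cells that were slow and compare timings ...

gc.collect()   # reclaim unreachable objects manually if memory grows
gc.enable()    # restore automatic collection when done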

jmhsi commented 6 years ago

@takluyver what @SergYevtukhoff mentioned in #3234 sounds just like my problem. The gif also looks similar to problems I experience. I do often have to interrupt and then it will work pretty quickly on the second run of the cell. I'll try gc.disable().

jmhsi commented 6 years ago

Wanted to mention that I'm still running into this problem. Also, #3303 seems to be having the same issue. When I ran gc.disable(), I ended up running into memory errors. I'd like to help get to the bottom of this and would gladly provide anything that might be useful.

takluyver commented 6 years ago

Yeah, if you run gc.disable(), it won't be able to clean up unused memory, so you eventually need to run garbage collection yourself (gc.collect()) or turn it back on (gc.enable()).

The question is whether, in the meantime, turning it off eliminates the random slowdowns. If it does, that validates the guess that the slowdowns are due to garbage collection. Or if you run into memory errors before you can see if there's a difference, that also suggests there's a problem with something allocating a lot of memory. Then the problem becomes working out what's doing that.

If you still see the same kind of slowdown with garbage collection disabled, then we'll need to look elsewhere. Though I don't really know where at the moment...

jmhsi commented 6 years ago

General info: I'm on a MacBook Pro connecting to my Ubuntu 16.04 server (16 GB RAM, 16 GB swap).

So I've been trying various experiments in the regular Python shell, the IPython shell, and a Jupyter notebook, all while monitoring RAM and swap with htop. Ultimately, I noticed that I see the problem I initially described (long run times on cells and/or commands that previously ran fast) once I max out my RAM and start using swap. Also, after RAM has been maxed once (even if RAM is freed up later in the notebook), cells/commands can continue to run slowly. What confused me was how my RAM could remain maxed after I deleted objects (del object) once I was done with them, to the point that the only variable I see of any remarkable size is my dataframe of interest at 2.7 GB, based on df.info(memory_usage='deep').

I was using this function (found on Stack Overflow, slightly altered) to try to see what could possibly be using up so much memory.

import sys

def check_mem():
    # These are the usual IPython objects, including this one you are creating
    ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

    # Print a sorted list of (name, size) pairs for the globals, largest first
    print(sorted(
        [(x, sys.getsizeof(globals().get(x))) for x in globals()
         if not x.startswith('_') and x not in sys.modules and x not in ipython_vars],
        key=lambda x: x[1], reverse=True))

Output of this was

[('examine', 2907181481), ('CategoricalDtype', 1056), ('NamespaceMagics', 888), ('month_dict', 648), ('files', 192), ('var_dic_list', 136), ('tqdm_notebook', 136), ('check_mem', 136), ('date_cols', 96), ('data_path', 92), ('pd', 80), ('np', 80), ('plt', 80), ('dc', 80), ('getsizeof', 72), ('filename', 71)]

'examine' is my dataframe of interest. I was thinking that perhaps previous objects I had created were still being referenced by something so I decided to do some checks with large numpy arrays in the IPython shell and python shell.

In the IPython shell I run out of memory even when I'm not assigning the numpy arrays to anything:

screen shot 2018-02-15 at 11 18 53 pm

Whereas in the python shell I can run the same command over and over without my memory and swap increasing:

screen shot 2018-02-15 at 6 09 56 pm

So the easy and immediate "fix" seems to be to get more RAM. I'm still confused about the IPython memory usage, though. I thought I was freeing space with del object and the like. I was also looking at Stack Overflow posts that recommended things like %reset out, but that also didn't seem to free up memory:

screen shot 2018-02-15 at 6 18 29 pm

I also decided to do another test: loading a pandas dataframe multiple times in a Jupyter notebook, the IPython shell, and the Python shell. Interestingly, all three experience slowdowns with repeated execution of the blocks of code (the Jupyter notebook slowing down the most) as RAM fills and swap starts being used:

In the notebook, note that code cell 2 was queued at 34:01 and finished executing at 36:04, yet the printed elapsed time was only 7-some seconds?

screen shot 2018-02-15 at 11 35 53 pm screen shot 2018-02-15 at 11 36 14 pm

IPython goes from 2-some seconds of elapsed time to over 13 seconds as repeated code execution fills memory and starts using swap:

screen shot 2018-02-15 at 11 37 02 pm screen shot 2018-02-15 at 11 38 03 pm

Plain Python behaves similarly to IPython, starting fast at 2-some seconds but ending at over 17 seconds as memory fills and swap starts being used:

screen shot 2018-02-15 at 11 50 06 pm screen shot 2018-02-15 at 11 50 22 pm screen shot 2018-02-15 at 11 51 05 pm

zertrin commented 6 years ago

I have been experiencing similar issues to the OP's for many months already.

First off, disabling GC has no impact, so that seems to be the wrong direction to investigate.

I also load some "big" dataframes (200k rows, 30 columns with various datatypes) and notice a slowdown of the kernel after the cell content executes.

To be clear, the code inside the cell executes fine and fast (a few ms), but the kernel seems to flicker and stay in a busy state afterwards, so that the next cell gets executed a few seconds later (typically around 3 seconds, but the delay seems to increase with the length of the notebook). It doesn't matter what the content of the cell is. It can be a trivial statement like a = 1, and the kernel still stays busy, flickering, for some seconds afterwards.

Let me also clarify what I mean by "flickering": after the code inside the cell finishes executing, the kernel busy indicator very briefly goes to idle (for maybe 1/20th of a second), then goes back to busy for some time (more than 100 ms but usually less than one second), then flickers back to idle for an extremely brief time, then goes busy and flickers idle again, and so on for some time...

See for yourself:

jupyter_busy_flickering

I would like to know what the kernel is doing after executing the content of the cell. Does anyone know how to get some insight into that (some kind of debug mode which reports which events trigger the busyness of the kernel)?

zertrin commented 6 years ago

Ok I think I have a workaround: at the beginning of the notebook, I run the following cell:

%config Completer.use_jedi = False

Then I don't observe the slowdown anymore after the cell finishes execution.
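
(For context: Completer.use_jedi = False makes IPython fall back to its older built-in completion machinery instead of the jedi library for the rest of the session.)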

jasongrout commented 6 years ago

Very interesting! My guess is that this may be an issue with ipython or ipykernel, then. @takluyver would probably know better what might be going on if it is about using jedi. Possibly @Carreau too.

zertrin commented 6 years ago

Sadly, I have to report that the workaround I posted above does not always work. It did solve the issue temporarily, but right now the issue is back.

Is there a way to debug why the kernel is busy and what it is doing during the time after the content of the cell finishes executing?

takluyver commented 6 years ago

In the IPython shell I run out of memory even when I'm not assigning the numpy arrays to anything:

When IPython displays an object, it keeps a reference to it so that you can do operations with it like Out[5] + 2. If the objects you're displaying are big, this means they will stay in memory. You can discard all this with %reset out, or delete a single object from IPython with %xdel x (where x is the variable name).

You can disable this entirely with the option InteractiveShell.cache_size.
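
For illustration, a minimal sketch of those options (the big array here is hypothetical):

import numpy as np

big = np.ones((10_000, 10_000))  # ~800 MB of float64
big  # as the last expression in a cell, this is displayed and cached in Out[]

# In a later cell:
%xdel big      # delete the name and purge IPython's cached references to it
%reset -f out  # or: clear the entire Out[] cache (-f skips the confirmation)

# To disable output caching at startup, set in ipython_config.py:
# c.InteractiveShell.cache_size = 0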

%config Completer.use_jedi = False

It's interesting that that worked even temporarily. I don't know how Jedi could be involved in this.

Is there a way to debug why the kernel is busy and what it is doing during the time after the content of the cell finishes executing?

A couple of things we could try:

  • Check get_ipython().events.callbacks to see if anything has registered a post_execute or post_run_cell callback.
  • Try to interrupt the kernel while it's doing this, to see if we can get it to produce a traceback.

zertrin commented 6 years ago

Ok I've dug a bit further.

First off:

  • Check get_ipython().events.callbacks to see if anything has registered a post_execute or post_run_cell callback.

gives

{'post_execute': [<function matplotlib.pyplot.install_repl_displayhook.<locals>.post_execute>,
  <function ipykernel.pylab.backend_inline.flush_figures>,
  <function ipykernel.pylab.backend_inline.flush_figures>],
 'post_run_cell': [],
 'pre_execute': [<bound method InteractiveShell._clear_warning_registry of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4151b7e240>>],
 'pre_run_cell': [],
 'shell_initialized': []}

And

  • Try to interrupt the kernel while it's doing this, to see if we can get it to produce a traceback.

did not work.
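
For reference, had any of the registered callbacks looked suspect, a hypothetical way to detach one via IPython's events API would be:

ip = get_ipython()
for fn in list(ip.events.callbacks['post_execute']):
    print(fn)                                   # inspect what is registered
    # ip.events.unregister('post_execute', fn)  # uncomment to detach one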

Now in my jupyter_notebook_config.py I've activated the option

c.Session.debug = True
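
(With Session.debug enabled, the session layer logs every message sent and received on the kernel's channels, which is what produces the output below.)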

and managed to trigger the issue (partially).

For this, I run my notebook a bit, load some data, do some analysis (which I can't publish, sorry). Then I restart the kernel and play with the notebook a bit, reloading data, re-analysing dataframes, and so on.

At some point I notice the flickering busy indicator again. So I decide to run a very simple cell with just print('test'), and sure enough, after the cell finishes executing, the flickering happens, but since I hadn't been using the notebook for long beforehand, it is shorter than previously. (I have the distinct feeling that the issue increases in severity the longer the notebook is and the longer you use it without closing it; this guess seems corroborated by the results below.)

I have collected the debugging output of just running this very simple cell (print('test')) here: https://haste.zertrin.org/dicigarubu.txt (raw version https://haste.zertrin.org/raw/dicigarubu )

Some things that I noticed:

Apr 18 13:16:36 server1 jupyter[27570]: {'buffers': [],
Apr 18 13:16:36 server1 jupyter[27570]:  'content': {'allow_stdin': False,
Apr 18 13:16:36 server1 jupyter[27570]:              'code': 'print(var_dic_list())',
Apr 18 13:16:36 server1 jupyter[27570]:              'silent': False,
Apr 18 13:16:36 server1 jupyter[27570]:              'store_history': False,
Apr 18 13:16:36 server1 jupyter[27570]:              'user_expressions': {}},
Apr 18 13:16:36 server1 jupyter[27570]:  'header': {'date': datetime.datetime(2018, 4, 18, 5, 16, 36, 678382, tzinfo=datetime.timezone.utc),
Apr 18 13:16:36 server1 jupyter[27570]:             'msg_id': '9201D2F3F4884502B3165454EC8D0C4D',
Apr 18 13:16:36 server1 jupyter[27570]:             'msg_type': 'execute_request',
Apr 18 13:16:36 server1 jupyter[27570]:             'session': '6B0C59C49B6C42DD83F7F810A8278327',
Apr 18 13:16:37 server1 jupyter[27570]:             'username': 'username',
Apr 18 13:16:37 server1 jupyter[27570]:             'version': '5.2'},
Apr 18 13:16:37 server1 jupyter[27570]:  'metadata': {},
Apr 18 13:16:37 server1 jupyter[27570]:  'parent_header': {}}

I think it would be nice to know what triggered this hidden print(var_dic_list()) execution after each cell.

zertrin commented 6 years ago

Ok, a bit of googling let me identify the origin of this print(var_dic_list()) hidden execution: this was from the Variable Inspector from jupyter_contrib_nbextensions (https://github.com/ipython-contrib/jupyter_contrib_nbextensions/blob/master/src/jupyter_contrib_nbextensions/nbextensions/varInspector/var_list.py).

I have now disabled this extension, and I couldn't reproduce the issue in a quick test.

However, since the issue usually happened on long sessions, I will continue to test more and report in a day or two if the issue arises again with the Variable Inspector disabled.

zertrin commented 6 years ago

Oh in fact the issue is already known: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1275

Not sure if OP @jmhsi also uses this extension.

jmhsi commented 6 years ago

I do not use the Variable Inspector extension.

DavisTownsend commented 6 years ago

@zertrin I was having this same issue and this is correct. There is one caveat though: if you have already enabled the variable inspector through the nbextensions configurator UI for your notebook, you need to make sure you shut down your notebook completely (not just restart it from within the notebook), because the extension will still be in your notebook's metadata. Once I shut down the notebook and then started it again, the variable inspector no longer showed in my new notebook, and I didn't have this problem anymore.

iNLyze commented 6 years ago

You solved my problem! Thanks. I would never have tracked that one down on my own.

jtlz2 commented 6 years ago

Thank you so much - I disabled Variable Inspector and everything is back to normal now.... This has been doing my head in for two days - thanks again.

JanVanIm commented 5 years ago

I have the same issue. I am new to Python and to Jupyter. How can one disable the Variable Inspector extension?

iNLyze commented 5 years ago

Go to Edit/nbextensions config and disable (uncheck) it.
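
If the configurator entry is not available, the extension can presumably also be disabled from the command line with jupyter nbextension disable varInspector/main (assuming the standard require path used by jupyter_contrib_nbextensions).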

JanVanIm commented 5 years ago

I can find the Edit tab in the Jupyter Notebook, but I don't find nbextensions config.

zertrin commented 5 years ago

If you did not install the notebook extensions and activate the variable inspector yourself, then it's likely that your problem is unrelated to this one.

JanVanIm commented 5 years ago

Do you know where the issue comes from? I have a dataset of images of 230 MB, and when I print some images it takes several minutes for them to be displayed.

zertrin commented 5 years ago

There are literally thousands of reasons that could explain any given slowdown. With the amount of information you provided, there is no way to guess what it could be.

It seems you are not in the situation described by this issue (execution time increasing over subsequent cells, even for trivial code, caused by the variable inspector extension). You don't even have extensions installed.

I suggest you seek help here: https://gitter.im/jupyter/jupyter (live chat, but be patient) or here: https://groups.google.com/forum/#!forum/jupyter (the general discussion mailing list) instead.

maxwellmckinnon commented 5 years ago

+1 for turning off variable inspector with Pandas.

I was having huge slowdowns on fairly small datasets (was working with data sampled down to 300MB for speed), and even doing something like df.head() was taking 10 seconds.

If you have the execution time extension enabled, its timings are on the order of 30 ms, as I would expect from a small dataset, even though the cell takes far longer to return. I knew something was goofy. Thanks for the help.

chicofranchico commented 5 years ago

+1 for turning off variable inspector with IRkernel.

Reading a file from the filesystem and doing a simple row-count assignment in the first cell, execution time is on the order of milliseconds. Add a new cell and try to access that row count in the next cell, and the execution time becomes minutes!

If you restart the kernel and rerun the whole notebook, everything works fine. As soon as you add a new cell, things become sluggish again.

mirekphd commented 4 years ago

this was from the Variable Inspector from jupyter_contrib_nbextensions

I have now disabled this extension

It's a strangely persistent bug. The solution was still needed 2 years after the original report here...

BTW, such questions should be asked on Stack Overflow. It would have been an accepted solution worth upwards of 600 reputation points for @zertrin.