microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Interactive UI slows to a halt over time #1304

Closed kdkavanagh closed 3 years ago

kdkavanagh commented 5 years ago

Moving out of microsoft/vscode-python#6563 since it seems to be a bit unrelated.

Does the interactive window do any other regular updating of old cells and/or the current set of variables? Despite disabling the variable inspector entirely, I'm still seeing this same issue with a large number of large dataframes.

EDIT: Also see jupyter server spinning at 100%, during execution of simple commands (e.g dataframe.head()) that complete nearly instantly in a raw python terminal and also JupyterLab UI

Just tried connecting to a kernel that was loaded up with big variables (long/wide dataframes) outside of vscode using microsoft/vscode-python#7015 (i.e I have no working cells in the vscode interactive window) - Things are still super slow, like 10min+ for df.head(), and again still super fast in JupyterLab.

mgsnuno commented 5 years ago

Having exactly the same issues as @kdkavanagh

rchiodo commented 5 years ago

> Just tried connecting to a kernel that was loaded up with big variables (long/wide dataframes) outside of vscode using microsoft/vscode-python#7015 (i.e I have no working cells in the vscode interactive window) - Things are still super slow, like 10min+ for df.head(), and again still super fast in JupyterLab.

@kdkavanagh, Just to be clear, your session is super slow without using the interactive window, but just connecting to an existing kernel?

For a repro, are you just adding a bunch of really big data frames?

kdkavanagh commented 5 years ago

I can connect to the kernel quickly, but then interacting with the interactive window becomes super slow, e.g when I run df.head() from the interactive window

I've been reattaching to a kernel loaded up with my internal datasets, so can't provide them to reproduce, however they are a bunch of ~50 col x ~40mm row dataframes

rchiodo commented 5 years ago

> ~50 col x ~40mm row dataframes

Does this mean 40 million rows?

kdkavanagh commented 5 years ago

Yes

mgsnuno commented 5 years ago

After some research I ended up here: https://github.com/ipython/ipython/issues/10493#issuecomment-392549088

It improved things on my side so far, still testing.

Note: I have Jedi disabled and use the Microsoft Python Analysis Engine. In plain IPython, I consistently had to wait a while after loading a shapefile with geopandas, and not a big one: map.head() would take a while. The same geopandas loading in VSCode Interactive was fine, but map.head() afterwards would hang. With Jedi forced off, it seems fine in both cases.

rchiodo commented 5 years ago

@mgsnuno thanks a lot. We might have to disable jedi by default then with the jupyter autocompletion.

greazer commented 5 years ago

mgsnuno commented 5 years ago

@rchiodo glad to help!

Question for anyone: can you point me towards the code where vscode interacts with IPython (for curiosity)?

Also, if there is an overhead of using IPython through vscode, what would be the main cause of it?

rchiodo commented 5 years ago

> Question for anyone: can you point me towards the code where vscode interacts with IPython (for curiosity)?

The code where we interact with jupyter is here: https://github.com/microsoft/vscode-python/tree/master/src/client/datascience/jupyter

The classes in that folder all interact with jupyter in some way.

> Also, if there is an overhead of using IPython through vscode, what would be the main cause of it?

There shouldn't be any difference between using IPython through VS Code and using it with a Jupyter notebook. AFAIK we're making requests the same way and parsing the results the same way as classic Jupyter. There might be a slowdown in our UI when parsing the results, but that would usually require a lot of HTML to be returned (unlike, say, df.head()).

mgsnuno commented 4 years ago

From my experience over the last month (I've been updating to new releases as soon as they come out), it's tied not only to large dataframes but also to the number of cells.

I often keep the number of displayed cells to a minimum by using Remove all cells. Still, when the cell count is high (sometimes > 50, other times > 100; maybe the variable size matters here), execution becomes slow: after I press shift+enter, it takes a while for the code to appear in the interactive window and then to be executed (even a = 1).

I have variable explorer disabled.

mgsnuno commented 4 years ago

One consistent issue: because of the undo/redo option, Remove all cells doesn't actually delete the cells. After some hours of work, the notebook behind Python Interactive can have hundreds of cells, even though we keep resetting the cell count with Remove all cells. This is a clear cause of slowing to a halt, especially if the removed cells were plots.

It would be nice for remove cells to actually remove the cells.

rchiodo commented 4 years ago

@mgsnuno the way to 'remove all cells', and not just in the output, is to restart the kernel. There's no other way for us to remove the memory from the kernel's python process.
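
As a rough illustration of why clearing the display does not free anything inside the kernel, here is a minimal sketch in plain Python. `BigResult`, `kernel_namespace`, and `displayed_cells` are hypothetical stand-ins for illustration, not names from the extension or from Jupyter, and the behaviour shown assumes CPython's reference counting.

```python
import gc
import weakref


class BigResult:
    """Stand-in for a large object such as a DataFrame or a plot."""


# The kernel's user namespace is, roughly, a dict holding live references.
kernel_namespace = {"df": BigResult()}
tracker = weakref.ref(kernel_namespace["df"])

# Stand-in for what the UI shows; clearing it touches no Python references.
displayed_cells = ["df.head()"]
displayed_cells.clear()
gc.collect()
alive_after_ui_clear = tracker() is not None

# Only dropping the reference inside the process (or restarting it) frees it.
del kernel_namespace["df"]
gc.collect()
alive_after_del = tracker() is not None

print(alive_after_ui_clear, alive_after_del)
```

The object survives the UI-side clear and only goes away once the process itself lets go of it, which is the distinction being drawn above.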

Ddedalus commented 4 years ago

> * In addition, we could potentially add telemetry to get an idea of the average/median size of datasets.

@greazer I would be careful gathering info about people's datasets as this could get touchy very quickly.

I've run into the problem of the virtual window slowing down enormously as well. I actually built a test script to emulate various workflows and thus help find the cause.

What I found:

You can find the script I used here: https://gist.github.com/Ddedalus/9c789a9a1326642f4eb909bd0683df52

I let it run and see how many iterations it takes before the cells lose sync with what is being typed.

rchiodo commented 4 years ago

@Ddedalus how many times does it take? I'm trying it out by doing this:

# %%
i = 0

# %%
%%time
exec("var{}=[{} ** 10 % 17 for l in range(100000)]".format(i, i))
i = i + 1

And after running the second cell 150 times I'm not noticing any slowdown. It's almost always 3ms.

Now if the variable explorer is open, it can cause a refresh delay, but I believe you're describing a situation where just normal execution takes longer and longer

Ddedalus commented 4 years ago

@rchiodo interesting, I don't get the slowdown running your code, the same cell 150 times. My script starts generating noticeable lag after 60-100 iterations, though. I notice a bit of lag after 300 iterations of yours, but this is still much better. The time reported by %%time is approximately constant, as usual.
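
To make comparisons like this less dependent on impressions, the per-iteration kernel time can be recorded and summarised outside the UI. The following is a minimal sketch in plain Python built around the same `exec` statement; `time_iterations` and `slowdown_ratio` are hypothetical helper names, and the smaller `list_size` default is an assumption chosen to keep the run short. Like `%%time`, it only measures execution inside the Python process, so it cannot capture the delay between pressing run and the cell appearing.

```python
import statistics
import time


def time_iterations(n_iterations, list_size=1000):
    """Run a variable-creating statement repeatedly and record each duration."""
    namespace = {}
    durations = []
    for i in range(n_iterations):
        statement = "var{}=[{} ** 10 % 17 for l in range({})]".format(i, i, list_size)
        start = time.perf_counter()
        exec(statement, namespace)
        durations.append(time.perf_counter() - start)
    return namespace, durations


def slowdown_ratio(durations, window=20):
    """Median of the last `window` timings divided by the median of the first."""
    first = statistics.median(durations[:window])
    last = statistics.median(durations[-window:])
    return last / first


namespace, durations = time_iterations(150)
print(len(durations), slowdown_ratio(durations))
```

A ratio that stays near 1 while the window still feels sluggish would point at the UI or the editor-to-window path rather than at the kernel.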

mukerong commented 4 years ago

I have a similar issue. When I first start the Python Interactive window and run code from the VSCode editor like the following, everything is very fast. image

However, after around 100 runs, it becomes very slow. It seems to me that the Interactive Window needs ~5s to pick up the code from the text editor. The direct run section is as fast as usual. image

I also noticed that df.dtypes is hugely slow. It takes ~10min to return the results. If I interrupt the kernel right after running it, I get the result instantly; without any interruption, it takes ~10min. image

rchiodo commented 4 years ago

@mukerong Can you try using %%time when it's slow? Does it report a slowdown too?

And if you can, some sample repro code would be great.

mukerong commented 4 years ago

@rchiodo

I tried %config IPCompleter.use_jedi = False and it fixed the slowness of df.head and df.dtypes. Does Jupyter automatically use Jedi for auto-completion? Can we make it the default setting? I don't have Jedi enabled in VSCode. image

I'm running code around line 130 now. Every time I hit run cell, the Python Interactive window waits 5 seconds before it reacts and shows the running process, like below. I have to restart VSCode to bring back the speed. The code running time is about the same, but a 5-second delay on each cell can easily add up to 5-10 min for big files. image

rchiodo commented 4 years ago

> I tried %config IPCompleter.use_jedi = False and it fixed the slowness of df.head and df.dtypes. Does Jupyter automatically use Jedi for auto-completion? Can we make it the default setting?

Yes Jupyter does use Jedi automatically. Not sure we want to turn it off by default because it might mean that Jupyter then doesn't have ANY intellisense completion. We're still investigating.

> I'm running code around line 130 now. Every time I hit run cell, the Python Interactive window waits 5 seconds before it reacts and shows the running process, like below. I have to restart VSCode to bring back the speed. The code running time is about the same, but a 5-second delay on each cell can easily add up to 5-10 min for big files.

Can you share your file? And is this slowdown only when running from a python file? or does it happen with the input box too? And is the slowdown still there after disabling Jedi in Jupyter?
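
On making the setting stick without retyping it in every session: one route that does not depend on VS Code is IPython's own profile config file, where the equivalent line is `c.IPCompleter.use_jedi = False`. The sketch below appends that line if it is missing. `ensure_jedi_disabled` and `JEDI_OFF_LINE` are hypothetical names for illustration, and the code assumes that an existing config file already has `c` available, which IPython normally provides when it loads the file.

```python
from pathlib import Path

# Config-file equivalent of `%config IPCompleter.use_jedi = False`.
JEDI_OFF_LINE = "c.IPCompleter.use_jedi = False"


def ensure_jedi_disabled(config_path):
    """Append the Jedi-off line to an IPython config file if it is missing.

    Returns True if the file was changed, False if the line was already there.
    """
    config_path = Path(config_path)
    existing = config_path.read_text() if config_path.exists() else ""
    # Compare whole lines so a commented-out copy does not count as present.
    if any(line.strip() == JEDI_OFF_LINE for line in existing.splitlines()):
        return False
    config_path.parent.mkdir(parents=True, exist_ok=True)
    if not existing.strip():
        # Fresh file: start with the conventional header line.
        addition = "c = get_config()  # noqa\n" + JEDI_OFF_LINE + "\n"
    else:
        addition = ("" if existing.endswith("\n") else "\n") + JEDI_OFF_LINE + "\n"
    with config_path.open("a") as handle:
        handle.write(addition)
    return True
```

Calling it with `Path.home() / ".ipython" / "profile_default" / "ipython_config.py"` would target the usual default profile; that path is an assumption and will differ if `IPYTHONDIR` is set or a non-default profile is in use.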

mukerong commented 4 years ago

> Yes Jupyter does use Jedi automatically. Not sure we want to turn it off by default because it might mean that Jupyter then doesn't have ANY intellisense completion. We're still investigating.

Does the Jupyter inside VSCode use Jedi? I thought it would use the Microsoft Python engine after I disabled Jedi in VSCode.

> I'm running code around line 130 now. Every time I hit run cell, the Python Interactive window waits 5 seconds before it reacts and shows the running process, like below. I have to restart VSCode to bring back the speed. The code running time is about the same, but a 5-second delay on each cell can easily add up to 5-10 min for big files.

I am not sure it makes sense for me to share the file, because the slowness might be caused by the data. Will disabling the variable preview help? If so, what is the best way to disable it? image

Yes. The slowdown only happens when running from a python file. Running from the input box does not cause slowness.

Yes, the slowdown is still there after disabling Jedi. It is better than before, but I still experience the slowness.

rchiodo commented 4 years ago

> Does the Jupyter inside VSCode use Jedi?

Unfortunately Jupyter's use of Jedi is completely independent of VS code. It knows nothing about the Microsoft Language Server.

> Running from the input box does not cause slowness.

Thanks this helps. Must be something to do with having an actual file then (input box doesn't pass a file name)

hauselin commented 4 years ago

Just want to add I'm having the same slowing issues as @mukerong.

Even with tiny datasets (e.g., iris), I can visibly notice the slowing after 10-20 minutes of running code from the editor. The longer I've been using it, the slower it gets... But when I run the same code directly from the input box in the Python Interactive window, it's still super-fast (as good as REPL). Also, restarting my kernel doesn't work; the only way for me to fix the slowness is to restart vscode entirely.

Going forward, I'll probably use REPL instead of the Jupyter/Python interactive until this issue is resolved... Hope this gets addressed soon and thanks for helping us, @rchiodo!

rchiodo commented 4 years ago

@hauselin did you try this?

%config IPCompleter.use_jedi = False

hauselin commented 4 years ago

Yes, @rchiodo, I tried that a few weeks ago. I think it's helped a bit so far but the slowing is still quite apparent after using vscode for just 10-20 minutes and it slows down more the longer vscode/python interactive has been opened (even when I'm not actively using it). I also don't think it's caused by other extensions because I disabled all other extensions a few days ago and the problem persisted.

mukerong commented 4 years ago

@rchiodo, @hauselin, I have the same issue. Disabling Jedi fixes it in the short term, but if I have been editing in the same window for >20min, it becomes slow regardless of the size of the data.

hauselin commented 4 years ago

@mukerong, I recently switched to REPL and it's been a much more pleasant experience, though it means I don't get to use jupyter features like cell chunks anymore. But there's no slowing at all.

rchiodo commented 4 years ago

This is hitting more people. Moving back to triage. We'd have to profile the kernel to see what's so slow. Might also be dependent upon the version of IPython the user has installed.

mukerong commented 4 years ago

@hauselin, do you run it in the VSCode terminal or in a normal command line tool? Maybe I should try using IPython from the command line.

hauselin commented 4 years ago

@mukerong, I'm using vscode's terminal. I simply changed my keyboard shortcuts so that when I run the current line or selected text in the editor, that code is sent to the terminal (see screenshot) instead of the python interactive window. Works very well for me so far, so for people who are experiencing slowing when using the interactive window, it might be worth reconfiguring your vscode settings slightly to work this way until the problem's resolved...

image

rchiodo commented 4 years ago

We have partially resolved this bug. Some of the slowdown here was in our generation of code lenses after the user opens a lot of files. The fix for that will ship in our upcoming release.

The problem that the %config IPCompleter.use_jedi = False resolves is still open though.

hauselin commented 4 years ago

Thanks for the updates! I'll try it when it's released.

mukerong commented 4 years ago

Thank you! Will try as well.

I'm running sample code from the Hands-on Machine Learning book and experienced severe slowness, possibly due to a matrix plot. Will try it again after getting the new version.

On Tue, Apr 21, 2020 at 12:25 PM Hause Lin notifications@github.com wrote:

> Thanks for the updates! I'll try it when it's released.

hauselin commented 4 years ago

Everything has been working fine and I haven't experienced slowing so far and will update if it starts slowing again. Thanks for fixing it, @rchiodo!

mgsnuno commented 4 years ago

Also fine now here. Thank you!

davins90 commented 3 years ago

Unfortunately I ran into this problem today... something like this is a big mistake from Microsoft!

rchiodo commented 3 years ago

@davins90 what exactly is the problem you're having? A slowdown? Did you attempt the workaround?

davins90 commented 3 years ago

Hi @rchiodo, thanks for the answer! Since today, every time I run a cell (from df.shape to complex ones) it takes around 10 minutes to show the results. Unfortunately I haven't found the workaround that you suggested. As you can see from the image, I don't think I have that option. image

rchiodo commented 3 years ago

The workaround is to add this to your "python.dataScience.runStartupCommands": "%config IPCompleter.use_jedi = False"

The use of Jedi is not within VS Code, but within the Jupyter kernel. Jupyter is the one being slow here, and the workaround prevents Jupyter from using Jedi to perform autocomplete. This also means the problem should only occur when you are typing into the interactive window.

In the future we'll fix this a different way, but for now it should fix the problem. Let me know if it doesn't.

davins90 commented 3 years ago

Thanks @rchiodo, I really appreciate your help. I added the string you suggested to the JSON settings, but VS Code shows me this warning: image Is it OK anyway?

rchiodo commented 3 years ago

Sorry it should be like this. I forgot we changed it to an array

"python.dataScience.runStartupCommands": [
    "%config IPCompleter.use_jedi = False"
]

davins90 commented 3 years ago

Thanks again. Unfortunately it still doesn't work. It works only if I interrupt the kernel and run all the cells. What do you think could be the problem? In this section I have this situation: image

In this instead i have this one: image

rchiodo commented 3 years ago

What exactly is slow? Execution of cells or the UI? Does a simple print('foo') also run slowly?

davins90 commented 3 years ago

print('foo') runs OK, but if I run even a simple df.head(), the first 3 times run OK and the 4th still takes 5 minutes.

rchiodo commented 3 years ago

Where are you typing df.head()? In interactive window at the bottom or in a text file?

rchiodo commented 3 years ago

And did you restart VS code after setting the startup commands?
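
Since the setting changed from a single string to an array, it can be worth confirming what the settings file actually contains before restarting. A minimal sketch, assuming the settings text is plain JSON (VS Code itself tolerates comments and trailing commas, which `json.loads` does not) and assuming the older string form held a single command. `startup_commands` and `has_jedi_workaround` are hypothetical helper names, not part of the extension.

```python
import json

SETTING = "python.dataScience.runStartupCommands"
JEDI_OFF = "%config IPCompleter.use_jedi = False"


def startup_commands(settings_text):
    """Return the startup commands as a list, accepting the old string form."""
    settings = json.loads(settings_text)
    value = settings.get(SETTING, [])
    if isinstance(value, str):
        # Assumption: the older string form held exactly one command.
        value = [value]
    return list(value)


def has_jedi_workaround(settings_text):
    """True if the Jedi-off command is among the startup commands."""
    return JEDI_OFF in startup_commands(settings_text)


array_form = json.dumps({SETTING: [JEDI_OFF]})
string_form = json.dumps({SETTING: JEDI_OFF})
print(has_jedi_workaround(array_form), startup_commands(string_form))
```

Both forms normalise to the same one-element list here; whether a given extension version still accepts the string form is not something this sketch can tell you.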

davins90 commented 3 years ago

Yes, I've restarted, and I run it in a cell of a Jupyter file.

rchiodo commented 3 years ago

If you run %config IPCompleter.use_jedi = False in the first cell and try again does it repro? Your bug sounds like the Jedi completion problem, so the workaround should be preventing it from happening.

davins90 commented 3 years ago

I ran the line as you suggested and nothing changed. Then I removed it, restarted, checked that "Microsoft" is set as the language server, and ran again. It now works 6 times, but on the 7th the slowdown appears.

rchiodo commented 3 years ago

The Microsoft setting for the language server has no bearing on this bug. This bug has to do with what Jupyter is doing, not VS code. The Microsoft setting is what language server VS code will use to provide intellisense.

You can also try this value:

%config IPCompleter.use_jedi = True
%config IPCompleter.jedi_compute_type_timeout = 0 

If you type

%config IPCompleter

into a cell it should show all of the options Jupyter has for controlling autocomplete. That should be what's causing the slowdown.

Does the problem occur if instead of typing df.head() you just run a single cell with df.head() in it over and over again?

(I'm trying to determine if it's really intellisense related or something else that is slowing down the kernel. Typing the . character will cause a request to Jupyter for the autocomplete list. That should be what's slowing you down but if the slowdown is random when just running cells, it might not be).