microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Strange bug with large notebooks #16211

Closed ale-dg closed 2 weeks ago

ale-dg commented 2 weeks ago

Hi all,

I have been running into a strange bug when displaying charts that make the notebook very large (hundreds of MB).

Say I use either Altair or Plotly in interactive mode (which naturally makes the notebook use a lot of memory): the issue in the screenshot at the end happens. When I set both libraries to render as SVG, the same thing happens (SVG is known to be heavy). When I set them to render as PNG, the issue goes away (and, naturally, the notebook becomes much lighter).

Something I have noticed is that when plotting in MIME/interactive or SVG mode, the notebook takes up almost 250 MB, whereas in PNG mode it is around 10 MB (see the comparison below).

Hope you can take a look at this.

Best

In SVG mode:

[Screenshot]

in PNG mode:

[Screenshot]

Disk usage comparison:

[Screenshot]

VS Code: Version: 1.95.1 (user setup) Commit: 65edc4939843c90c34d61f4ce11704f09d3e5cb6 Date: 2024-10-31T05:14:54.222Z Electron: 32.2.1 ElectronBuildId: 10427718 Chromium: 128.0.6613.186 Node.js: 20.18.0 V8: 12.8.374.38-electron.0 OS: Windows_NT x64 10.0.22631

Jupyter: v2024.10.0 Jupyter Notebook Renderers: v1.0.21 Python: v2024.18.0

DonJayamanne commented 2 weeks ago

Thank you for filing this issue. I do not believe this is an issue with the Jupyter extension itself. It's most likely due to the sheer size of the data (output) returned, which ends up being stored in the notebook.
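To check which outputs account for the size gap reported above (roughly 250 MB for interactive/SVG versus about 10 MB for PNG), one option is to break a saved `.ipynb` down by output MIME type. This is a minimal sketch, assuming the standard nbformat 4 JSON layout where rich outputs keep their payloads under each output's `data` key; the sizes are approximate (JSON escaping is ignored), and `output_bytes_by_mime` and `report` are hypothetical helper names, not part of VS Code or Jupyter.

```python
import json
from collections import Counter
from pathlib import Path


def output_bytes_by_mime(nb: dict) -> Counter:
    """Approximate bytes used by each output MIME type in an nbformat-4 notebook dict."""
    sizes = Counter()
    for cell in nb.get("cells", []):
        # Only code cells have an "outputs" list; other cells are skipped naturally.
        for output in cell.get("outputs", []):
            # display_data / execute_result outputs store rich content under "data";
            # stream and error outputs have no "data" key and are not counted.
            for mime, payload in output.get("data", {}).items():
                if isinstance(payload, list) and all(isinstance(p, str) for p in payload):
                    text = "".join(payload)  # multi-line text stored as a list of strings
                elif isinstance(payload, str):
                    text = payload  # e.g. base64-encoded PNG or an SVG string
                else:
                    text = json.dumps(payload)  # JSON MIME types (chart specs) stored as objects
                sizes[mime] += len(text.encode("utf-8"))
    return sizes


def report(path: str) -> None:
    """Print the per-MIME-type output size of a notebook file, largest first."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    for mime, size in output_bytes_by_mime(nb).most_common():
        print(f"{mime:45s} {size / 1_000_000:8.2f} MB")
```

Calling, for example, `report("analysis.ipynb")` (a hypothetical file name) on the interactive and the PNG versions of the same notebook should show whether the chart payloads, such as JSON chart specs or SVG strings, make up most of the file.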

ale-dg commented 2 weeks ago

Hi @DonJayamanne

Thanks for checking. I've run it in JupyterLab and it indeed also comes to ~240 MB (see the screenshot from JupyterLab below). JupyterLab does render the plots and does not show the strange behaviour I see in VS Code, but it sometimes crashes because of the heavy rendering of the images. It's strange, though: the DataFrame is only 100,000 rows, which I don't think is that much, since I have worked with far more data and it was fine. I guess I'll have to keep working with PNG.

On the other hand, I am using Spark/PySpark, which also runs Java/JVM on the backend and might be using a lot of the computer's resources (even though it is a gaming machine). Since VS Code runs on Electron and Jupyter on JavaScript, I am not sure whether that could cause an issue there. Could that be the case? If so, I guess a workaround would be needed, since the point of using PySpark (at least in data science) is to work with millions of data points and tens or hundreds of columns, and sometimes using samples is not possible.

Or just keep plotting in PNG 😆

Best

[Screenshot from JupyterLab]

DonJayamanne commented 2 weeks ago

> Thanks for checking. I've run it in jupyterlab and indeed goes back to ~240 MB

Given that the size is large even in JupyterLab, this is not just an issue in VS Code. Closing this as it's not specific to VS Code.