microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Share Linter and IntelliSense information across multiple Notebooks #15866

Open Zenome84 opened 1 month ago

Zenome84 commented 1 month ago

Context & Problem

I work with datasets that are quite large, and generally take a long time to load into memory. When I am conducting feature engineering with notebooks, I have to work across the spectrum of:

  1. creating several transformations
  2. testing them on different ML models
  3. comparing variations of (1) and (2)
  4. temporarily scrapping work/results that aren't useful right now (but could be useful in the future)

After a day or two of this, my notebooks become large, complex, and incoherent. I find myself looking for something I did several rounds of analysis ago, and because I can't find it, I try to rewrite it, adding even more code to my notebooks.

Possible Solution - but still not perfect

I split my work into several notebooks and have all the notebooks connect to the same Jupyter kernel.

Now I can track my work more easily, but what makes this solution less than perfect is that none of the linting and IntelliSense features work across notebooks, i.e. autocomplete does not work and I get warnings that variables/modules have not been declared/imported.

Feature Request - a more perfect solution

I would like linting and IntelliSense to work across all notebook files contained in a folder. For example, every notebook would parse the imports notebook, understand that I have imported numpy as np, and offer completions for np even though no cell in that notebook performs the import.
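To make the np example concrete, here is a minimal sketch of the static half of the request, assuming notebooks use the standard nbformat v4 JSON layout (a top-level cells list whose code cells carry a source string or list of strings). It reads an "imports" notebook and collects the names its import statements bind, which is the kind of information a language server would need to share with sibling notebooks. The function names are hypothetical, and this is not how the Jupyter extension or Pylance actually works.

```python
import ast
import json


def cell_sources(notebook: dict) -> list[str]:
    """Return the source text of every code cell in an nbformat-v4 notebook dict."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows the source to be a single string or a list of lines.
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def imported_names(notebook: dict) -> dict[str, str]:
    """Map each name bound by an import statement to what it refers to."""
    names: dict[str, str] = {}
    for src in cell_sources(notebook):
        # Drop IPython magics and shell escapes, which are not valid Python syntax.
        lines = [ln for ln in src.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    # "import numpy as np" binds np; "import os.path" binds os.
                    bound = alias.asname or alias.name.split(".")[0]
                    names[bound] = alias.name if alias.asname else bound
            elif isinstance(node, ast.ImportFrom) and node.module:
                for alias in node.names:
                    if alias.name != "*":  # star imports cannot be resolved statically here
                        names[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return names


def load_notebook(path: str) -> dict:
    """Read an .ipynb file from disk as a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    imports_nb = {"cells": [{"cell_type": "code",
                             "source": ["import numpy as np\n", "from pandas import DataFrame\n"]}]}
    print(imported_names(imports_nb))  # {'np': 'numpy', 'DataFrame': 'pandas.DataFrame'}
```

A real implementation would also need variables and functions defined in the shared notebook, not just imports; this sketch only covers the import case from the request.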

I imagine creating a hierarchy file in the folder that specifies which notebooks can access which other notebooks' declarations.
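The hierarchy file could be as simple as a JSON object mapping each notebook to the notebooks whose declarations it may read. The sketch below assumes such a format (the file contents, the notebook names, and visible_notebooks are all hypothetical) and resolves, breadth-first, the full set of notebooks a given notebook can see, treating access as transitive and tolerating cycles.

```python
import json
from collections import deque

# Hypothetical contents of a hierarchy file placed in the notebook folder.
HIERARCHY_JSON = """
{
  "model_comparison.ipynb": ["features.ipynb", "models.ipynb"],
  "models.ipynb": ["load_data.ipynb"],
  "features.ipynb": ["load_data.ipynb"],
  "load_data.ipynb": ["imports.ipynb"],
  "imports.ipynb": []
}
"""


def visible_notebooks(hierarchy: dict[str, list[str]], notebook: str) -> list[str]:
    """Return every notebook whose declarations `notebook` may see, nearest first."""
    seen = {notebook}
    order: list[str] = []
    queue = deque(hierarchy.get(notebook, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited, or a cycle back to an earlier notebook
        seen.add(current)
        order.append(current)
        queue.extend(hierarchy.get(current, []))
    return order


if __name__ == "__main__":
    hierarchy = json.loads(HIERARCHY_JSON)
    print(visible_notebooks(hierarchy, "model_comparison.ipynb"))
    # → ['features.ipynb', 'models.ipynb', 'load_data.ipynb', 'imports.ipynb']
```

Breadth-first order is one possible design choice: if two visible notebooks bind the same name, a tool could let the nearer one win, much like scoping rules.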

Reasons Not to Bother with Adding this Feature - and why they are invalid

  1. Why not split each combination of analysis into separate notebooks, each with the full imports, data loading, preprocessing, etc.? Besides being repetitive code, this is an inefficient use of memory/compute resources and of my time: for each variation, I need to wait for my machine to process all the cumulative prior steps, which locks up resources.
  2. Why not save data files at each step? This is a variation of the not-so-perfect possible solution. It is imperfect because, with large datasets, you still need to load the data and also need somewhere to store it. It also requires a very good system of data versioning, which is simply not practical at the exploratory stage.

Why is this Feature a good Foundation for Future Features

I can see this leading to the ability for analysts/scientists to develop books with chapters more easily, with the hierarchy file serving as the table of contents.

rebornix commented 1 month ago

@Zenome84 great feature request, and thank you for providing all the details and reasoning. It makes good sense to me. Long/large notebooks can quickly become hard to manage, and splitting the work into multiple notebooks is a good idea.

We have similar ideas on how to improve this, i.e., support sharing a kernel between notebooks or even implement a scratchpad/REPL for notebooks, so users can use the scratchpad or a secondary notebook for experimenting and the main notebook to capture the validated ideas.

Regarding linting/IntelliSense, language extensions currently don't have this concept. They can handle a single notebook (technically, multiple cell text documents in the same notebook), but it would be an architecture change for them to understand that multiple notebooks can share the same state. That said, it's worth brainstorming to see how we can improve this.

Zenome84 commented 1 month ago

@rebornix

> We have similar ideas on how to improve this, i.e., support sharing a kernel between notebooks or even implement a scratchpad/REPL for notebooks, so users can use the scratchpad or a secondary notebook for experimenting and the main notebook to capture the validated ideas.

Sharing a kernel between notebooks is already supported in VS Code; I stumbled upon this feature by accident and have found it indispensable.

> Regarding linting/IntelliSense, language extensions currently don't have this concept. They can handle a single notebook (technically, multiple cell text documents in the same notebook), but it would be an architecture change for them to understand that multiple notebooks can share the same state. That said, it's worth brainstorming to see how we can improve this.

This is in fact the feature that would complement sharing a kernel between notebooks amazingly well. I am not versed in the architecture, nor have I ever built extensions, but as you say, linting/IntelliSense does work across cells in the same notebook (albeit sometimes buggily), and that could be a place to start.

Also, when importing from other .py files, linting/IntelliSense works in the standard way expected in Python. Importing the namespace of another notebook could be another direction, though I have not heard of such a concept.
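One way to approximate "importing the namespace of another notebook" at runtime, rather than for static analysis, is to execute that notebook's code cells into a fresh module object. The sketch below assumes nbformat v4 JSON and simply skips IPython magics and shell escapes; import_notebook and import_notebook_file are hypothetical helpers. Note that it re-runs the code, so for the large-dataset case in this issue it would reload the data, which is exactly what sharing a kernel avoids; it only illustrates the namespace idea.

```python
import json
import types


def code_cells(notebook: dict) -> list[str]:
    """Return the source of each code cell, in notebook order."""
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            cells.append("".join(src) if isinstance(src, list) else src)
    return cells


def import_notebook(notebook: dict, name: str = "notebook") -> types.ModuleType:
    """Execute a notebook's code cells, in order, inside a new module namespace."""
    module = types.ModuleType(name)
    for index, src in enumerate(code_cells(notebook)):
        # Drop IPython magics and shell escapes, which plain exec() cannot run.
        lines = [ln for ln in src.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        code = compile("\n".join(lines), f"<{name} cell {index}>", "exec")
        exec(code, module.__dict__)
    return module


def import_notebook_file(path: str) -> types.ModuleType:
    """Load an .ipynb file from disk and import it as a module-like namespace."""
    with open(path, encoding="utf-8") as f:
        return import_notebook(json.load(f), name=path)


if __name__ == "__main__":
    shared = import_notebook({"cells": [
        {"cell_type": "code", "source": ["import math as m\n", "radius = 2\n"]},
        {"cell_type": "code", "source": "area = m.pi * radius ** 2"},
    ]})
    print(shared.radius, round(shared.area, 2))  # 2 12.57
```

Because the result is an ordinary module object, static tools could in principle treat it like an import from a .py file, which is the analogy drawn above.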