microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Proposal for proxy kernels #9076

Closed · kieferrm closed this issue 1 year ago

kieferrm commented 2 years ago

vscode.dev does not have any compute available, so there is no Jupyter kernel available when opening a notebook. Our current approach is to tell users that they need to continue their work in a codespace or locally. A few extensions, like the pyodide extension, provide in-browser Jupyter kernels. If the user has one of them installed, we use it. In that case, the notebook does not have any trigger point for "Continue on...".

If a user only uses notebooks (rather than notebooks and script files), the "Continue on..." motion seems too heavyweight. It would be nicer to have a proxy kernel available that, when chosen, spins up a remote Jupyter kernel. The Jupyter extension already supports remote kernels, so once the remote kernel is up and running, the same functionality as in the Jupyter extension is at play.

With our current notebook API, an extension author can create a notebook controller that spins up, say, a container with a running Jupyter server on demand. However, it seems that they would have to duplicate functionality of the Jupyter extension. What seems appealing is for the Jupyter extension to provide an extension mechanism for proxy kernels. Potential proxy kernel backing technologies I can think of: devcontainers, codespaces, preconfigured VMs, etc.

kieferrm commented 2 years ago

/cc @roblourens @rebornix @srivatsn

rebornix commented 2 years ago

Potential proxy kernel backing technologies I could think of: devcontainers, codespaces, preconfigured VMs, etc...

This might have been discussed already: it would also be convenient if the Jupyter extension could help set everything up when a vanilla VM (or maybe just an SSH connection) is provided. It would install Jupyter and its dependencies for us and connect automatically, similar to how Remote-SSH works behind the scenes.
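To make the idea concrete, here is a minimal, purely illustrative sketch of the command sequence such a feature might run over an SSH connection to turn a vanilla VM into a connectable Jupyter server. The `pip` and `jupyter server` commands are standard; the wrapping function and its parameters are hypothetical, not part of any real extension API.

```python
# Hypothetical sketch: commands the extension might run over SSH to
# provision a bare VM. Only the shell commands themselves are standard;
# the function and its defaults are illustrative assumptions.

def provisioning_commands(port: int = 8888, token: str = "dev-token") -> list[str]:
    """Build the shell commands to install and launch Jupyter on a bare VM."""
    return [
        # Install the Jupyter server and kernel packages for the current user.
        "python3 -m pip install --user jupyter ipykernel",
        # Launch a server that accepts remote connections; the extension
        # could then connect to it like any other remote Jupyter server.
        f"python3 -m jupyter server --ip=0.0.0.0 --port={port} "
        f"--ServerApp.token={token} --no-browser",
    ]

if __name__ == "__main__":
    for cmd in provisioning_commands():
        print(cmd)
```

In a real implementation, each command would be executed over the SSH session, and the resulting URL and token would be fed to the existing remote-kernel connection flow.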

rchiodo commented 2 years ago

What seems appealing is to think about the Jupyter extension providing an extension mechanism for proxy kernels.

I'm not sure I understand this part. What would the extension point be? Do you mean VS Code extensions reusing the Jupyter extension to, say, have their notebook controller run in a container created by Jupyter?

greazer commented 2 years ago

@rchiodo, the experience isn't completely fleshed out yet, but the overall idea is to make it as simple as possible for users to connect to a compute of their choice, provided by some compute service provider, and start a running kernel with a pre-selected environment.

So the intent of the 'extension mechanism for proxy kernels' is for the web-enabled Jupyter extension to present what look like 'ordinary kernels' (they show up in the kernel selection list), but when started, they don't just spin up a Python kernel/interpreter; they do whatever they must do to:

  1. Find and allocate a compute somewhere (local or remote).
  2. Take care of any special authorization or selection needs.
  3. Establish a running kernel environment and Jupyter server based on information found in some well-established environment specification (à la requirements.txt, devcontainer.json, etc.).
  4. Let the user know what's happening, either directly or relayed through the Jupyter extension, until the kernel is spun up and ready to go.
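The four steps above could be sketched as a provider interface. This is an illustrative sketch only: every class and method name here is a hypothetical stand-in, not a real Jupyter-extension API.

```python
# Illustrative sketch of a proxy kernel provider mirroring the four steps
# above. All names are hypothetical assumptions, not a real API.
from dataclasses import dataclass


@dataclass
class RemoteKernelConnection:
    """What a proxy kernel hands back: enough to connect remotely."""
    base_url: str
    token: str


class ProxyKernelProvider:
    def allocate_compute(self) -> str:
        """Step 1: find and allocate a compute (local or remote)."""
        raise NotImplementedError

    def authorize(self, compute_id: str) -> None:
        """Step 2: handle any special authorization or selection needs."""
        raise NotImplementedError

    def provision(self, compute_id: str, env_spec: str) -> RemoteKernelConnection:
        """Step 3: stand up a kernel environment and Jupyter server from an
        environment spec (e.g. requirements.txt or devcontainer.json)."""
        raise NotImplementedError

    def report_status(self, message: str) -> None:
        """Step 4: surface progress to the user until the kernel is ready."""
        print(f"[proxy kernel] {message}")


class FakeContainerProvider(ProxyKernelProvider):
    """A stub provider, just to show the control flow end to end."""

    def allocate_compute(self) -> str:
        return "container-1"

    def authorize(self, compute_id: str) -> None:
        pass  # nothing to do for a local container in this stub

    def provision(self, compute_id: str, env_spec: str) -> RemoteKernelConnection:
        return RemoteKernelConnection("http://localhost:8888", "dev-token")


def start_proxy_kernel(provider: ProxyKernelProvider, env_spec: str) -> RemoteKernelConnection:
    """Run steps 1-4 in order and return a usable remote connection."""
    compute = provider.allocate_compute()
    provider.report_status(f"allocated {compute}")
    provider.authorize(compute)
    conn = provider.provision(compute, env_spec)
    provider.report_status(f"kernel ready at {conn.base_url}")
    return conn
```

Once `start_proxy_kernel` returns, the Jupyter extension's existing remote-kernel machinery could take over, exactly as the issue description suggests.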

Example: when the user clicks on the kernel picker, they could be presented with a list of kernels like the following (assuming the user is running vscode.dev):

- Pyodide Web Kernel (assuming it's installed)
- GitHub Codespace
- Azure ML Compute
- Local Container

The "Local Container" would talk to a separately installed agent on the user's local machine to spin up an appropriate container.

I'm sure there are plenty of details to hash out, but this is the basic idea.

rchiodo commented 2 years ago

Thanks, the use case is much clearer now.

So the extensibility model is around adding things to that list.
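That extensibility model, where third parties contribute entries that get merged into the kernel picker, could be sketched as a simple registry. Everything here is hypothetical (function names, the registry shape); it only illustrates "adding things to that list".

```python
# Hypothetical sketch of the extensibility model: providers register
# entries that the Jupyter extension merges into its kernel picker.
# All names are illustrative assumptions, not a real extension point.
_providers: dict[str, object] = {}


def register_proxy_kernel(label: str, provider: object) -> None:
    """Contribute a proxy kernel entry (e.g. 'GitHub Codespace')."""
    _providers[label] = provider


def kernel_picker_entries(builtin: list[str]) -> list[str]:
    """The picker would show built-in kernels plus registered proxies."""
    return builtin + sorted(_providers)


# Simulate a few extensions contributing proxy kernels.
register_proxy_kernel("GitHub Codespace", object())
register_proxy_kernel("Azure ML Compute", object())
register_proxy_kernel("Local Container", object())

print(kernel_picker_entries(["Pyodide Web Kernel"]))
```

Selecting one of the registered entries would then hand control to that provider to allocate compute and spin up the actual kernel.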

SiddhanthUnnithan commented 2 years ago

This is an interesting concept. @kieferrm, I'd love to know how we're thinking about mixed contexts in terms of VS Code components. If my notebook is connected to a remote kernel, how do I resolve operations such as pd.read_csv() that refer to a file in my local workspace (independent of whether that is local VS Code or the web client)? Similarly, does the remote kernel concept mean that only the notebook is "remote" while the rest of VS Code may be "local"? If the user tries to open a terminal, use the debugger, or use extensions (e.g., Pylance) with the notebook, how do they reconcile that those will only work locally and with notebooks connected to local kernels?

I'm asking the above because we tried a similar approach with Azure ML compute instances: we streamlined connecting users to a remote Jupyter server (and thus a remote kernel) from a local notebook. Users were pleased with the streamlined approach (no auth, no network config) but got easily confused by the local vs. remote context boundaries. They weren't sure how to access data they had in the local workspace from the remote-connected notebook; they also wanted to use the terminal and debugger with their notebooks but were unsuccessful because those were "local" while their notebook remained "remote". This is why we shifted to a full-fledged remote connection to their compute instances, where there is a single remote context for users to work in.

DonJayamanne commented 1 year ago

Closing this for now, as we've been exploring other options (commands and kernel sources).