jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Could MultiKernelManager connect to kernels started by a different process? #1786

Open danielballan opened 8 years ago

danielballan commented 8 years ago

When a request is made to api/kernels/<kernel_id>, the MultiKernelManager looks in its in-memory cache of known sessions. If it doesn't know about that session, it errors. I am wondering if, before erroring, it could check the on-disk cache of running kernels (e.g., in ~/Library/Jupyter/runtime). If it finds the kernel_id in question, it would start a new session connected to that existing kernel. To put it another way, the cache of kernels known to the manager includes only those initiated by that particular server process, not kernels started by other processes. I'd like the manager to check for kernels started by another process when it doesn't find one of its own matching the requested ID.

Would that be a crazy thing to do? The goal is to enable something like ipython console --existing <kernel_id> in the notebook server.
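The on-disk lookup described above could be sketched as follows. This is a hypothetical helper, not existing MultiKernelManager behavior; it only assumes the real convention that connection files in the runtime directory are named `kernel-<id>.json`:

```python
import json
from pathlib import Path


def find_existing_kernel(kernel_id, runtime_dir):
    """Look in the on-disk runtime directory (e.g. ~/Library/Jupyter/runtime
    on macOS) for a connection file matching ``kernel_id``.

    Connection files are conventionally named ``kernel-<id>.json``.
    Returns the parsed connection info dict, or None if not found.
    """
    path = Path(runtime_dir) / f"kernel-{kernel_id}.json"
    if not path.is_file():
        return None
    with path.open() as f:
        return json.load(f)
```

A manager that failed its in-memory lookup could call this and, on a hit, construct a KernelManager pointed at the existing kernel's ports instead of erroring.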

For example, suppose Alice and Bob are running separate notebook servers on the same machine. Alice wants to allow Bob to observe the inputs and outputs in a console or notebook that she is working on. She sends Bob the connection file info (perhaps via a hub extension). Bob opens a console and creates a new session connecting to that running kernel. Now Alice and Bob each have a session connected to the same kernel. Since the JupyterLab console now mirrors iopub messages from any session, not just the session that initiated execution, Bob can see Alice's inputs and outputs. He can also execute code. This is one simple mode of real-time collaboration.

takluyver commented 8 years ago

I think that's a reasonable thing to do, although at present the kernel IDs it uses are only stored in memory. The main reason we haven't done it, as far as I know, is that we haven't come up with any UI we like for connecting a notebook to an existing kernel.

jasongrout commented 8 years ago

How about instead of relying on finding a connection file, we make the /api/kernels POST request (which normally starts a kernel from a kernelspec name) take an optional kernel connection file contents as data? Or we could have a new endpoint that connects to an existing kernel, something like /api/kernels/{kernel_id}/connect (where we have to provide a made-up id). Or perhaps we reserve the name 'connect' in the kernel_id spot: /api/kernels/connect.
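To make the first option concrete, a client might POST a body like the one built below. Both the `connect_info` field and the `/api/kernels/connect` variant are hypothetical — nothing here exists in the current REST API; this only illustrates the shape being proposed:

```python
import json


def build_connect_request(connection_info, session_name="console-1"):
    """Build a hypothetical JSON body for POST /api/kernels that, instead
    of a kernelspec name, carries the contents of an existing kernel's
    connection file (ip, transport, ports, signing key, ...)."""
    return json.dumps({
        "name": session_name,
        # Hypothetical field: the parsed connection-file contents.
        "connect_info": connection_info,
    })
```

The server would then skip launching a process and simply wire a new set of ZMQ channels to the ports given in `connect_info`.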

danielballan commented 8 years ago

Yes, that does sound better. Going through a connection file wouldn't add any value. Does anyone have a clear preference for one of those URL schemes?

minrk commented 8 years ago

One disadvantage is that, until the kernel nanny arrives, this will mean that interrupt and restart actions will be unavailable.

Carreau commented 8 years ago

> If it doesn't know about that session, it errors. I am wondering if, before erroring, it could check the on-disk cache of running kernels (e.g., in ~/Library/Jupyter/runtime). If it finds the kernel_id in question, it would start a new session connected to that existing kernel. To put it another way, the cache of kernels known to the manager includes only those initiated by that particular server process, not kernels started by other processes.

Many people inadvertently start multiple servers because they think "oh, jupyter-notebook just opens the browser". Falling back on reading files would create weird conditions where multiple servers connect to the same kernel via the same file, which is why we don't do it.

If we go this route, I would suggest that the server write a file saying "I am this PID, on this host, and I own this connection file." Then, when another server does the file lookup, it can check whether that process is still running.
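The ownership scheme suggested above could look like this. It is a minimal sketch of a hypothetical mechanism, not an existing notebook feature; the `.owner` sidecar filename and its fields are made up for illustration:

```python
import json
import os
import socket
from pathlib import Path


def claim_kernel(connection_file):
    """Write a sidecar file recording which server process 'owns'
    this kernel's connection file (hypothetical ownership marker)."""
    owner = {"pid": os.getpid(), "host": socket.gethostname()}
    Path(str(connection_file) + ".owner").write_text(json.dumps(owner))


def is_claimed_by_live_process(connection_file):
    """Return True if an owner marker exists and the owning process
    is still alive, so another server knows to leave the kernel alone."""
    owner_path = Path(str(connection_file) + ".owner")
    if not owner_path.is_file():
        return False
    owner = json.loads(owner_path.read_text())
    if owner["host"] != socket.gethostname():
        return True  # cannot probe a PID on another host; assume owned
    try:
        os.kill(owner["pid"], 0)  # signal 0: existence check, sends nothing
        return True
    except ProcessLookupError:
        return False  # owner died; the claim is stale
    except PermissionError:
        return True  # process exists but belongs to another user
```

A second server finding `is_claimed_by_live_process(...)` false could safely take over or clean up the stale connection file.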

> Would that be a crazy thing to do? The goal is to enable something like ipython console --existing in the notebook server.

Fernando and I had a quick discussion some time back. We think that jupyter console --existing should bring up a UI (in general, if it is a tty) that allows the user to select a kernel, regardless of whether it's run from the server.

> For example, suppose Alice and Bob are running separate notebook servers on the same machine. Alice wants to allow Bob to observe the inputs and outputs in a console or notebook that she is working on. She sends Bob the connection file info (perhaps via a hub extension). Bob opens a console and creates a new session connecting to that running kernel. Now Alice and Bob each have a session connected to the same kernel. Since the JupyterLab console now mirrors iopub messages from any session, not just the session that initiated execution, Bob can see Alice's inputs and outputs. He can also execute code. This is one simple mode of real-time collaboration.

I'm sure almost no one will understand the security risk of that. If Bob connects from another session, he should likely get a connection through some other mechanism (fake kernel connection info?) that makes it clear he is not Alice. In general, I think a better (and more complicated) way of enabling collaboration would be for Alice's and Bob's servers to perform a handshake, and for Bob's server to see remote kernels "exposed" by Alice's server.

Also, Alice and Bob likely don't have the same signature key for messages, so wouldn't sharing kernel.json files require Alice to give Bob access to sensitive info?

danielballan commented 8 years ago

Thanks for explaining, @Carreau. I see that a handshake between servers makes more sense. Is this part of the planned work on addressing collaboration for JupyterHub v0.8, @minrk? I'm interested in helping / offering BNL users as guinea pigs.

minrk commented 8 years ago

> Also, Alice and Bob likely don't have the same signature key for messages, so wouldn't sharing kernel.json files require Alice to give Bob access to sensitive info?

Each kernel has a key (in the connection info), so if you can talk to a kernel (send or receive), you have the key. There is currently no mechanism to receive messages from a kernel without being able to also execute on that kernel.
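This is a consequence of how Jupyter wire-protocol messages are signed: every request, including execute_request, carries an HMAC-SHA256 signature computed with the single shared key from the connection file. A minimal sketch of that signing step (per the Jupyter messaging spec; the function name is ours):

```python
import hashlib
import hmac
import json


def sign_message(key, header, parent_header, metadata, content):
    """Compute a Jupyter wire-protocol message signature: HMAC-SHA256
    over the serialized header, parent_header, metadata, and content
    frames, keyed with the shared key from the connection file."""
    h = hmac.new(key.encode("utf-8"), digestmod=hashlib.sha256)
    for frame in (header, parent_header, metadata, content):
        h.update(json.dumps(frame).encode("utf-8"))
    return h.hexdigest()
```

Because the same key validates both "listen-only" traffic and execute_request frames, anyone who can verify Alice's messages can also sign their own — which is why sharing the connection file necessarily shares execution rights.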