Kernel instance discovery (discussion)

takluyver commented 4 years ago

How should discovery of running kernel instances work? (See also #43, which we'll probably want to tackle first.)

At present, most kernels are started by the launcher writing a local connection file. This is used to start the kernel, but then it also works as a crude way for other applications on the same system to discover and connect to the kernel.

But if a kernel is launched remotely, there may not be a connection file on the launching system for it. And I'd like to redesign the start method for local kernels, which probably won't require a connection file at all. Since #42, jupyter-kernel writes its own connection file intended for clients to connect to the kernel it has launched, even if the launch didn't create a connection file. But this is an inelegant solution.

What kernels should applications be able to discover (besides the ones they have launched themselves)?

Any kernels running locally? (Under the same user account)
Any kernels launched by local applications? (which may include kernels running remotely or in VMs/containers)
Any kernel instance it could have launched itself? (i.e. tied to available kernel type providers)
Or even broader? E.g. if I'm on a cluster, should it be possible for a Jupyter application on one node to discover and connect to a kernel on another node, using some shared resource? What about zeroconf discovery on local networks?
Maybe kernel instance discovery should be pluggable, so people can experiment with different strategies for this kind of thing? Extensibility adds complexity, though.
Should the application launching a kernel be able to decide whether other applications can find it, or whether it should be 'private' to that application? If so, should the APIs for launching a kernel default to private or shared?
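To make the pluggable option concrete, here is a minimal sketch of what such an interface might look like. Every name in it (`KernelInstance`, `KernelInstanceFinder`, `StaticFinder`, `discover`) is hypothetical and not part of jupyter_kernel_mgmt, and the private-by-default `shared` flag is only one possible answer to the last question above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class KernelInstance:
    """Hypothetical record describing one running kernel."""
    kernel_id: str
    connection_info: Dict[str, object] = field(default_factory=dict)
    shared: bool = False  # assumption: launches default to private


class KernelInstanceFinder(ABC):
    """Hypothetical plug-in interface: one subclass per discovery strategy."""

    @abstractmethod
    def find_instances(self) -> Iterator[KernelInstance]:
        """Yield the kernel instances this strategy can see."""


class StaticFinder(KernelInstanceFinder):
    """Toy strategy backed by an in-memory list, for illustration only."""

    def __init__(self, instances: List[KernelInstance]):
        self._instances = instances

    def find_instances(self) -> Iterator[KernelInstance]:
        return iter(self._instances)


def discover(finders: List[KernelInstanceFinder]) -> List[KernelInstance]:
    """Merge results from all finders, skipping kernels marked private."""
    found = {}
    for finder in finders:
        for inst in finder.find_instances():
            if inst.shared:
                # Keep the first record seen for each kernel id.
                found.setdefault(inst.kernel_id, inst)
    return list(found.values())
```

A connection-file scanner, a cluster-wide registry, or zeroconf could each be one `KernelInstanceFinder` subclass, at the cost of the extra complexity mentioned above.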

kevin-bates commented 4 years ago

I consider this solely an application issue. Notebook-based servers use the MappingKernelManager for this, and "extensions" like Enterprise Gateway that introduce multi-tenancy do the same. Those kinds of applications should also associate other information with each kernel so that per-user queries can be performed, since you don't necessarily want another user to know that someone else is running a kernel.

Regarding locality of the kernel relative to this discussion, I think it should reflect all kernels - local, remote, vm, etc. But since my primary argument is that this be performed by the application, that too would be an application decision.

Thomas, I'll defer to you for applications other than Notebook, JS, EG. I think having the ability to list kernels that KernelApp has started would be useful, but I think something like that could simply post entries into RUNTIME_DIR (prefixed with its own string), like many *nix applications do.
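As a rough illustration of posting prefixed entries into a runtime directory, the sketch below uses only the standard library. The function names, the `<prefix>-<kernel_id>.json` layout, and the idea of storing the connection info as the entry body are all assumptions made for the example, not existing KernelApp behaviour.

```python
import json
import os
from pathlib import Path


def post_kernel_entry(runtime_dir, app_prefix, kernel_id, connection_info):
    """Write one JSON entry per launched kernel (hypothetical layout)."""
    path = Path(runtime_dir) / f"{app_prefix}-{kernel_id}.json"
    # Request mode 0o600 at creation: connection info includes the signing key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(connection_info, f)
    return path


def list_kernel_entries(runtime_dir, app_prefix):
    """Return {kernel_id: connection_info} for entries posted under app_prefix."""
    entries = {}
    for path in sorted(Path(runtime_dir).glob(f"{app_prefix}-*.json")):
        # Strip "<prefix>-" from the file stem to recover the kernel id.
        kernel_id = path.stem[len(app_prefix) + 1:]
        entries[kernel_id] = json.loads(path.read_text())
    return entries
```

The launching application would also need to remove its entry when the kernel shuts down; otherwise stale files accumulate and listings include kernels that no longer exist.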

I think leaving JKM as-is for this is the right choice. JKM provides facilities for the discovery and launch of individual kernels. Management of launched kernels should be the responsibility of the application since only it knows how best to convey launched kernel info, if at all.

takluyver commented 4 years ago

To be clear, I'm thinking about how applications can discover kernels that some other independent application started. The notebook server by default does not do this at all: it only uses the kernels it has started itself. The Qt console and jupyter_console do it in a very limited way (the --existing flag).

kevin-bates commented 4 years ago

I see. Sorry, my focus is purely on applications like Notebook, JS, EG where it's clear (in my opinion) what the behavior should be.

So I think this discussion only applies to those kernel providers that create a connection file. If a different application wants to use an existing kernel, I suspect it would need to know the kernel's id (is that true?). If that's the case, and we stated that all connection filenames be of the form kernel-<kernel_id>.json and that those files reside in a well-known subdirectory of jupyter-runtime-dir (or in that directory itself), then those "external" applications could just discover the available connection files. I agree that relying on a UUID as your "human identifier" is weak, but unless there's other metadata in the connection file (like kernel name, launch time, etc.), I don't see another solution.
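A minimal sketch of that file-based discovery, assuming the kernel-<kernel_id>.json naming convention proposed above. The function name and the shape of the returned records are hypothetical, and the directory is passed in explicitly rather than looked up, so nothing here depends on a particular Jupyter API.

```python
import json
import re
from pathlib import Path

# Assumed convention from the comment above: kernel-<kernel_id>.json
CONNECTION_FILE_RE = re.compile(r"^kernel-(?P<kernel_id>.+)\.json$")


def find_connection_files(directory):
    """Return a list of dicts, one per readable connection file in directory."""
    kernels = []
    for path in sorted(Path(directory).iterdir()):
        match = CONNECTION_FILE_RE.match(path.name)
        if match is None:
            continue
        try:
            info = json.loads(path.read_text())
            modified = path.stat().st_mtime
        except (OSError, ValueError):
            continue  # unreadable, vanished, or partially written file
        if not isinstance(info, dict):
            continue  # not a connection-info object
        kernels.append({
            "kernel_id": match.group("kernel_id"),
            # Optional metadata; None when the file does not record it.
            "kernel_name": info.get("kernel_name"),
            # File mtime is only a rough stand-in for launch time.
            "modified": modified,
            "connection_info": info,
        })
    return kernels
```

A file being present does not prove the kernel is still alive, so a caller would probably still need some liveness check before offering the kernel to a user.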

You mentioned somewhere the idea of reworking things so that no connection files are used. In that case, how would you propose discovery of launched kernels in non-launching applications? I really don't want to get into persistence to a shared location, but it seems like some form of that would be required (which, if you could always assume local kernels, the connection file approach addresses).

takluyver / jupyter_kernel_mgmt

Kernel instance discovery (discussion) #44