Unique kernel IDs

takluyver commented 4 years ago

This came up on #42, and we wanted to spin out a separate issue. Questions include:

1. What part of the code should be responsible for assigning a kernel ID? The application? JKM? The kernel itself?
2. Are IDs persistent across restarts? What about if you checkpoint/restore a kernel process?

This is also going to play into a larger question about kernel discovery; I'll open a separate issue about that.

@kevin-bates responded to some of my questions already on #42:

> Should a restarted kernel have the same ID as the original?

I think we must retain the kernel id on restarts. The MappingKernelManager relies on this fact - otherwise it would remove the current slot and create a new slot for the new kernel manager. In addition, I believe kernel providers should be able to do what they want within their kernel manager, but the kernel-id, IMO, is the public key for identifying a kernel throughout its lifetime (where restarts are considered within its lifetime).
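To illustrate the slot argument above, here is a minimal sketch of a mapping keyed by kernel ID. `ToyKernelManager` and `ToyMappingManager` are hypothetical stand-ins invented for this example, not the real `MappingKernelManager`; the only point is that a restart which reuses the ID replaces the manager in place instead of removing one slot and adding another.

```python
import uuid


class ToyKernelManager:
    """Hypothetical stand-in for a kernel manager (not the real class)."""

    def __init__(self, kernel_id=None):
        # Assumption: a missing ID means "generate a fresh UUID".
        self.kernel_id = kernel_id or str(uuid.uuid4())


class ToyMappingManager:
    """Hypothetical mapping of kernel ID -> manager, one slot per kernel."""

    def __init__(self):
        self._kernels = {}

    def start_kernel(self, kernel_id=None):
        km = ToyKernelManager(kernel_id)
        self._kernels[km.kernel_id] = km
        return km.kernel_id

    def restart_kernel(self, kernel_id):
        # The new manager reuses the old ID, so the slot is overwritten
        # rather than deleted and re-created under a different key.
        old = self._kernels[kernel_id]
        self._kernels[kernel_id] = ToyKernelManager(kernel_id=old.kernel_id)
        return kernel_id

    def list_kernel_ids(self):
        return list(self._kernels)
```

With this shape, a client holding the ID before a restart can keep using it afterwards, and the set of keys in the mapping does not change.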

I think I mentioned this before, but I believe KernelFinder.launch() should take an optional kernel_id=None parameter. This parameter would be honored on initial launch as well, and the restarter would then pass it during restarts: self.kernel_finder.launch(self.kernel_type, cwd, kernel_id=self.kernel_manager.kernel_id). (We'd want to extend KernelManager's initializer in a similar manner, with an optional kernel_id parameter.)

> What about if you can checkpoint/restore a kernel process - should the restored process have the same ID? What about if you restore the same checkpoint to two processes?

If these questions are directed at persisted sessions that get "revived", I think the id of the process is orthogonal to the id of the kernel. When/how is process id used by clients?

In EG, when a persisted session is "revived", for example if the server came down, and another started, EG does not restart the kernel process, but instead, re-establishes connection to the kernel process. It can do this because the kernel is remote and is (probably) still running.

If we want the server to be able to support active/active scenarios (which we plan to do for EG), retaining the kernel's id across restarts is paramount to that - otherwise restarts would require communicating the old and new kernel ids to other servers, etc. and I don't think we want to go there - especially since the current behavior is to retain the id.

> Who should be able to discover what kernels? What part is responsible for advertising them? When should kernels be run without being advertised?

I guess I'm not familiar enough with the direct kernel application approach. For applications like notebook and jupyter-server, the list of running kernels should come directly from the MappingKernelManager. Once (well, if) we add the notion of users and roles, we could then apply filtering on the results of the /api/kernels request.
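As a rough sketch of the filtering idea, assuming a notion of users and roles that does not exist yet: the function below takes the full list of running kernels and returns only those a given user may see. The `owner` field, the `is_admin` flag, and the `visible_kernels` name are all hypothetical and are not part of any current `/api/kernels` response.

```python
def visible_kernels(kernels, user, is_admin=False):
    """Filter a list of kernel records down to those `user` may see.

    Assumptions (hypothetical): each record is a dict with an "owner" key,
    and an admin role may see every kernel.
    """
    if is_admin:
        return list(kernels)
    return [k for k in kernels if k.get("owner") == user]
```

The full list would still come from the mapping manager; a filter like this would only be applied on the way out to the client.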

takluyver commented 4 years ago

Yes, I like the idea of having a kernel_id=None parameter, where None means 'generate a new UUID'. We've already got a kernel_id attribute on the manager object, and the subproc launcher sets it to a new UUID each time. So we 'just' need to add support for passing that in through the provider.launch() method.
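A minimal sketch of that convention, assuming a free-standing `launch()` function that returns a plain dict. Both the function shape and the returned fields are invented for illustration; the real `provider.launch()` returns connection info and a manager object, which are omitted here.

```python
import uuid


def launch(kernel_type, cwd=None, kernel_id=None):
    """Hypothetical launch: honor a caller-supplied ID, else make a new UUID."""
    if kernel_id is None:
        # None means "generate a new UUID", as proposed in the thread.
        kernel_id = str(uuid.uuid4())
    # A real launcher would start a process here; this sketch only
    # records what would have been launched.
    return {"kernel_type": kernel_type, "cwd": cwd, "kernel_id": kernel_id}
```

A restarter could then call `launch(kernel_type, cwd, kernel_id=old["kernel_id"])` to keep the same ID, while an ordinary start simply omits the argument.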

The answer to my question 1 would therefore be: JKM typically picks the ID when a new kernel is started, but the caller code can specify an ID (e.g. if it's restarting a kernel and wants to keep the same ID).

@kevin-bates, as this is a topic that interests you, do you want to make a PR? Or we can discuss it more - I'm not sure that I've thought through all the details and implications yet.

kevin-bates commented 4 years ago

> JKM typically picks the ID when a new kernel is started, but the caller code can specify an ID (e.g. if it's restarting a kernel and wants to keep the same ID).

I think you meant to say that the provider picks the ID since the kernel manager returned from launch is used by the builtin providers.

I know of some projects that want to be able to specify kernel-id on initial starts as well, so I think we can generally say that a specified kernel_id will be honored on starts or restarts.

One concern I have is that because (I believe) restarts should use the same ID (as is the case today), providers are going to need to know (or should at least be informed) that launch() is being called for the purpose of restarting the kernel. This would allow them to adjust/reset/etc. resources as they see fit. As a result, I'm wondering if the interface wouldn't be cleaner with a separate restart() method. Of course, that would imply the provider should stop the kernel (if need be), so perhaps letting JKM own the restart logic is best (both auto-restart and manual). In that case, I suppose a restart boolean to launch() is the best option.
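The restart-boolean option could look roughly like the sketch below. `ToyProvider`, its `events` log, and the "reset_resources" step are hypothetical names used only to show where a provider would learn that a launch is a restart; stopping the old kernel is assumed to be the caller's (JKM's) job and is not shown.

```python
import uuid


class ToyProvider:
    """Hypothetical provider whose launch() accepts a restart flag."""

    def __init__(self):
        # Records what the provider did, in order, for illustration.
        self.events = []

    def launch(self, kernel_type, cwd=None, kernel_id=None, restart=False):
        if restart and kernel_id is None:
            # Assumption: a restart must name the kernel being restarted.
            raise ValueError("restart=True requires a kernel_id")
        if kernel_id is None:
            kernel_id = str(uuid.uuid4())
        if restart:
            # Provider-specific cleanup or reuse of resources would go here.
            self.events.append(("reset_resources", kernel_id))
        self.events.append(("launch", kernel_id))
        return kernel_id
```

Under this design the caller owns the stop-then-launch sequence for both automatic and manual restarts, and the provider only reacts to the flag.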

I don't mind creating a PR for this once the dust has settled.

takluyver / jupyter_kernel_mgmt

Unique kernel IDs #43