davidbrochart opened 2 years ago
To illustrate, this is a POC where all cells are executed with `bg: true`. The widget runs in parallel with some CPU-bound code in another cell:

https://user-images.githubusercontent.com/4711805/172254781-54a3d8ec-ef89-4c31-9029-235bc64d73d6.mp4
> Create a new `bg: bool` entry in a message content on the `shell` channel

This sounds like it might be better in the metadata? i.e., it's independent of the message type/content, but is metadata telling the receiver how to process the message.

> Widgets could take advantage of it when sending their `comm_msg`.
The problem with always setting this is that comm messages from widgets may invoke arbitrary code. Perhaps we can have a widget model trait that controls setting this bit, rather than assuming that it is what the user wants for all widgets?
`IntSlider(background_updates=True)` or something...
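A minimal sketch of how such an opt-in trait could work. Everything here is illustrative, not real ipywidgets API: `WidgetModel`, the `background_updates` flag, and the message shape are all assumptions.

```python
class WidgetModel:
    """Toy stand-in for a widget model; not the real ipywidgets class."""

    def __init__(self, background_updates=False):
        # Opt-in flag: only widgets that declare themselves safe for
        # background handling tag their comm messages with bg: true.
        self.background_updates = background_updates
        self.sent = []  # messages that would go out on the comm channel

    def send_state(self, state):
        metadata = {"bg": True} if self.background_updates else {}
        self.sent.append({
            "msg_type": "comm_msg",
            "content": {"data": {"state": state}},
            "metadata": metadata,
        })

slider = WidgetModel(background_updates=True)
slider.send_state({"value": 42})
print(slider.sent[0]["metadata"])  # {'bg': True}
```

The point is that the decision lives in the widget model rather than being applied blindly to every comm message, since some widget messages may invoke arbitrary user code.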
Is the proposal here that each shell message with this bit set will have a new thread created to handle it?
> This sounds like it might be better in the metadata? i.e., it's independent of the message type/content, but is metadata telling the receiver how to process the message.
Maybe, unless we want to restrict `bg` to some message types. I'm not sure it's relevant for messages that can't actually execute code in the kernel, i.e. anything other than `execute_request` or `comm_msg`. An `execute_request` message has `stop_on_error` in `content`; I could see `bg` in there too.
> `IntSlider(background_updates=True)` or something...
Yes, that's exactly what I had in mind.
> Is the proposal here that each shell message with this bit set will have a new thread created to handle it?
I don't know, that might be a lot of threads created, e.g. when moving a slider. Also, we might want to ensure that e.g. widget messages are handled in order for a given widget. So I was thinking that we could create a new thread for each `comm_id`, and use it for all messages with this `comm_id`.

If we allow this flag for `execute_request`, then I guess a new thread for each request is fine?
> Maybe, unless we want to restrict `bg` to some message types. I'm not sure it's relevant for messages that can't actually execute code in the kernel, i.e. anything other than `execute_request` or `comm_msg`.
One thing we're doing at Databricks is sending some messages on the control channel specifically because we want parallel execution - for example, we don't want autocomplete blocking on main thread execution. Thanks to your work putting control channel processing on a separate thread last year, this works with ipykernel (since ipykernel accepts any shell message on the control channel). This background metadata attribute allows for a similar outcome without sending messages on the control channel.
It does raise the question, though: why not send these messages on the control channel instead of the shell channel, and get parallel execution that way for kernels that support it? A difference is that sending on the control channel ties up the one really important thread, instead of making a new thread per request/`comm_id`.
I didn't work on putting control channel processing on a separate thread, maybe @JohanMabille did as part of the debugger? I remember @SylvainCorlay saying that the control channel should not be used for anything else than high-priority requests like "restart kernel", and certainly not for regular code execution. But we could leave that choice to the user, making explicit that if they choose to use the control channel for background code execution, it's at the expense of other important messages. I agree that there is no need to reinvent the wheel if this feature can be implemented basically for free.
> I didn't work on putting control channel processing on a separate thread, maybe @JohanMabille did as part of the debugger?
Ah, it was Sylvain.
> But we could leave that choice to the user, making explicit that if they choose to use the control channel for background code execution, it's at the expense of other important messages.
I agree - sending bigger and bigger messages to the control channel is abusing the purpose of the channel for a side effect of the channel. Autocomplete is maybe okay. Comm messages are pushing it. `execute_request` messages would certainly be abusing the control channel. So in that sense, having this background message processing metadata flag lets us use the shell channel more naturally.
I think we should distinguish what we want from how we implement it. A user may want to execute some tasks in parallel / in the background. They may also want dependencies between these tasks: for instance, A and B can run in parallel, and C and D can run in parallel, but C must run after A has completed and D after B has completed.
A solution could be to add a field to the `execute_request` message, let's call it `background_id`. Tasks with the same `background_id` would be executed sequentially by the same thread.
On the kernel side, we don't have to provide as many threads as there are background ids. We can have a thread pool, and the kernel dispatches the tasks among the threads, like a regular task scheduler. This does not ensure that tasks with different background ids will be executed in parallel, but it limits the number of created threads.
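To make the idea concrete, here is a minimal sketch of such a scheduler, assuming Python on the kernel side (the `BackgroundScheduler` name and API are made up): tasks sharing a `background_id` are chained so they run sequentially, while a bounded thread pool caps the number of threads.

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundScheduler:
    """Illustrative sketch: sequential execution per background_id on a
    bounded thread pool; tasks with different ids may run in parallel."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tails = {}  # background_id -> last future scheduled for it

    def submit(self, background_id, fn, *args):
        prev = self._tails.get(background_id)

        def run():
            if prev is not None:
                # Wait for the previous task with the same id, preserving
                # sequential order within a background_id. This cannot
                # deadlock because the pool's FIFO queue starts tasks in
                # submission order, so prev is always running or done.
                prev.result()
            return fn(*args)

        fut = self._pool.submit(run)
        self._tails[background_id] = fut
        return fut
```

Note the trade-off the comment above mentions: the pool bounds resource usage, but a task chained behind a slow predecessor occupies a worker while it waits.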
I don't like this approach. It basically means that the user has no guarantee about what they are asking for. I prefer the "subshell" solution (see https://github.com/ipython/ipykernel/pull/955), where the user requests a subshell that executes in the background, and is notified if the kernel supports it or not. If it does, the user is certain that they got what they asked for: background execution. Subshells can be used to have another notebook or console share the same kernel concurrently, but they can also be used to run some cells in parallel within the same notebook (it's just a matter of having a cell targeting a specific subshell). Chaining cells is guaranteed inside a subshell, so this solution covers all use cases.
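For comparison, a sketch of what cell-to-subshell targeting might look like from the client side, loosely based on the direction of ipython/ipykernel#955 (the message types and field names here are assumptions for illustration, not a finalized spec): the client first asks the kernel for a subshell, then tags its execute requests with the returned subshell id.

```python
# Illustrative message shapes only; field names are assumptions.
create_request = {"msg_type": "create_subshell_request", "content": {}}

# Hypothetical reply from a kernel that supports subshells; a kernel
# that does not (or refuses) would reply with an error status instead.
create_reply = {
    "msg_type": "create_subshell_reply",
    "content": {"status": "ok", "subshell_id": "subshell-1"},
}

def make_targeted_execute_request(code, subshell_id=None):
    """Target a cell at a subshell; None means the main shell."""
    header = {"msg_type": "execute_request"}
    if subshell_id is not None:
        header["subshell_id"] = subshell_id
    return {"header": header, "content": {"code": code}}

msg = make_targeted_execute_request(
    "run_in_background()", create_reply["content"]["subshell_id"]
)
```

The explicit reply is what gives the guarantee argued for above: the client knows whether it actually got background execution before sending any cells to the subshell.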
As said in #806, these are two orthogonal problems. Subshells are meant for having an additional shell while the first one is busy executing a long-running task. This approach should definitely not be used for parallel computing (because you don't want to create a new thread per background task); there, you just want to submit the logic of the computation and let the kernel dispatch the tasks to the available resources (i.e. threads). As a client, you should not manage the resources of the kernel.
I don't see them as orthogonal problems. Subshells can serve both purposes. It is fine to let the user choose whether they want a new subshell (i.e. a new thread) or to reuse a subshell, because that's what allows solving the task dependencies. If you reuse a subshell, you know that your cell will be queued. If you create a new subshell, you know that it will immediately start in the background. If you give all the resource management to the kernel, the user cannot specify whether their cell will be queued or not, unless you introduce another piece of information that says "run this cell after this one", but in the end you are introducing the notion of shell ID that I described in https://github.com/ipython/ipykernel/pull/955. And as I said, this approach cannot guarantee that a cell will run in the background if the user wishes to (in case all threads are busy), so it's very bad.
> because that's what allows solving the task dependencies.

Specifying task dependencies is the responsibility of the client. "Solving" them and managing the resources is definitely not the client's responsibility (mainly because these resources can be shared among different clients).
> If you give all the resource management to the kernel, the user cannot specify if their cell will be queued or not, unless you introduce another information that says "run this cell after this one"

Yes you can, exactly the way it is done now: cells with the same background ID will be executed one after the other, in the order they are defined (exactly as is the case now for a notebook), while cells with different background ids may be executed in parallel depending on the available resources.
> this approach cannot guarantee that a cell will run in the background if the user wishes to (in case all threads are busy), so it's very bad.

It prevents a user from crashing the process by creating too many threads, which would be even worse.
Nothing prevents the user from crashing the kernel, the user can run arbitrary code.
Indeed, but if we can prevent them from crashing it because of bad resource management, we should definitely do it. Otherwise we will end up with resource management implemented in the client: the client will be responsible for task submission and retries on failure, will send additional messages over the network to ask how many tasks it can submit (a number which may not even stay the same between the response to the first request and the next task submission), etc.
Threads are quite lightweight, you can easily start thousands of them, so we're far from reaching the limit with our use case. The user won't have to do any resource management except starting a subshell, and possibly stopping it (but even that is not mandatory). There won't be any task submission failures or anything like that; it's going to be dead simple for the user.
Until you have many kernels running on the same machine and clients executing notebooks with hundreds or thousands of cells (which is already a use case). In that case, there is resource limitation, and the client has to be able to resend tasks because another process may have "stolen" the available threads (or the notebook execution will crash randomly and the feature won't be usable at all). And even if we don't reach the limit, increasing the number of threads will increase latency and degrade performance (which is definitely something we want to avoid for parallel computing).
> it's going to be dead simple for the user.

Until we're in the case I described before, then it will be just "dead" for the client ;)
So there is technically no good reason for not delegating resource management to the kernel; the client wants maximal performance and specifies what can be run in parallel (but in the end it does not have to care about what actually runs in parallel or not), and the scheduler / dispatcher is responsible for getting the best performance from the available resources. Again, this use case is not about sending messages while the kernel is busy, but about sending a massive number of tasks at once. That's morally the same as submitting a computation to Dask and letting it dispatch the work to available resources.
> there is resource limitation, and the client has to be able to resend tasks because another process may have "stolen" the available threads

There won't be any stolen threads: a user who is given a subshell will keep it for sure. And remember, the kernel can refuse a subshell request by sending an error in its `subshell_reply`, so it can manage resources.
> the client wants maximal performance and specifies what can be run in parallel (but in the end it does not have to care about what actually runs in parallel or not)

Yes, the user cares about what should run in parallel, that is the whole point. If they ask for background execution and don't get it, it breaks their use case. Think of a widget that must run in the background to visualize some data being computed.
> That's morally the same as submitting a computation to Dask and letting it dispatch the work to available resources.

I disagree, this is out of scope. If users want that kind of parallelism they should use libraries like Dask.
> There won't be any stolen threads: a user who is given a subshell will keep it for sure.

So you never stop threads? If you execute a notebook with 2000 cells that can run in parallel, you start 2000 threads and never stop them?
> And remember, the kernel can refuse a subshell request by sending an error in its `subshell_reply`, so it can manage resources.

Yes, so that's exactly a submission failure, and you need to implement a retry strategy. It is also in contradiction with the previous statement: if a client keeps its subshells for sure and there is no subshell left, then a second client can never submit its tasks. So you need to close the subshell at some point, and a client needs to know somehow whether it can submit a task (either by asking if a subshell is available, or by trying to submit the task again). And again, resource management is not the responsibility of the client but of the kernel (because these resources are shared between clients).
> I disagree, this is out of scope. If users want that kind of parallelism they should use libraries like Dask.

No, this is precisely what I have been discussing since the beginning: massively parallel execution. And you definitely cannot start a thread per task, this won't scale. Dask could indeed be used as an implementation detail on the kernel side. The idea is to be able to simply specify the task dependencies on the client side (one could even imagine a `background_id: auto` value meaning everything can be executed in parallel).
If it's just a matter of having some widgets execute in the background while still being able to send messages to the kernel, then I agree subshells are meant for that and this issue can be closed as a duplicate of #806.
## Problem

Currently, messages on the `shell` channel are handled in the main thread. This guarantees that e.g. execution requests are run one after the other, in the order they are received. While this is what we want when e.g. executing the cells of a notebook, it also prevents widgets from executing in the background.

## Proposed solution

Create a new `bg: bool` entry in a message content on the `shell` channel. A message received with `bg: true` by the kernel will be handled in another thread, in parallel with other messages received on the `shell` channel.

## Additional context

akernel does this by default for execution requests, but with an asynchronous concurrency model: a cell will be executed in the background if it is cooperative (i.e. it awaits at the top level). This proposal differs by using a threaded concurrency model and being more explicit: parallel execution must set `bg: true`. Widgets could take advantage of it when sending their `comm_msg`. For general code execution, we could maybe use e.g. a `%bg` magic on the first line of a code cell, or even run the cell normally and have a JupyterLab "Send cell execution in the background" button, similar to a Unix shell's `Ctrl-z` -> `bg`.
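To illustrate the proposed solution, here is a minimal sketch of an `execute_request` carrying the proposed `bg` flag next to the existing `stop_on_error` field. The `bg` key is of course hypothetical; the rest follows the usual Jupyter message layout.

```python
import uuid

def make_execute_request(code, bg=False):
    """Build a minimal shell-channel execute_request with the proposed
    bg flag in its content (bg is not part of the current spec)."""
    return {
        "header": {"msg_id": uuid.uuid4().hex, "msg_type": "execute_request"},
        "parent_header": {},
        "metadata": {},
        "content": {"code": code, "stop_on_error": True, "bg": bg},
    }

# A kernel implementing the proposal would hand this message off to a
# worker thread instead of the main shell loop:
msg = make_execute_request("long_running_computation()", bg=True)
print(msg["content"]["bg"])  # True
```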