Notebook cell execution

davidbrochart commented 1 year ago

Problem

Today at the Jupyter Server meeting we talked about notebook restoration, where RTC could be a solution (see https://github.com/jupyter-server/jupyter_server/issues/900). In that solution, the frontend just displays live changes to the notebook shared model. But currently, a notebook cell execution is either None or an integer greater than 0, so there is no way to show that a cell is executing.

Proposed Solution

We could encode the "cell executing" state in the execution count, for instance as "*". I think it could even be part of nbformat, because it brings some information about the state of the notebook at the time it was saved.

fcollonval commented 1 year ago

Taking the following scenario with the above proposal:

User A open a notebook
User A runs all cells among which one takes a long time
While the execution is processing the long running cell, user A saves the document
User A needs to shutdown the application (included server and kernel) for some external reasons before the long running cell finishes.
Two hours later user A reopens the notebook
The long running cell display a * that does not mean I'm running

So for me a state information should not be part of the document.

There are a couple of other scenarii I can think of that will make the * storage confusing; e.g. sharing a notebook file with a cell in that state with coworkers, open a read-only notebook with such state,... .

echarles commented 1 year ago

So for me a state information should not be part of the document.

After reading twice @fcollonval scenario, I tend to agree that the state information should not be part of the document.

This drives I think to a more general discussion about "what is a document?, what is a notebook?, what is a kernel?, what is a server? what is a state?" in RTC land.

davidbrochart commented 1 year ago

The long running cell display a * that does not mean I'm running

No, it means I was not finished when the notebook was saved, and it's aligned with the cell outputs. For instance, imagine a cell that prints intermediary results and a final result. The saved notebook won't show the final result, which is reflected in the execution state as not finished.

So for me a state information should not be part of the document.

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

echarles commented 1 year ago

No, it means I was not finished when the notebook was saved,

This is a developer perspective. How can user eyes go from * to not finished when the notebook was saved?

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

davidbrochart commented 1 year ago

How can user eyes go from * to not finished when the notebook was saved?

Let me reformulate to just not finished executing, which is the current situation.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

fcollonval commented 1 year ago

Let me reformulate to just not finished executing, which is the current situation.

No the current situation is more specific than that; we displayed a * are:

cell is currently executed in the kernel
cell is queued for execution

So there are always an assumption it is a transient information linked to some live interaction with the kernel.

You are raising a good point about the outputs that could be reflecting only a partial execution. So I lean towards proposing adding an information in the document about a cell being partly executed that should be unrelated to the * used in the UI. That could be stored in the execution_count or else where.

davidbrochart commented 1 year ago

cell is queued for execution

Although execution being queued is really kernel specific. For instance, akernel has the ability to run cells concurrently, so they might never be queued. If cells are queued, it's only because the kernel is executing blocking code. So for me the * really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this * is the exact information that we want to store in the document.

echarles commented 1 year ago

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

I stil think that nbformat and the CRDT shared models are different things. Similar but different, so governed by different rules and schemas.

... If cells are queued, it's only because the kernel is executing blocking code. So for me the really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this is the exact information that we want to store in the document.

You clearly define two different states, but foresee only one indicator for the user. This could be better, like eg..

. = queued for execution
* = being executed.

However, the static representation of the notebook in a ipynb file with nbformat should not be discussed at the same level as the runtime status of that same notebook. Hence my point above that the rules and schemas should be different.

echarles commented 1 year ago

I would even be tempted to say that the output of a cell not being fully executed should not be part of that saved version (the ipynb file) of the notebook, as showing incomplete and potentially giving wrong conclusions to the reader. I don't expect everyone to agree with that, but I expect this to be a discussion point.

davidbrochart commented 1 year ago

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

davidbrochart commented 8 months ago

I opened #197 which adds an execution_state field to a cell, that encodes the execution state as an "idle" or "busy" string.

krassowski commented 2 months ago

As a counter proposal, https://github.com/jupyter-server/jupyter_ydoc/pull/227 adds pending_requests. We already discussed this at length with @davidbrochart but I am still on fence here. I wanted to highlight it here in case if @echarles or @fcollonval have thoughts on this.

echarles commented 2 months ago

Thx a lot @krassowski for joining the bits. I sometimes tend to step a it too much back, but I can see separated tracks like Server model / POST api / RTC / Async / Pending that are discussed and implemented, while not having a well defined complete picture of the interactions. So yes, I have toughts, and the thought is that we miss a copter view of all this... `

krassowski commented 2 months ago

To recap, we have three proposals:

a) store pending_requests in jupyter-server and expose a dedicated REST API to sync it with frontend https://github.com/jupyter-server/jupyter_server/issues/990
b) store pending_requests in the shared model and compute execution_state on frontend from it
c) store the execution_state in the shared model and use some future yet to be developed collaborative widgets solution to solve the issue of input requests

Separately, there is a question of recording execution timing data on the server side (for use cases of jupyterlab-execute-time and https://github.com/mwakaba2/jupyterlab-notifications extensions).

echarles commented 2 months ago

Thx, useful for the pending_request story. There are other aspects that are not tackled like:

the user input which does not work for now in case of server model
ipywidgets support in the server model (and rtc), while being backwards compatible with all the existing huge ipywidgets ecosystem.

These are just 2 examples, I am pretty sure there are other ones that will be discovered as side effects if we don't carefully identify them.

krassowski commented 1 month ago

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

I think there should be two different states even in an async kernel, because there is a use case for executing async cells with dependencies and cancelling a cell which was scheduled to run while its dependencies have not finished.

In fact the execution queue visualisation and manipulation is one of the most frequently requested features in JupyterLab issue tracker: https://github.com/jupyterlab/jupyterlab/issues/7825. I believe we should design the API with this in mind.

krassowski commented 1 month ago

In the future maybe should be a fourth state to indicates that the kernel died when the cells were executing (or waiting in execution queue), say [E].

krassowski commented 1 month ago

Also, pending user input is a special case of running.

krassowski commented 1 month ago

Coming here from a difference place today: I am trying to implement an indicator that a cell is running in the minimap for notebook; the minimap has access to cell models but not widgets.

The problem is the [*] prompt is not even a part of the ICodeCellModel in JupyterLab. Instead the CodeCell widget has a setPrompt() method which accept a string. This is different from execution_count which is a number.

To move forward with implementation I want to deprecate setPrompt in favour of semantic information about the cell state in the cell model.

Thinking out loud, both execution_state and pending_requests can encode information about (a) whether the cell is currently running (b) whether the cell is scheduled for execution - as long as execution_state is a string and not a boolean (which is already the case in https://github.com/jupyter-server/jupyter_ydoc/pull/197).

Further, I would think that pending_requests might be a private implementation detail of the CodeCellModel with execution_state being the only thing that is exposed.

I think the logic for prompt would be, in pseudo code:

if execution_state == 'idle':
  if execution_number:
    prompt = '[' + execution_number + ']'
  else:
    prompt = '[ ]'
elif execution_state == 'queued':
  prompt = '[.]'
elif execution_state == 'running':
  prompt = '[*]'

This is also loosely related to https://github.com/datalayer/jupyter-server-nbmodel/issues/13

jupyter-server / jupyter_ydoc

Notebook cell execution #169

Problem

Proposed Solution