jupyter-server / jupyter_ydoc

Jupyter document structures for collaborative editing using Yjs/pycrdt
https://jupyter-ydoc.readthedocs.io
BSD 3-Clause "New" or "Revised" License
26 stars 16 forks source link

Notebook cell execution #169

Open davidbrochart opened 1 year ago

davidbrochart commented 1 year ago

Problem

Today at the Jupyter Server meeting we talked about notebook restoration, where RTC could be a solution (see https://github.com/jupyter-server/jupyter_server/issues/900). In that solution, the frontend just displays live changes to the notebook shared model. But currently, a notebook cell execution is either None or an integer greater than 0, so there is no way to show that a cell is executing.

Proposed Solution

We could encode the "cell executing" state in the execution count, for instance as "*". I think it could even be part of nbformat, because it brings some information about the state of the notebook at the time it was saved.

fcollonval commented 1 year ago

Taking the following scenario with the above proposal:

So for me a state information should not be part of the document.

There are a couple of other scenarii I can think of that will make the * storage confusing; e.g. sharing a notebook file with a cell in that state with coworkers, open a read-only notebook with such state,... .

echarles commented 1 year ago

So for me a state information should not be part of the document.

After reading twice @fcollonval scenario, I tend to agree that the state information should not be part of the document.

This drives I think to a more general discussion about "what is a document?, what is a notebook?, what is a kernel?, what is a server? what is a state?" in RTC land.

davidbrochart commented 1 year ago
  • The long running cell display a * that does not mean I'm running

No, it means I was not finished when the notebook was saved, and it's aligned with the cell outputs. For instance, imagine a cell that prints intermediary results and a final result. The saved notebook won't show the final result, which is reflected in the execution state as not finished.

So for me a state information should not be part of the document.

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

echarles commented 1 year ago

No, it means I was not finished when the notebook was saved,

This is a developer perspective. How can user eyes go from * to not finished when the notebook was saved?

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

davidbrochart commented 1 year ago

How can user eyes go from * to not finished when the notebook was saved?

Let me reformulate to just not finished executing, which is the current situation.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

fcollonval commented 1 year ago

Let me reformulate to just not finished executing, which is the current situation.

No the current situation is more specific than that; we displayed a * are:

So there are always an assumption it is a transient information linked to some live interaction with the kernel.


You are raising a good point about the outputs that could be reflecting only a partial execution. So I lean towards proposing adding an information in the document about a cell being partly executed that should be unrelated to the * used in the UI. That could be stored in the execution_count or else where.

davidbrochart commented 1 year ago
  • cell is queued for execution

Although execution being queued is really kernel specific. For instance, akernel has the ability to run cells concurrently, so they might never be queued. If cells are queued, it's only because the kernel is executing blocking code. So for me the * really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this * is the exact information that we want to store in the document.

echarles commented 1 year ago

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

I stil think that nbformat and the CRDT shared models are different things. Similar but different, so governed by different rules and schemas.

... If cells are queued, it's only because the kernel is executing blocking code. So for me the really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this is the exact information that we want to store in the document.

You clearly define two different states, but foresee only one indicator for the user. This could be better, like eg..

However, the static representation of the notebook in a ipynb file with nbformat should not be discussed at the same level as the runtime status of that same notebook. Hence my point above that the rules and schemas should be different.

echarles commented 1 year ago

I would even be tempted to say that the output of a cell not being fully executed should not be part of that saved version (the ipynb file) of the notebook, as showing incomplete and potentially giving wrong conclusions to the reader. I don't expect everyone to agree with that, but I expect this to be a discussion point.

davidbrochart commented 1 year ago

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

davidbrochart commented 8 months ago

I opened #197 which adds an execution_state field to a cell, that encodes the execution state as an "idle" or "busy" string.

krassowski commented 2 months ago

As a counter proposal, https://github.com/jupyter-server/jupyter_ydoc/pull/227 adds pending_requests. We already discussed this at length with @davidbrochart but I am still on fence here. I wanted to highlight it here in case if @echarles or @fcollonval have thoughts on this.

echarles commented 2 months ago

Thx a lot @krassowski for joining the bits. I sometimes tend to step a it too much back, but I can see separated tracks like Server model / POST api / RTC / Async / Pending that are discussed and implemented, while not having a well defined complete picture of the interactions. So yes, I have toughts, and the thought is that we miss a copter view of all this... `

krassowski commented 2 months ago

To recap, we have three proposals:

Separately, there is a question of recording execution timing data on the server side (for use cases of jupyterlab-execute-time and https://github.com/mwakaba2/jupyterlab-notifications extensions).

echarles commented 2 months ago

Thx, useful for the pending_request story. There are other aspects that are not tackled like:

These are just 2 examples, I am pretty sure there are other ones that will be discovered as side effects if we don't carefully identify them.

krassowski commented 1 month ago

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

I think there should be two different states even in an async kernel, because there is a use case for executing async cells with dependencies and cancelling a cell which was scheduled to run while its dependencies have not finished.

In fact the execution queue visualisation and manipulation is one of the most frequently requested features in JupyterLab issue tracker: https://github.com/jupyterlab/jupyterlab/issues/7825. I believe we should design the API with this in mind.

krassowski commented 1 month ago

In the future maybe should be a fourth state to indicates that the kernel died when the cells were executing (or waiting in execution queue), say [E].

krassowski commented 1 month ago

Also, pending user input is a special case of running.

krassowski commented 1 month ago

Coming here from a difference place today: I am trying to implement an indicator that a cell is running in the minimap for notebook; the minimap has access to cell models but not widgets.

The problem is the [*] prompt is not even a part of the ICodeCellModel in JupyterLab. Instead the CodeCell widget has a setPrompt() method which accept a string. This is different from execution_count which is a number.

To move forward with implementation I want to deprecate setPrompt in favour of semantic information about the cell state in the cell model.

Thinking out loud, both execution_state and pending_requests can encode information about (a) whether the cell is currently running (b) whether the cell is scheduled for execution - as long as execution_state is a string and not a boolean (which is already the case in https://github.com/jupyter-server/jupyter_ydoc/pull/197).

Further, I would think that pending_requests might be a private implementation detail of the CodeCellModel with execution_state being the only thing that is exposed.

I think the logic for prompt would be, in pseudo code:

if execution_state == 'idle':
  if execution_number:
    prompt = '[' + execution_number + ']'
  else:
    prompt = '[ ]'
elif execution_state == 'queued':
  prompt = '[.]'
elif execution_state == 'running':
  prompt = '[*]'

This is also loosely related to https://github.com/datalayer/jupyter-server-nbmodel/issues/13