Document use of `$$.display()`

n-riesco / ijavascript

IJavascript is a javascript kernel for the Jupyter notebook

Other

2.18k stars 187 forks source link

Document use of `$$.display()` #95

Open n-riesco opened 7 years ago

n-riesco commented 7 years ago

Moved from https://github.com/n-riesco/nel/issues/4

gnestor commented 7 years ago

To continue the discussion from n-riesco/nel#4.

@n-riesco I'm happy to work on this, but I have a couple questions:

Where are display_data messages created and sent? I can't find it in jp-kernel, nel, or jmp. This is where we will want to integrate update_display messages.
How to update displays? I like your idea of adding an update method to IPython.display and abstracting display_id away from the user. We could implement something similar (e.g a display object that a user can import and use similar to ipython's). To update a display, a user could just call the update method on the display object with the new value.

n-riesco commented 7 years ago

@gnestor First a question to you. Do you need this functionality for IJavascript only? Or for jp-babel as well?

jp-babel and jp-coffeescript aren't using the latest version of jp-kernel (This is now top of my TODO list).

Now, I'll try to answer your questions:

Where are display_data messages created and sent? I can't find it in jp-kernel, nel, or jmp. This is where we will want to integrate update_display messages.

I haven't implemented them. Note that this isn't going to be a small PR (display_data messages aren't replies to a request!). Let me explain what it involves:

display_data messages are triggered by the user (they aren't a reply to a request from the frontend), and hence, we need to provide the user an API to trigger this messages (e.g. $$.sendDisplayData() and $$.updateDisplayData()). We would implement this part in nel, specifically in this file.
Since a display_data message isn't a reply message, we need to provide a mechanism for nel to tell jp-kernel when the user wants to send a display_data message. I guess the easiest solution is to register a listener in nel's config (e.g. Config.listener).
And last, but not least, we would need to implement the listener in jp-kernel. The listener would be responsible for sending the actual display_data message on the iopub socket.

How to update displays? I like your idea of adding an update method to IPython.display and abstracting display_id away from the user. We could implement something similar (e.g a display object that a user can import and use similar to ipython's). To update a display, a user could just call the update method on the display object with the new value.

I think I've answered this already. What do you think of using $$.sendDisplayData() and $$.updateDisplayData()?

gnestor commented 7 years ago

Thanks for the quick response!

I think this should be implemented for all 3 kernels (I'm just using "ijs" as an abbreviation for the whole family).

I think $$.sendDisplayData() and $$.updateDisplayData() work. However, sending display data is already supported in the sense that a user can do $$mime$$ = { mimetype: data } or $$.mime({ mimetype: data }) or simply return { _toMime: () => { mimetype: data } }, correct? If so, then updating a display could simply be $$.update({ mimetype: data }) except that it would need to follow your example of const $1 = global.$$; $1.mime({ mimetype: data }) in one cell and $1.update({ mimetype: data }) in another cell. At this point, I would suggest that $$ just be renamed to display, except that it has a couple methods that are not display related (async, done, sendResult, and sendError). Another thought (a bit off-topic), but instead of providing $$ as a global in the context, perhaps the path to the global node_modules could be added to NODE_PATH (e.g. NODE_PATH=/usr/lib/node_modules) via an environment variable, allowing users to require/import a display object from ijavascript, much like they do with ipython:

import { display, updateDisplay } from 'ijavascript';
// In [1]
display.JSON({ key: 'value' }, { displayId: '1' })

// In [2]
update_display({ key: 'new value' }, { displayId: '1' })

...one nice thing about working with Python in notebooks is that you don't need to install pandas in every directory that contains notebooks that depend on pandas.

If I'm not mistaken up to this point, then it seems like:

jmp needs to support the transient property for execution_result messages responding to a display_data or update_display_data messages
nel needs to provide an update_display method that will send a message of update_display_data type
ijavascript could export a display object that is as consistent as possible with IPython.display

@rgbkrk Feel free to chime in since I'm very much naive when it comes to kernel implementation.

n-riesco commented 7 years ago

It's a bit late here, so I'm only have time to reply to some of your questions:

I think this should be implemented for all 3 kernels (I'm just using "ijs" as an abbreviation for the whole family).

I'll try to come up with something soon and bump the jp-kernel version in jp-babel and jp-coffeescript (unfortunately this change isn't just a 1-liner).

However, sending display data is already supported in the sense that a user can do $$mime$$ = { mimetype: data } or $$.mime({ mimetype: data }) or simply return { _toMime: () => { mimetype: data } }, correct?

Methods like $$.mime() send execute_reply messages, whereas what we're planning here is that $$.sendDisplayData() and $$.updateDisplayData() send display_update messages.

jmp needs to support the transient property for execution_result messages responding to a display_data or update_display_data messages

This is one of the reasons why I think this change to display_data in v5.1 of the Jupyter messaging protocol doesn't really fit into what the Jupyter Messaging protocol is/was. I wish I've had a chance to comment before it was introduced.

nel needs to provide an update_display method that will send a message of update_display_data type

No, it doesn't and it shouldn't. Let me explain why:

nel is the module for managing Node.js sessions.
nel is protocol agnostic (in fact, it doesn't do any networking) (it could be hooked to a ZMQ server, to a REST server, ...).
jp-kernel is the ZMQ server that speaks the Jupyter messaging protocol (here's where code that sends update_display and update_display_data should live).

ijavascript could export a display object that is as consistent as possible with IPython.display

Let's get first an initial implementation for update_display and update_display_data within the current framework. Otherwise, we are going to get bogged down by the details of a new API for IJavascript.

rgbkrk commented 7 years ago

Now some of my confusion with ijavascript is relieved. Here's my expectation as a user which I'll illustrate with both Python and JavaScript

def something(a):
    print(a) // stream stdout
    display(a) // display_data

    return a

something(3) // execute_result

function something(a) {
  display(a) // display data
  console.log(a) // stream stdout

  return a
}

something(3) // execute_result

Execute result is supposed to be the last evaluation. A function that overrides the execute_result is contrary to convention in an interpreter.

n-riesco commented 7 years ago

@rgbkrk I'd say contrary to the convention in an interpreter of a synchronous language like Python.

This is what @minrk said here when we discussed asynchronous outputs:

Some relevant details of the spec:

output is associated with a given cell by parent_header.msg_id

the status: idle message is used to indicate that the output from a given cell is 'complete', and handlers can be cleaned up.

the handling of output associated with a given parent produced after status: idle is technically undefined

...however, given that async output can and does get produced after status: idle, the current behavior is to associate it with the most recent cell

There is no restriction about multiple outstanding requests. It is AOK for two cells to be running concurrently, and if they keep their parent's in order, everything should be fine.

So general recommendations for kernels:

do your best to ensure that output is complete before sending the idle message, but it's not always feasible and that's okay

long-running async output (e.g. setInterval(console.log...) should be considered detached from its cell if the cell's idle message isn't being held

I'm not opposed to change IJavascript's behaviour, but let's bear in mind that one of the main reasons IJavascript is a unique kernel is that it provides the user with means to produce output asynchronously.

If I was given the chance, I'd like to discuss how to extend the execute_result message, rather than using the extension of display_data with transient.

rgbkrk commented 7 years ago

The convention, even in node.js's interpreter is that asynchronous outputs can come in whenever and they're not the result of an execution:

> (function() { setTimeout(() => console.log('hey'), 1000); return 3 })()
3 // execute_result, "first cell"
> 5 // Totally separate execution
5
hey // Comes in separately

n-riesco commented 7 years ago

@rgbkrk Oh, come on! You know what I meant. Node.js's REPL provides no features to handle the asynchronous nature of Javascript!

Anyway, as I said I'm not opposed to change IJavascript's behaviour (provided no functionality already present and that makes IJavascript so unique a kernel is lost).

@minrk Is the transient extension of display_data set in stone. No chance to discuss it?

rgbkrk commented 7 years ago

Right, that's why I'm saying that display_data should be used for asynchronous rich outputs that aren't console outputs.

rgbkrk commented 7 years ago

By the way, thank you for discussing all of this with us @n-riesco even when our opinions on this differ.

n-riesco commented 7 years ago

On 02/04/17 18:55, Kyle Kelley wrote:

Right, that's why I'm saying that |display_data| should be used for asynchronous rich outputs that aren't |console| outputs.

That really differs from my reading of:

the Jupyter messaging spec in http://jupyter-client.readthedocs.io/en/latest/messaging.html#display-data
and @minrk 's clarification in https://github.com/n-riesco/ijavascript/issues/58#issuecomment-182043186

In particular:

The specs don't specify the nature of the data in display_data (in fact, now that I'm re-reading that section, there is a comment that suggests it shouldn't be used for "execute_result/displayhook")
@minrk 's clarification doesn't mention display_data (is status: "idle" to be sent before/after any display_data?)

OK, let's move on from my misunderstanding about IPython's use of display_data.

Assuming now my understanding of display_data is correct, a synchronous execution request involves the following exchange of messages between kernel and frontend:

-> execution_request <- status: busy <- execute_input <- execute_result <- display_data (if the user invoked any method in IPython.display) <- execute_reply <- status: idle

The frontend waits until status: idle and:

if it only got an execute_result, then it shows execute_result in Out[n]:
if it got both an execute_result and display_data, then it shows display_data in Out[n]:

I hope I got all this dance right! :)

Now, let's move onto the transient and the asynchronous case (I'm doing guess work, because what I'm going to describe isn't included in the Jupyter messaging spec):

the only purpose of the transient field is to assign a label to an Out[n]: cell. This means:
- before v5.1, frontends could only identify an Out[n]: cell by the msg_id of the execution_request
- after v5.1, Out[n]: cells can be identified by msg_ids or transient labels (this is bad: what happens if we rerun an asynchronous cell?)
update_display_data is sent after status: idle. This means:
- (please, see https://github.com/n-riesco/ijavascript/issues/62 for context about an issue with nbconvert)
- before v5.1, nbconvert was able to render an output cell after after receiving status: idle
- after v5.1, nbconvert will be unable to do so, because update_display_data can arrive after status: idle, and the kernel has no way to tell nbconvert when the execution of asynchronous code is finished.

Is the above correct? If so, how are we going to address these problems with the execution of asynchronous cells?

n-riesco commented 7 years ago

jmp needs to support the transient property for execution_result messages responding to a display_data or update_display_data messages

@gnestor After re-reading the spec for display_data, I noticed that the transient field is inside content. This means that we don't need to change jmp (I implied we had to do so in this comment).

minrk commented 7 years ago

the only purpose of the transient field is to assign a label to an Out[n]: cell

No. transient is a field for any information about a display that should not be persisted. Specifically, that applies to display_data, not just (and not generally) execute_result.

before v5.1, frontends could only identify an Out[n]: cell by the msg_id of the execution_request

Before 5.1, individual outputs could not be addressed at all, because one cell can have many outputs.

after v5.1, Out[n]: cells can be identified by msg_ids or transient labels

After 5.1, individual outputs can be addressed by transient.display_id. msg_id still cannot identify individual outputs, only a collection of outputs.

(this is bad: what happens if we rerun an asynchronous cell?)

There is no change from 5.0 for async outputs. If an async output sends a new display message, it is handled exactly as before. If it updates an existing display that is still present, then the existing display will be updated.

before v5.1, nbconvert was able to render an output cell after after receiving status: idle

after v5.1, nbconvert will be unable to do so, because update_display_data can arrive after status: idle, and the kernel has no way to tell nbconvert when the execution of asynchronous code is finished.

This is unchanged from 5.0. Whether an output targets an existing area or should create a new one doesn't have much bearing on whether nbconvert can tell if async code has finished. nbconvert hasn't merged update-display support yet, though.

how are we going to address these problems with the execution of asynchronous cells?

The situation should be largely unchanged for async execution. No changes are made to handling of existing messages.

n-riesco commented 7 years ago

@minrk

No. transient is a field for any information about a display that should not be persisted. Specifically, that applies to display_data, not just (and not generally) execute_result.

I understand the distinction between transient and display_id. I should've written written display_id (but since display_id is the only use of transient I'm aware of, I thought it'd be clear enough).

Before 5.1, individual outputs could not be addressed at all, because one cell can have many outputs.

Yet, IJavascript did and still does it by ensuring that reply messages set their parent headers to the request header associated with the cell.

The notebook shows multiple execute_reply messages below In[n]:, if those messages have their parent header set to the header of the execute_request associated with In[n]:.

There is no change from 5.0 for async outputs. If an async output sends a new display message, it is handled exactly as before. If it updates an existing display that is still present, then the existing display will be updated.

I think the statement above is incorrect, because update_display_data messages were introduced in v5.1. When I wrote "this is bad: what happens if we rerun an asynchronous cell?", I meant that update_display_data messages break the handling of asynchronous output. I gave two examples of how this happens:

it breaks nbconvert because nbconvert won't be able to determine when asynchronous cells have completed their work and won't send any more update_display_data messages (If I'm mistaken, please, explain how nbconvert can determine that asynchronous cells have finished).
addressing a cell by a label won't allow re-running an asynchronous cell. To make this claim clear, let me put an example in IJavascript:

Let's say I have this cell In[1]::

$$.async();

var label = "test";
var $test = $$;

$test.sendDisplayData(label, { "text/plain": "tic" });

setInterval(function() {
    $test.updateDisplayData(label, { "text/plain": "tic" });
}, 1000);

After this cell is run, I would expect to get:

first Out[1]: tic
and every second after an additional tic, i.e.: Out[1]: tic\ntic

Now, let's edit the cell and replace tic by tac:

$$.async();

var label = "test";
var $test = $$;

$test.sendDisplayData(label, { "text/plain": "tac" });

setInterval(function() {
    $test.updateDisplayData(label, { "text/plain": "tac" });
}, 1000);

After this cell is re-run, I would expect that:

the frontend replaces In[1]: with In[2]:
and that Out[2]: first shows Out[2]: tac
and every second after an additional tac, i.e.: Out[2]: tac\ntac

But this isn't what it will happen, because the label "test" that previously referred to Out[1]:, not it'll refer to Out[2]:, and we will have 2 setInterval's sending tics and tacs.

I hope this example shows better what the issue with display_id is.

minrk commented 7 years ago

Before 5.1, individual outputs could not be addressed at all, because one cell can have many outputs.

Yet, IJavascript did and still does it by ensuring that reply messages set their parent headers to the request header associated with the cell.

This is not accurate. Setting parent ID adds new outputs, it does not update existing outputs. I think there's some confusion about what a single output is. A cell has a list of outputs (also referred to as its "output area"). Each display_data (or execute_result, etc.) message with a given parent produces a new output in an output list. A cell's output area is associated with a single parent msg_id. New output messages with a given parent msg_id are added to this list.

The display_id allows updating a single output in this list, which was not possible before. It does not change how to route new outputs to a given output area, which can still only be accomplished by setting the correct parent msg_id.

I think the statement above is incorrect, because update_display_data messages were introduced in v5.1. When I wrote "this is bad: what happens if we rerun an asynchronous cell?", I meant that update_display_data messages break the handling of asynchronous output.

Since there have been no changes to how existing messages are handled, I don't see how it can break the handling of anything that works currently. It is purely new functionality, which may or may not make sense to use in different circumstances.

it breaks nbconvert because nbconvert won't be able to determine when asynchronous cells have completed their work and won't send any more update_display_data messages (If I'm mistaken, please, explain how nbconvert can determine that asynchronous cells have finished).

The status: idle detection continues to work exactly as well/poorly to detect when a cell has completed execution. The difference with update_display_data is that one cell's execution (async or not) can update another cell's output. It specifically allows updating outputs after a cell's execution is complete. It does not change anything about whether or when a cell's execution is considered complete.

addressing a cell by a label won't allow re-running an asynchronous cell.

Cells are not addressed by display_id. Cells continue to be addressed solely by parent msg_id. Only individual outputs within one or more cells' output lists are addressed by display_id.

After this cell is run, I would expect to get:

first Out[1]: tic

and every second after an additional tic, i.e.: Out[1]: tic\ntic

This won't happen. The output will continually be updated with the same tic text. update-display replaces outputs, it does not append new outputs. So each second, all outputs with the id text will be replaced with the text tic. These outputs can span many cells.

After this cell is re-run, I would expect that:

the frontend replaces In[1]: with In[2]:

and that Out[2]: first shows Out[2]: tac

and every second after an additional tac, i.e.: Out[2]: tac\ntac

But this isn't what it will happen, because the label "test" that previously referred to Out[1]:, not it'll refer to Out[2]:, and we will have 2 setInterval's sending tics and tacs.

You will indeed see tic/tac alternate (not appending, but overwriting), because you have two bits of code explicitly declaring that they should be updating the same display, and every instance of that display ID on the page. Another illustration would be if you had two separate cells with the same content, updating the same id. There again, you would see both outputs in both cells alternating between tic/tac.

I hope this example shows better what the issue with display_id is.

It does, in that it seems to show a case where update_display would not be used, because the goal is to append new outputs, rather than update existing ones. If your goal is to continue to append new outputs, sending display_data messages with the right parent continues to be the right thing to do.

n-riesco commented 7 years ago

This is not accurate. [...]

I agree I wasn't accurate enough; but whether an output is appended or replaced is a red herring in this discussion, what matters is the ability to associate an output with an input (I will explain further after I address the other points you've made).

Since there have been no changes to how existing messages are handled, I don't see how it can break the handling of anything that works currently. It is purely new functionality, which may or may not make sense to use in different circumstances.

OK, let me rephrase my claim (I wasn't suggesting that v5.1 should've been v6.0). What I'm saying is that nbconvert and similar tools won't work as expected with kernels that send update_display_data messages (the reason being, as explained before, that update_display_data messages are sent after the status: idle message).

Cells are not addressed by display_id. Cells continue to be addressed solely by parent msg_id. Only individual outputs within one or more cells' output lists are addressed by display_id.

Again, this is a red herring in this discussion, what matters is the ability to associate an output with an input (I will explain further after I address the other points you've made).

This won't happen. The output will continually be updated with the same tic text. update-display replaces outputs, it does not append new outputs. So each second, all outputs with the id text will be replaced with the text tic. These outputs can span many cells.

Ditto. Appended or replaced isn't the main issue (I will explain further after I address the other points you've made).

You will indeed see tic/tac alternate (not appending, but overwriting), because you have two bits of code explicitly declaring that they should be updating the same display, and every instance of that display ID on the page. Another illustration would be if you had two separate cells with the same content, updating the same id. There again, you would see both outputs in both cells alternating between tic/tac.

Yes, that's correct. I'm happy to see that this time I managed to explain what the issue is in a way that can be understood.

It does, in that it seems to show a case where update_display would not be used, because the goal is to append new outputs, rather than update existing ones.

Again, I think appending a new output or replace it is a red herring here.

To explain why, I'm going to give a counter-example and I'm going to propose a solution that doesn't introduce the use of display_ids.

First, the counter-example that shows it is already possible to append or replace an output in v5.0 of the messaging protocol:

(as explained in your first point), frontends append execute_reply messages to the output list
(I've just checked this in the notebook), frontends replace the output with the content of the last display_data message (i.e. if multiple calls to IPython.display.HTML are made within a cell, only the last one is shown in the output).

The solution that I have in mind to address the needs of asynchronous code is as follows:

send the status: idle message only when the kernel can confirm no more outputs will be sent.
use msg_id's instead of display_id's to determine the input associated with a given output. (note the removal of display_id's from the messaging protocol doesn't imply that the API in IPython.display has to change. It only implies that frontends don't need to be told. They already can determine the originating input by looking at the msg_id).
one could even get rid of update_display_data and replace it with a new property in display_data (I'd suggest to call it content.append so that this change would be backwards-compatible with the behaviour of display_data in v5.0).

rgbkrk commented 7 years ago

If you make multiple calls to IPython.display.HTML without displaying them, they're not replacing. Only the last result is shown. You have to make display calls to show these:

Resulting cell:

{
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>This won't</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<b>be replaced</b>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(\"<p>This won't</p>\"))\n",
    "display(HTML(\"<b>be replaced</b>\"))"
   ]
  }

n-riesco commented 7 years ago

@rgbkrk The behaviour I described is from the IPython notebook in Ubuntu 16.04:

$ ipython notebook --version
2.4.1

screenshot from 2017-04-03 17-36-52

But again, as I said to @minrk , replace vs append in this discussion is a red herring (it can be easily implemented by having a field content.append in display_data messages).

What matters to kernels able to send asynchronous output is:

to have a mechanism to tell the frontend when an asynchronous output is complete (so that tools like nbconvert know when they can render the output).
to be able to re-run an asynchronuos cell (if a user is creating a notebook that runs asynchronous code and makes a mistake, they would have to restart the kernel and re-run all the necessary cells again; this isn't a convenient workflow)

n-riesco commented 7 years ago

@rgbkrk I've just noticed that in your example, you use display(HTML(...)) and I used HTML(...).

This is the json for the cell I posted:

    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import HTML\n",
      "\n",
      "HTML(\"<div style='background-color:olive;width:50px;height:50px'></div>\")\n",
      "\n",
      "\n",
      "HTML(\"<div style='background-color:red;width:50px;height:50px'></div>\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style='background-color:red;width:50px;height:50px'></div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "<IPython.core.display.HTML at 0x7f7c0c65b210>"
       ]
      }
     ],
     "prompt_number": 1
    }

rgbkrk commented 7 years ago

I don't have time to support old versions of IPython so I think I'll just step away from this discussion since it's not nbformat v4+ or message spec v5+.

n-riesco commented 7 years ago

@rgbkrk Oh, come on! As I said, the append vs replace is an irrelevant question (so is the question nbformat v4 vs v5).

I only posted the screenshot and the json snippet in case you felt curious where my results where coming from.

Again, let me repeat what I think matters the most to asynchronous output in Jupyter:

to have a mechanism to tell the frontend when an asynchronous output is complete (so that tools like nbconvert know when they can render the output).
to be able to re-run an asynchronuos cell (if a user is creating a notebook that runs asynchronous code and makes a mistake, they would have to restart the kernel and re-run all the necessary cells again; this isn't a convenient workflow)

rgbkrk commented 7 years ago

You're showing IPython 2.x which is on nbformat v3.

n-riesco commented 7 years ago

@rgbkrk What do you think of the two points I make about nbconvert and re-running an asynchronous cell?

rgbkrk commented 7 years ago

to have a mechanism to tell the frontend when an asynchronous output is complete (so that tools like nbconvert know when they can render the output).

You can never know when "asynchronous output" is complete. You can only know when the evaluation of a cell is finished. If anything is running in the background (timeouts, waiting for callbacks, unintentionally registered event listeners), they can and should keep on running.

The same case is around for Python with threadpools, processpools, or any of the eventloops you can run in Python.

to be able to re-run an asynchronuos cell (if a user is creating a notebook that runs asynchronous code and makes a mistake, they would have to restart the kernel and re-run all the necessary cells again; this isn't a convenient workflow)

While inconvenient, it sounds like it matches the paradigm with JavaScript. It's up to the user to make idempotent code if that's what they want to do.

n-riesco commented 7 years ago

@rgbkrk

You can never know when "asynchronous output" is complete. You can only know when the evaluation of a cell is finished. If anything is running in the background (timeouts, waiting for callbacks, unintentionally registered event listeners), they can and should keep on running.

IJavascript solves this problem by providing the call $$.done()

While inconvenient, it sounds like it matches the paradigm with JavaScript. It's up to the user to make idempotent code if that's what they want to do.

But it is unnecessary, if msg_id's were used instead of display_id's there wouldn't be any issues.

minrk commented 7 years ago

What I'm saying is that nbconvert and similar tools won't work as expected with kernels that send update_display_data messages (the reason being, as explained before, that update_display_data messages are sent after the status: idle message).

This is not true. update_display_data messages are typically sent before status:idle (in sync code, at least), just like all other display outputs. update_display may come after status:idle in async coe, just like all other outputs. The only difference is that the destination can be a cell other than the one producing the output.

(I've just checked this in the notebook), frontends replace the output with the content of the last display_data message (i.e. if multiple calls to IPython.display.HTML are made within a cell, only the last one is shown in the output).

I think you've misinterpreted what's happening in the kernel. Instantiating an HTML object produces no output. Only calling display produces a display_data output. The final result of the cell produces a single execute_result output. Your example only produces one output, so no outputs are replaced. This is true all the way back to IPython 0.12. There continues to be no mechanism to replace a single output.

The solution that I have in mind to address the needs of asynchronous code is as follows:

send the status: idle message only when the kernel can confirm no more outputs will be sent.

Makes sense. There are no changes relevant to this in 5.1.

use msg_id's instead of display_id's to determine the input associated with a given output. (note the removal of display_id's from the messaging protocol doesn't imply that the API in IPython.display has to change. It only implies that frontends don't need to be told. They already can determine the originating input by looking at the msg_id).

display_id isn't about associating an output with an input because, as you have pointed out, msg_ids outputs with new information, which can come from other cell executions.

replace vs append in this discussion is a red herring (it can be easily implemented by having a field content.append in display_data messages).

This is also untrue. A content.append flag in display_data does not allow specific outputs to be addressed. To do this, an id must be applied to individual outputs. A boolean append flag would only allow updating the most recent display, which is not the goal.

minrk commented 7 years ago

Crap, that wasn't supposed to submit, yet. Continuing...

Here's the kind of thing display_id does:

replace vs append in this discussion is a red herring

This I think is the key. Replace vs append isn't a red herring, it's the entire point. display_id solves the problem of addressing specific individual outputs (not whole output areas, which msg_ids address). We have these requirements for updating outputs:

at an arbitrary time
from an arbitrary cell
without affecting the outputs before / after the target output of the same cell

The append flag only allows updating the most recent output during the current cell's execution (as identified by msg_id), which isn't the goal of display_id.

to have a mechanism to tell the frontend when an asynchronous output is complete (so that tools like nbconvert know when they can render the output).

status:idle continues to be this. display_id / update_display_data don't change anything about it. If a kernel does a rigorous job tracking async executions with status:idle, it knows not only that no new outputs will come, but also that no output updates will come.

to be able to re-run an asynchronuos cell (if a user is creating a notebook that runs asynchronous code and makes a mistake, they would have to restart the kernel and re-run all the necessary cells again; this isn't a convenient workflow)

This will depend on the nature of the mistake, but it's true that it facilitates a class of bug that users may need to restart the kernel to recover from. There are plenty of these already, though, and it doesn't make it any harder to write async code that works correctly on re-execution. If the user produces code that runs forever, updating a specific named output, that named output will continue to be updated.

n-riesco commented 7 years ago

@minrk

Before I address your points, I think I need to clarify what I mean by using msg_id's instead of display_id's (Note that the solution I'm proposing is how IJavascript is currently working).

Please, correct me again if I'm wrong, but what you're saying is that msg_id's can't be used to update a cell output from another cell. Here's an example of how IJavascript solves this issue using only msg_id's.

Let's say we have cell 1 with the following code:

// In[1]:

var $1 = $$;

$1.text("tic");

After running this cell 1, the output of cell 1 is "tic".

Now, let's create cell 2 with the following code:

// In[2]:

var $2 = $$;

$1.text("tac");

$2.text("toe");

After running cell 2, "tac" is appended to the output of cell 1, and "toe" is shown in the output of cell 2.

IJavascript achieves this by only using msg_id's. Each time a cell is run, IJavascript updates $$. $$ amongst other things keeps a copy of msg_id. In the example above, I've kept a copy of the msg_id of cell 1 in $1 and a copy of the msg_id of cell 2 in $2.

This makes it possible for $1.text("tac") to trigger an execute_reply with the msg_id of cell 1, despite this statement being located in cell 2.

Note that this is exactly how display_id works. The difference is that msg_id's are associated with execution requests, and thus, if a cell is re-run, we get a different msg_id.

Please, let me know if this example isn't clear enough.

Now I'll try to reply to the points you made:

What I'm saying is that nbconvert and similar tools won't work as expected with kernels that send update_display_data messages (the reason being, as explained before, that update_display_data messages are sent after the status: idle message).

This is not true. update_display_data messages are typically sent before status:idle (in sync code, at least), just like all other display outputs. update_display may come after status:idle in async coe, just like all other outputs. The only difference is that the destination can be a cell other than the one producing the output.

[...]

send the status: idle message only when the kernel can confirm no more outputs will be sent.

Makes sense. There are no changes relevant to this in 5.1.

Just to be clear, would you be in favour of updating the messaging spec to clarify that when a kernel sends status: idle, the frontend can assume no more execute_reply, display_data and update_display_data messages shall be expected?

I think you've misinterpreted what's happening in the kernel. Instantiating an HTML object produces no output. Only calling display produces a display_data output. The final result of the cell produces a single execute_result output. Your example only produces one output, so no outputs are replaced. This is true all the way back to IPython 0.12. There continues to be no mechanism to replace a single output.

I fear this point of replacing vs appending is really hijacking the discussion. With the solution I propose (i.e. content.append), users could specify whether they want the output to be appended or replaced. Honestly to me this is a minor point, I'm more concerned about making the Jupyter messaging protocol safe for asynchronous code.

display_id isn't about associating an output with an input because, as you have pointed out, msg_ids outputs with new information, which can come from other cell executions. [...] This is also untrue. A content.append flag in display_data does not allow specific outputs to be addressed. To do this, an id must be applied to individual outputs. A boolean append flag would only allow updating the most recent display, which is not the goal.

I hope I've already answered these 2 points with the example code I wrote above.

This I think is the key. Replace vs append isn't a red herring, it's the entire point. display_id solves the problem of addressing specific individual outputs (not whole output areas, which msg_ids address). We have these requirements for updating outputs:

at an arbitrary time

from an arbitrary cell

without affecting the outputs before / after the target output of the same cell

Again, I hope the example above shows that msg_id's can be used for updating outputs:

at an arbitrary time
from an arbitrary cell
without affecting the outputs before / after the target output of the same cell

The append flag only allows updating the most recent output during the current cell's execution (as identified by msg_id), which isn't the goal of display_id.

This isn't correct, as shown in the example code above, it's possible to keep a reference to the msg_id of another cell (and as I said this is how IJavascript is now working; the example I've given can be tested in IJavascript now).

minrk commented 7 years ago

what you're saying is that msg_id's can't be used to update a cell output from another cell.

This is not what I am saying. I think there's still some misunderstanding about what an individual output is. A cell has potentially many outputs. We need display_id to target individual outputs (not a cell's whole output area) for update. Using msg_id allows updating or appending to a cell's entire output area, but not addressing individual outputs within that area, which is the goal of display_id.

Again, I hope the example above shows that msg_id's can be used for updating outputs:

at an arbitrary time

yes

from an arbitrary cell

yes

without affecting the outputs before / after the target output of the same cell

no, it does not accomplish this. Here's an example with display_id (using IPython master):

# Cell 1
display('before')
handle1 = display('update me', display_id='target')
display('after')
handle2 = display('also here', display_id='target-2')

# Cell 2
hande1.update('updated')
hande2.update('also updated')

After executing Cell 2, Cell 1's output should be:

before
updated
after
also updated

How can you accomplish this using only msg_id? This is what we need from display_id.

With the solution I propose (i.e. content.append), users could specify whether they want the output to be appended or replaced.

With msg_id and a boolean, users cannot specify which output is to be replaced, which is the goal of display_id. They can specify which cell's output area should be targeted with parent.msg_id, but not which output within that output area, which is precisely what we need.

This isn't correct, as shown in the example code above, it's possible to keep a reference to the msg_id of another cell (and as I said this is how IJavascript is now working; the example I've given can be tested in IJavascript now).

I hope I've now clarified with the example above that this does not cover the display_id case, which is targeting individual outputs within a cell's output area, not the outputs of a cell as a whole.

would you be in favour of updating the messaging spec to clarify that when a kernel sends status: idle, the frontend can assume no more execute_reply, display_data and update_display_data messages shall be expected?

This is a tangent, but due to the difficulty of tracking parents of async output in most kernels I'd lean toward this as a best practice, rather than a guarantee (i.e. SHOULD instead of MUST). I would tell frontends:

use status: idle to indicate that execution is complete. If running a terminating script (e.g. nbconvert), this should be used as the indicator that everything is complete and can exit. Async output produced after the last status:idle should not expect to be included.
while running, keep message handlers active for all cells with active output areas (i.e. do not clear handlers on status: idle, only on clear-output / re-execute / exit).

On the other side, kernel / notebook authors should know that 'async output' (by which I mean output produced after status: idle) may be lost if produced after the final status:idle of a terminating execution (e.g. nbconvert). So wherever possible ensure output is sent before status:idle, but acknowledging that this will not cover everything.

n-riesco commented 7 years ago

I hope I've now clarified with the example above that this does not cover the display_id case, which is targeting individual outputs within a cell's output area, not the outputs of a cell as a whole.

Yes, you have. Your example shows that display_id's allow multiple outputs per cell (herein, I'm going to call each output a display output).

After some more thought, I have to say that the interaction between asynchronous code and multiple display outputs can be very complex (I can think of many ways for asynchronous code to break a display output, even if a display_id were to be paired with a msg_id).

This is a tangent, but due to the difficulty of tracking parents of async output in most kernels I'd lean toward this as a best practice, rather than a guarantee (i.e. SHOULD instead of MUST). I would tell frontends:

use status: idle to indicate that execution is complete. If running a terminating script (e.g. nbconvert), this should be used as the indicator that everything is complete and can exit. Async output produced after the last status:idle should not expect to be included.

This wouldn't work if the last cell runs asynchronous code, or the previous asynchronous code hasn't completed before running the last cell. Anyway, I understand this is a compromise. Please, see my last comment, I have a suggestion to make this friendlier to kernels that run asynchronous code.

while running, keep message handlers active for all cells with active output areas (i.e. do not clear handlers on status: idle, only on clear-output / re-execute / exit).

Would it be OK to introduce a new message type? For example, status: closed and replace point 2 in your best practice with: "keep the message handler for a cell active until the kernel replies with status: closed to that cell request". In this way, tools like nbconvert would be able to determine accurately when they can render a cell.

Edit: I've rephrased the last paragraph to stress that each cell would have to be closed with a status: closed.

minrk commented 7 years ago

Your example shows that display_id's allow multiple outputs per cell (herein, I'm going to call each output a display output).

For a bit of clarification, multiple outputs per cell have always existed and are quite common (e.g. cells with text and figures, or multiple plots). You can call display as many times as you like in a cell, just like you can call print/console.log many times. ijavascript is already relying on this when calling $1.text("tac") in your example above, which creates an additional output each time. display_id only adds the ability to address the outputs after they have been created, but adding nothing to what outputs can be created or where or when.

Would it be OK to introduce a new message type?

What does status: closed accomplish that status: idle does not? If a kernel sends idle only after completing its async code, does that not already accomplish signaling that execution is complete? If status: closed is achievable, then doing the same thing with status: idle should work, right? If you already have a reliable mechanism in ijavascript to wait for and identify when async code started by a cell has finished, then that's when I would recommend that you send status:idle.

If there is truly a need to separate status:idle from another state, we can add one. But I don't yet see what status:closed accomplishes that sending status:idle when you would have sent status:closed would not.

n-riesco commented 7 years ago

For a bit of clarification, [...]

What I wasn't aware was that display_id's are meant to update just a portion of the output cell (i.e. they only update what I called a display output in my previous message).

What does status: closed accomplish that status: idle does not? If a kernel sends idle only after completing its async code, does that not already accomplish signaling that execution is complete? If status: closed is achievable, then doing the same thing with status: idle should work, right? If you already have a reliable mechanism in ijavascript to wait for and identify when async code started by a cell has finished, then that's when I would recommend that you send status:idle.

The algorithm in IJavascript is as follows:

if $$.async() hasn't been called during the execution of a cell, then IJavascript sends status: idle after execution.
if $$.async() was called, then IJavascript doesn't send status: idle, and waits until $$.done() is invoked somewhere else (either by asynchronous or synchronous code).

I'm having second thoughts about status: closed. What I don't like about status: closed is that:

it would close all the display outputs in a cell (and since we can have multiple display outputs each with a display_id, we really want to control when to close each display_id).
cells that don't define a display_id should be closed automatically (as stated in point 2 in the best practice you posted before)

Since what we want is to close a display_id, why not having a close_display_data message? In this way, kernels would have a mechanism for telling frontends (like nbconvert) that a display output is ready for rendering.

minrk commented 7 years ago

The algorithm in IJavascript is as follows:

That sounds very good, and should work well with all the existing display machinery, including display_id and update_display. It sounds like the nbconvert case is already covered, then.

Since what we want is to close a display_id.

But we don't want that. The way to close a display_id is to remove any instances of the id from the page. As long as they are on the page, they are and should remain 'open'.

In this way, kernels would have a mechanism for telling frontends (like nbconvert) that a display output is ready for rendering.

A display is immediately ready for rendering when it arrives, whether it has an id or not, just like before. The idea behind display_id is that outputs can be re-rendered at a later time as well.

Having a display_id is no indication that it will be updated, so display_id doesn't have any effect on the "is it done" question. It's perfectly okay to have every display with an id or none, and the logic should be no different at the end of execution.

n-riesco commented 7 years ago

But we don't want that. The way to close a display_id is to remove any instances of the id from the page. As long as they are on the page, they are and should remain 'open'.

Does that mean that the display output associated with a display_id is ephemeral (transient like its display_id) and thus it shouldn't be displayed by nbconvert?

Is this also true for display outputs that don't have a display_id?

Let me rephrase the question: if a frontend (not necessarily nbconvert) is asked to export a notebook, does the exported notebook preserve all the display outputs? Or only the execution results?

Having a display_id is no indication that it will be updated, so display_id doesn't have any effect on the "is it done" question.

Doesn't it have an effect in the sense that display outputs without a display_id can't be updated (and therefore, one could say they are immediately closed)?

minrk commented 7 years ago

Does that mean that the display output associated with a display_id is ephemeral (transient like its display_id) and thus it shouldn't be displayed by nbconvert? Is this also true for display outputs that don't have a display_id?

outputs with and without a display_id should be handled the same when they arrive. The only difference is that outputs with an id may be updated at a later point.

if a frontend (not necessarily nbconvert) is asked to export a notebook, does the exported notebook preserve all the display outputs? Or only the execution results?

All outputs. The only difference between display and execute_result is that execute_result has a prompt number attached (and typically there is zero or one execute_result, while there can be many display_data outputs, though that is not enforced).

Doesn't it have an effect in the sense that display outputs without a display_id can't be updated (and therefore, one could say they are immediately closed)?

There is no real sense of an 'open' or 'closed' output. All output for the whole notebook is only really complete when the last cell of the notebook has finished (i.e. status:idle from the last request).

n-riesco commented 7 years ago

There is no real sense of an 'open' or 'closed' output. All output for the whole notebook is only really complete when the last cell of the notebook has finished (i.e. status:idle from the last request).

Oh, well... In that case, asynchronous output in IJavascript could work if we rephrase the above completion criteria to "all the notebook output is complete when an status:idle has been received for each cell".

Would, at least, that be OK?

rgbkrk commented 7 years ago

all the notebook output is complete when an status:idle has been received for each cell

That seems satisfying for nbconvert at least

gnestor commented 7 years ago

@n-riesco I tried:

// In[1]:
var $1 = $$;
$1.text("tic");

// In[2]:
var $2 = $$;
$1.text("tac");
$2.text("toe");

but calling $1.text("tac"); doesn't actually update the output (append, replace, or otherwise) of Cell 1. I also tried it with a $$.async(); in Cell 1 and $1.done(); in Cell 2.

n-riesco commented 7 years ago

@gnestor Oh, man! I should do like Kyle and post screenshots to ensure my examples work. Here's the corrected example and a screenshot:

// In[1]:
var $1 = $$;
$1.async();
$1.text("tic", true);

// In[2]:
var $2 = $$;
$1.text("tac");
$2.text("toe");

screenshot from 2017-04-05 07-45-35

The call $$.text(result, keepAlive) takes an argument keepAlive because by default IJavascript enforces only one execution_result per cell (I decided to make this behaviour default, because it doesn't look good how the Jupyter notebook handles multiple execution_result's).

I'm happy you tested this example. It's made me realise that a call like $1.text("tic", true); should implicitly enable async mode; i.e. we should be able to write:

// In[1]:
var $1 = $$;
$1.text("tic", true);

// In[2]:
var $2 = $$;
$1.text("tac");
$2.text("toe");

I've opened issue https://github.com/n-riesco/nel/issues/5 for this.

minrk commented 7 years ago

Would, at least, that be OK?

Yeah, definitely.

n-riesco commented 7 years ago

@gnestor I think now we know all we need to come up with an API for display outputs in IJavascript.

Here are a few ideas:

I want to keep the current API (e.g. $$.text(), $$.mime(), ...) to send execute_result messages.
We need a constructor for display outputs. How about $$.newDisplay(id) (with id an optional argument)? Or just $$.display(id)?
$$.display() would create an object with the same methods as $$ (except async() and done()).

Here's an example of how this new API would look like:

// In[1]:
var display = $$.display("test");
display.text("Hello, World!");

// In[2]:
display.text("Bye, cruel World!");

n-riesco commented 7 years ago

Edit: [removed discussion on replace vs append]

var append = $$.display();
append.text("tic");
// output:
//     tic

var replace = $$.display("test");
replace.text("tac");
// output:
//     tic
//     tac

append.text("toe");
// output:
//     tic
//     tac
//     toe

replace.text("TAC");
// output:
//     tic
//     TAC
//     toe

gnestor commented 7 years ago

@n-riesco I think that $$.display(display_id?: string) looks fine for an initial implementation...

gnestor commented 7 years ago

To summarize this discussion thus far:

msg_id allows one cell to append outputs to the output area to another cell that has that msg_id via an excute_result message
display_id allows one cell to replace the content of a specific output in another cell via an update_display_data message

I'm of the opinion that ijavascript's user-facing API should be as consistent as possible with ipython's. I'm also hearing that ijavascript's user-facing API is the way it is because of the async nature of Javascript. I'd like to discuss the differences between ijavascript and ipython in hopes that we may settle upon something ipython-like ijavascript:

Display
- ijavascript: $$ provides text, html, jpeg, png, svg, and mime methods. These are currently handled with execute_result messages, but shouldn't they be handled with display_data messages?
- ipython: IPython.display provides HTML, Image, JSON, etc. It also provides display for displaying custom mime types, specifying display_id, etc. and update_display for updating displays.
- I propose that ijavascript provide display and update_display methods like ipython. They can be provided as globals, but I would prefer to import them from a library (e.g. import { display, updateDisplay } from 'ijavascript'). Type-specific display methods can also be provided the same way (e.g. import { display, text, image } from 'ijavascript').
Async
- ijavascript: $$ provides async and done methods which prevent outputs from being displayed until done is called. This prevents displaying a big Timeout object (which is the return value) and the async output and just displays the async output.
- @n-riesco Am I doing this justice?
- ipython: ipython doesn't seem to have an equivalent...except %%capture can be used to suppress output.
- @minrk Does ipython have an equivalent? Does it need one? If not, is it because Javascript behaves different than Python? Or because there is a language-agnostic way to dealing with async outputs?
- @rgbkrk: "display_data should be used for asynchronous rich outputs that aren't console outputs"
- @Carreau is working on an async/await feature in ipython (https://github.com/ipython/ipython/pull/10390) and maybe has some insights...
- Depending on the viability, I propose that ijavascript handle async outputs the same way that ipython does.
Globals
- ijavascript: $$ which provides a mix of display methods, async helpers, and sendResult and sendError and is bound to the individual cell from which it's called.
- @n-riesco What is the purpose of $$.sendResult and $$.sendError?
- ipython: %% magics which provide line- and cell-based commands that are specific to using a notebook
- If we can adopt a more ipython-like solution for displaying data (using display_data messages and a pure/stateless display function) and handling async output (using update_display_data?), then ijavascript may not need to provide a $$ global.

n-riesco commented 7 years ago

msg_id allows one cell to append outputs to the output area to another cell that has that msg_id via an excute_result message

This is how IJavascript works. IJavascript keeps the msg_id of a cell stored in $$. If a cell makes a copy (e.g. var $1 = $$;) and another cell uses $1 to send an execution result, then the execution result would look like it was sent from the first cell, and thus, shown in that cell.

I hope that my re-phrasing doesn't confuse you.

display_id allows one cell to replace the content of a specific output in another cell via an update_display_data message

Yes, but bear in mind that a cell output is comprised of one or more of the following types of outputs:

execution results
stdout
stderr
and display data (with or without display_id)

Of all this types of outputs, update_display_data can update only display data with a display_id.

n-riesco commented 7 years ago

I'm of the opinion that ijavascript's user-facing API should be as consistent as possible with ipython's. I'm also hearing that ijavascript's user-facing API is the way it is because of the async nature of Javascript.

I'm reluctant to make backwards-incompatible changes and cause unnecessary pain to IJavascript users.

Besides, if I understood correctly, what you really wanted is to develop an npm module that can be imported and that provides the same API as IPython.display.

Once we implement an API in $$ to send display_data and update_display_data messages, there's nothing stopping us from writing such a module. We can follow IPython's API as close as you need (while, under the hood, this module will use the API provided in $$).

n-riesco commented 7 years ago

I'd like to discuss the differences between ijavascript and ipython in hopes that we may settle upon something ipython-like ijavascript [...]

Please, can we leave it for another issue?

I don't really want the discussion to drag much longer.

And I really think we're in a position very close to start writing code (the only thing left from my point of view is to agree on an API in $$ to send display_data and update_display_data messages).

gnestor commented 7 years ago

Besides, if I understood correctly, what you really wanted is to develop an npm module that can be imported and that provides the same API as IPython.display.

Once we implement an API in $$ to send display_data and update_display_data messages, there's nothing stopping us from writing such a module. We can follow IPython's API as close as you need (while, under the hood, this module will use the API provided in $$).

Totally!

Ok, let's get started. I think your suggestion of $$.display(display_id?: string) works. If a display_id is provided, then it will send a display_data message with a display_id.

Carreau commented 7 years ago

I don't get particular insight about how a fully asynchronous kernel would work. My work in IPython is mainly to start exploring this. There are (IMHO) a lot of other assumptions in the protocol and in some frontends.

For example the notebook does not send completions requests is the kernel is busy. But obviously with an asynchronous kernel it could still respond to completions requests even if there are pending tasks.

So there might be quite a lot to though about if we want something completely async, one other is the cancellation of task without affecting subsequently submitted tasks.

n-riesco commented 7 years ago

@Carreau

I don't get particular insight about how a fully asynchronous kernel would work. My work in IPython is mainly to start exploring this. There are (IMHO) a lot of other assumptions in the protocol and in some frontends.

I think the kernel nanny could help get rid of all those assumptions in the frontends. The kernel nanny should handle all those assumptions, based on each kernels needs.

For example the notebook does not send completions requests is the kernel is busy. But obviously with an asynchronous kernel it could still respond to completions requests even if there are pending tasks.

Asynchrnous kernels like IJavascript would be much more interactive if those restrictions in the notebook were to be removed. I could for example make IJavascript prioritise inspection and completion requests over execution requests to make the frontend more responsive.

So there might be quite a lot to though about if we want something completely async, one other is the cancellation of task without affecting subsequently submitted tasks.

You don't really want to handle that case. This would mean that frontends would have to provide means for users to declare task dependencies. Do you have an use case in mind for this functionality?