jupyter / enhancement-proposals

Enhancement proposals for the Jupyter Ecosystem
https://jupyter.org/enhancement-proposals
BSD 3-Clause "New" or "Revised" License
116 stars 65 forks source link

Proposal: Transmit Cell Metadata #70

Open jflam opened 3 years ago

jflam commented 3 years ago

Transmit Cell Metadata

Hello!

We (@willingc and @MSeal) would like to propose a new JEP that describes transmitting cell metadata to kernels. This PR contains the JEP proposal that we originally discussed in the Pre-Proposal: Transmit Cell Metadata Issue.

Based on the discussion in that issue, I've updated the JEP to reflect the desire to send the cell metadata as part of the message header vs. the original proposal which argued for sending it in the content.

@minrk has graciously volunteered to shepherd this through the JEP process. Looking forward to continuing the discussion here!

Thx! -John, Carol, Matt

Resolve #68

choldgraf commented 3 years ago

@jflam just FYI the failing test is because Jupyter Book updated its spec for the TOC, and we haven't pinned the versions in this repository. I'm updating the toc structure here: https://github.com/jupyter/enhancement-proposals/pull/71

willingc commented 3 years ago

Hi folks,

This PEP has been open for 3 weeks. We'll be collecting feedback for another week. On Monday July 12, we will request a vote.

Thanks!

Carol

meeseeksmachine commented 3 years ago

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jep-70-transmit-cell-metadata/9877/1

jasongrout commented 3 years ago

Another few questions trying to understand the scope of this proposal:

  1. Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?
  2. What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?
  3. Right now, as the JEP says:

    The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

    Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?

  4. One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?
jasongrout commented 3 years ago

4. One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

jasongrout commented 3 years ago

2. What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?

Min's thoughts here indicate we should think about this as a generic way of associating cell metadata with any message, so the answers to the above would generally be "yes", i.e., if the message is closely associated with a cell, the relevant metadata should be transmitted.

jasongrout commented 3 years ago

In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?

I think it would be very clarifying to see exactly what changes are going to be made to the message spec document as part of this JEP, e.g., are there new prescribed fields in the message metadata dict, what their purpose is, etc.

minrk commented 3 years ago

@jasongrout thanks for the review!

Yes, I think we should specify that cell metadata SHOULD be transmitted as-is in message.metadata.cell for any message reasonably associated with a cell (rather than directly at top-level message.metadata so there's room for e.g. client metadata or others without collisions).

jflam commented 3 years ago

Thanks for the thoughtful comments, @jasongrout!

One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

I think you've done a great job at identifying a key tension in this JEP: cell metadata vs. client metadata. Looking at Min's comments from above, what do folks think about having two different fields in the message metadata, i.e., message.metadata.cell and message.metadata.client? This would make the intent clearer at the cost of additional complexity. There are other optimizations possible such as transmitting client metadata only if an actual change was initiated by the client, for example.

Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message. In this case, I wonder if there isn't a list of reserved/standard messages that need to be there so that clients like the ones you describe above will know what message to transmit to the kernel, rather than that being an implicit contract between a front end and the kernel?

Right now, as the JEP says:

The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?

Creating an allow-list in this manner, allowing the notebook to specify it is a different take than what we originally rejected with the kernel being the source of truth. It does have the downside of additional complexity in implementation. Thinking out loud: if there is an explosion in the size of cell metadata down the road, we could mandate that requirement in a future JEP as an optimization that clients would be eager to implement. This would give us a simpler implementation for this JEP and give us time to gather evidence for the more complex implementation. What do you think?

jasongrout commented 3 years ago

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message.

Here's a thought: instead of transmitting all cell metadata, or even a language name, how about we take a cue from our existing mechanism for identifying rich output formats, and instead transmit a mimetype field that identifies the format of code being transmitted? The mimetype of a typical python execute request could be text/x-python, for example. This naturally opens the door for other kinds of input to even single-language kernels, for example a special mimetype that represents a JSON datastructure for setting values in a language runtime, or other mimetypes that represent a chunk of binary data you are sending to the kernel (like an image or parquet file). Perhaps a sql kernel usually has text sql input, but one of the inputs could be a JSON datastructure containing info about what database to connect to (with a mimetype specific to that format) - that db info is sent in an execute_request with a mimetype of application/vnd.sqlkernel.dbinfo+json.

I realize it's basically the same thing as a language field - a string that identifies to the kernel how to handle the execution/completion request. I usually think of a kernel as having a single "language" (this is implicit in the kernel spec, where the kernel has a "language" identified), but it seems okay to say that a given kernel can receive and act on multiple types of inputs.

jflam commented 3 years ago

Yes- using a MIME type to identify a language would make perfect sense! In the prior art section of the JEP, I do mention a few other "meta-kernels" that exist as well that would benefit from this mechanism.

I do like the examples that you give, and it seems like some of those would be metadata that is not likely to be persisted in the .ipynb itself (e.g., sending chunk of binary data) and some that are likely to be persisted, (e.g., setting values in the language runtime - of course depends on what those values represent ...). I wonder if there's a more general concept "runtime metadata?" where some of the metadata comes from cell metadata and others are synthesized by the front end/extensions at runtime?

meeseeksmachine commented 3 years ago

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/propagating-user-environment-information-to-ipython-kernels-and-more-generally/10931/1

meeseeksmachine commented 3 years ago

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/notebook-cell-type-generaliastion/10703/1