Pre-proposal: Specify the Markdown cell's markdown flavor

jupyter / enhancement-proposals

Enhancement proposals for the Jupyter Ecosystem

https://jupyter.org/enhancement-proposals

BSD 3-Clause "New" or "Revised" License


Pre-proposal: Specify the Markdown cell's markdown flavor #98

Open fcollonval opened 1 year ago

fcollonval commented 1 year ago

This proposal was developed during the Jupyter Community Workshop on the Notebook format (Paris, 28 Feb - 2 Mar).

This is a proposition to add a new key to markdown cells defining precisely the flavor of markdown used in the cell source. The associated rendered content should be stored for compatibility with the tooling ecosystem. As a side effect, it forces us to clarify the current default markdown flavor that should be used as fallback when a client does not support the specified flavor.
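As a rough illustration only, a markdown cell carrying a flavor and a stored rendering might look like the sketch below. The key names `mimetype` and `output` follow the wording used later in this thread, the flavor string `text/zany+markdown` is a placeholder, and the `display_data`-style shape of the stored rendering is an assumption, since the thread leaves that question open.

```python
# Hypothetical markdown cell: "mimetype" and "output" are assumed key names,
# not part of the current nbformat schema.
cell = {
    "cell_type": "markdown",
    "id": "cell-id-abcd-12345",
    "metadata": {},
    "mimetype": "text/zany+markdown",  # placeholder flavor identifier
    "source": ["# Zany Markdown"],
    "output": {  # assumed shape, modelled on a display_data output
        "output_type": "display_data",
        "data": {"text/html": "<h1>Zany Markdown</h1>"},
        "metadata": {},
    },
}


def rendered_fallback(cell, preferred=("text/html", "text/plain")):
    """Return the first stored rendering a client can display, or None."""
    data = cell.get("output", {}).get("data", {})
    for mime in preferred:
        if mime in data:
            return data[mime]
    return None
```

A client that does not understand the declared flavor could call `rendered_fallback(cell)` and show the stored HTML instead of re-rendering the source.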

The focus of the associated JEP is primarily on describing current behaviour.

During the workshop we wrote a draft JEP to ease structuring the goals and discussion. You can find it on the following draft PR: https://github.com/jupyter/enhancement-proposals/pull/99

For reference, the document used during the workshop is https://docs.google.com/document/d/1B8mhaHud7DMY55q1mg5sSDhZ96FGC6cbJpypYO1BocA/

fcollonval commented 1 year ago

The open questions identified so far are:

Should the default be text/markdown? And the fallback be Original Markdown?
Should clients update the mimetype key to match what they are rendering/supporting after editing?
Should frontend only change a markdown mimetype in an edited cell, and only change that cell?
Is it ok for a notebook to have cells in a different format?
Should it be specified as document level key and interpreted as a hint, i.e. a frontend serializing the notebook could write a mimetype into the notebook to communicate how clients should try to interpret markdown cells and then make best efforts to do so (similar to the kernel information).
Should there be a single output or an array of outputs to align better with code cells? In the second case:
- what to do if there is more than one output?
- should we force the output type to a single value? If so, display_data seems the most appropriate.
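To make the cell-level versus document-level question concrete, here is a minimal sketch of one possible resolution order: a per-cell key wins, then a notebook-wide hint, then a default. The names `mimetype`, `markdown_mimetype`, and `DEFAULT_FLAVOR` are hypothetical, and the choice of `text/markdown` as the default is exactly what the first question leaves open.

```python
DEFAULT_FLAVOR = "text/markdown"  # assumed default; the thread leaves this open


def resolve_flavor(notebook, cell, supported):
    """Pick the flavor a client should use to render a markdown cell.

    Lookup order (an assumption, not a decided rule): the cell's own key,
    then a notebook-level hint, then the default. If the client does not
    support the declared flavor, fall back to the default.
    """
    declared = (
        cell.get("mimetype")  # hypothetical per-cell key
        or notebook.get("metadata", {}).get("markdown_mimetype")  # hypothetical hint
        or DEFAULT_FLAVOR
    )
    return declared if declared in supported else DEFAULT_FLAVOR
```

Treating the notebook-level value as a hint rather than a requirement keeps older clients working, at the cost of possibly rendering a cell with a flavor its author did not intend.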

bollwyvl commented 1 year ago

Some thoughts:

Should the default be text/markdown?

For users' sake, the default should be as backwards-compatible as possible, therefore probably text/dollar-math+x-gfm or something.

And the fallback be Original Markdown?

The original author of Markdown has granted that fenced code blocks and tables pretty much were grossly missing, and it is widely known that the reference Perl implementation contains bugs and inconsistencies that can never be fixed.

The only real fallback is text/plain (and ideally in what human language), though some of the bespoke markdown formats are starting to get really far from either plain text or a human language.

Should clients update the mimetype key to match what they are rendering/supporting after editing?

Seems like "you change it, you buy it" is less surprising: opening a notebook shouldn't make changes. If just consuming it (or annotating), then it should likely consume an output; if saved, then try to render.
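The "you change it, you buy it" rule could be sketched roughly as follows. The `mimetype` and `output` keys are hypothetical, and dropping the stored rendering on edit is an assumption about what "if saved, then try to render" would mean in practice.

```python
import copy


def apply_edit(notebook, index, new_source, client_flavor):
    """Return a copy of the notebook with one cell edited.

    Only the edited cell takes on the client's flavor; every other cell,
    including its declared flavor, is left untouched.
    """
    result = copy.deepcopy(notebook)
    cell = result["cells"][index]
    if cell["source"] != new_source:
        cell["source"] = new_source
        cell["mimetype"] = client_flavor  # hypothetical key name
        cell.pop("output", None)  # assumed: stale rendering is re-created on save
    return result
```

Merely opening a notebook never goes through `apply_edit`, so a client that only reads it leaves every flavor and stored rendering as it found them.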

Is it ok for a notebook to have cells in a different format?

Sure, why not? Also it's hard as hell to "enforce" anything in an ecosystem where we can't get major players to even follow the schema. So better just to plan for it.

Should it be specified as document level key

Would reduce portability, also see above.

Should there be a single output or an array of outputs

display_data ftw.

As something of a counterpoint, over on #95...

...we're considering an even more drastic approach that collapses formerly-N, now-3 cell types to One Cell Type. So, in a nutshell going from:

cells:
  - cell_id: cell-id-abcd-12345
    cell_type: markdown
    special_markdown_thing: zany
    source:
      - "# Zany Markdown"

to:

cells:
  cell-id-abcd-12345:
    source:
      data:
        text/zany+markdown:
          - "# Zany Markdown"
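As a sketch of that transformation, the function below converts the list-style layout into the id-keyed one. It only mirrors the two snippets above: the `special_markdown_thing` key and the `text/<flavor>+markdown` naming come from that example, and the fallback to plain `text/markdown` when no flavor is given is an assumption.

```python
def collapse_cells(cells):
    """Convert list-style markdown cells into an id-keyed, mimebundle-style layout."""
    collapsed = {}
    for cell in cells:
        flavor = cell.get("special_markdown_thing")
        # Assumed naming scheme, taken from the example: text/<flavor>+markdown
        mimetype = f"text/{flavor}+markdown" if flavor else "text/markdown"
        collapsed[cell["cell_id"]] = {"source": {"data": {mimetype: cell["source"]}}}
    return collapsed


old_style = [
    {
        "cell_id": "cell-id-abcd-12345",
        "cell_type": "markdown",
        "special_markdown_thing": "zany",
        "source": ["# Zany Markdown"],
    }
]
new_style = collapse_cells(old_style)
```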

rgbkrk commented 1 year ago

I'm happy to see this start to get codified, as it's an essential step to having a real spec.

The long-term goal should be to specify the markdown format as rigorously as CommonMark and GitHub Flavored Markdown do. At least specifying which variant of markdown is supported in a notebook is a good start. Just happy to see people moving this along.

jjallaire commented 1 year ago

FWIW GitHub doesn't appear to publish a consolidated document of what is currently supported by their markdown parser. Piecing it together from their docs and blog posts, I think it currently constitutes the original spec https://github.github.com/gfm/ + tex math dollars + footnotes + emojis + diagrams. I agree something like classic gfm + tex math dollars is probably adequate as a baseline, but just noting that footnotes and diagrams should at least also be discussed.

jjallaire commented 1 year ago

One idea introduced in the draft doc and JEP proposal is having a fallback rendering available for clients that don't know how to deal with the markdown variant in play. One suggestion was text/html as the fallback; however, another possibility would be having the fallback be the standard/baseline flavor of markdown for Jupyter (since an HTML fallback won't really be convertible to, e.g., PDF or DOCX). So, for example, the fallback for a callout/admonition could be a blockquote with a heading (rather than raw HTML).
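A minimal sketch of such a markdown-to-markdown fallback, assuming a colon-fenced admonition syntax (`:::{note}` ... `:::`) purely for illustration; the function name `to_baseline` is hypothetical, and a bold first line stands in for the "heading".

```python
import re

# Assumed extended syntax: a colon-fenced admonition such as
#   :::{note}
#   Be careful.
#   :::
ADMONITION = re.compile(
    r"^:::\{(?P<kind>\w+)\}[ \t]*\n(?P<body>.*?)\n:::[ \t]*$",
    flags=re.MULTILINE | re.DOTALL,
)


def to_baseline(markdown):
    """Rewrite colon-fenced admonitions as blockquotes with a bold title line."""

    def replace(match):
        title = f"> **{match['kind'].capitalize()}**"
        body = "\n".join(
            f"> {line}" if line else ">" for line in match["body"].split("\n")
        )
        return f"{title}\n>\n{body}"

    return ADMONITION.sub(replace, markdown)
```

Because the result is still markdown, downstream converters targeting PDF or DOCX can process it like any other blockquote.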

tonyfast commented 1 year ago

i agree with identifying the markdown flavor. while doing so, we want to do it in the context of notebooks and cells. i think this enhancement proposal continues to differentiate markdown and code cells. this enhancement adds mimetype and output to the markdown cell keys. if it is adopted, the markdown and code cell schemas will diverge further. with the proposed scenario, markdown cells independently have the properties attachments, output, and mimetype, while code cells have outputs.

it might be important to consider what kind of precedent this divergence might set. perhaps we consider the likenesses between markdown and code cells to get similar representations. some suggestions i have are (all schema in toml for convenience):

use the existing code cell outputs convention instead of output? do we want to maintain the nuance between outputs and output? yes, an output is a mimebundle, but do we need the extra key when the code cell demonstrates prior art?
rather than add the mimetype key, could we extend the cell_type to accept a mimetype? with this, all of the markdown variants are possible along with our jupyter shorthand.
```
[[markdown_cell.properties.cell_type.oneOf]]
const = "markdown"

[[markdown_cell.properties.cell_type.oneOf]]
# literal (single-quoted) string: \w and \+ are not valid escapes in a TOML basic string
pattern = '\w+/[-.\w]+(?:\+[-.\w]+)?'
```
put our heads together (we overlap in meetings anyway) to think about how an "extraSchemas" top-level concept could give the best of both worlds. this scenario feels like it could accommodate the desire for expressing the markdown extensions effectively.
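The `cell_type` suggestion above can be checked with a short sketch. The function name is hypothetical, and matching the whole string is an assumption about intent: JSON Schema patterns are unanchored by default, so the schema as written would also accept strings that merely contain a mimetype.

```python
import re

# Same expression as in the TOML snippet above.
MIMETYPE_PATTERN = re.compile(r"\w+/[-.\w]+(?:\+[-.\w]+)?")


def is_valid_markdown_cell_type(cell_type):
    """Accept the "markdown" shorthand or a string that is entirely a mimetype."""
    return (
        cell_type == "markdown"
        or MIMETYPE_PATTERN.fullmatch(cell_type) is not None
    )
```

With full matching, `text/zany+markdown` and `text/markdown` pass, while `code` and parameterised forms such as `text/markdown;variant=myst` do not, so the pattern would need extending if flavor parameters were wanted.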

krassowski commented 1 year ago

There is an interesting feature request to support defining links in one markdown cell and using them in others: https://github.com/jupyterlab/jupyterlab/issues/14260. How would the proposed specification of markdown flavour interact across cells? Would it be possible that one cell has flavour X and another cell has flavour Y? If so, what would happen if a link definition is in a cell with flavour X but the link itself is in a cell with flavour Y?

Would it be better to have flavour defined per-notebook rather than per-cell? What are arguments for having a per-cell definition of markdown flavour?

tonyfast commented 1 year ago

Would it be better to have flavour defined per-notebook rather than per-cell? What are arguments for having a per-cell definition of markdown flavour?

in general, markdown flavors don't ever really conflict with footnotes and link definitions. their syntax is defined in the commonmark spec. even if there were multiple variants i wouldn't expect much variance across markdown variants.

because of this consistency, it should be possible for notebook providers to have globally scoped references. e.g. with pidgy we carry the references around in a scope that can be reused. mutability has some sharp corners, but since the notebook is an intermediate document, the mutability is sorted out in the final document translation to html, md or pdf.
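A simplified sketch of a notebook-wide reference scope follows. It is not how pidgy does it; the function name is hypothetical, and the regular expression only covers single-line definitions of the form `[label]: url`, ignoring titles and the finer points of CommonMark label normalization.

```python
import re

# Single-line link reference definitions; the lookahead skips footnotes ([^1]: ...).
LINK_DEF = re.compile(r"^ {0,3}\[(?!\^)([^\]]+)\]:\s*(\S+)", flags=re.MULTILINE)


def collect_link_definitions(cells):
    """Gather link reference definitions from all markdown cells into one scope."""
    scope = {}
    for cell in cells:
        if cell.get("cell_type") != "markdown":
            continue
        source = cell["source"]
        text = "".join(source) if isinstance(source, list) else source
        for label, url in LINK_DEF.findall(text):
            # Labels match case-insensitively and the first definition wins,
            # as in CommonMark.
            scope.setdefault(label.strip().lower(), url)
    return scope
```

A renderer could pass the resulting scope to each cell, whatever its flavor, so that a link used in one cell resolves against a definition written in another.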

stevejpurves commented 1 year ago

@krassowski I just made a comment over on the PR about the per-cell definition.

However, in the context of this JEP, since we're not proposing anything that can provide a proper fallback (i.e. a text/markdown based version of the content alongside, for example, the text/markdown?variant=myst one), this is an interim step that gives a useful hint to front ends, which may be able to consume that content more appropriately once it's identified. To that extent, perhaps it's better that it is notebook-wide only and stays out of the way of proposed changes to individual cells.

Even in that case, though, I am still not clear on what a frontend should do after edits are made to markdown cells in a notebook that originated elsewhere with a different markdown flavour identified in that mimetype.