Open choldgraf opened 1 year ago
Thanks @choldgraf! We're definitely thinking about how all of this fits together, and it's a thorny problem! Will keep you posted.
Congratulations!!! Hope things are as easy as they can be, and that you get well soon!
Thanks @choldgraf, ermm I'd be happy to listen in remotely. Is there any provision for this?
(Was thinking to attend myself, but it was trumped by the fact that I'm off to a conference in Las Vegas next week 😝)
I'm attending remotely, let me ping @fcollonval to see about late admission!
Thanks for launching this discussion @choldgraf!
We looked into the myst-notebook format documented on the Jupyterbook website, as well as some other ideas that came up in the Jupyter Community Workshop on text-based notebook formats.
If we want to make such a format an "officially supported format" by Jupyter, key requirements would be:
**Namespacing admonitions**
Now, another looser requirement would be that it should be reasonable for third parties to support this format (e.g. native rendering of such notebook files by GitHub). On this front, the `code-cell` admonition seems to be too "top-level" to be adopted. One way we could get away with this would be to namespace admonitions. It would be a much easier ask for e.g. GitHub to implement our spec in jupyter-namespaced admonitions than to add a large number of top-level admonitions to their supported spec.
````markdown
```{jupyter:code}
:execution_count: 1

1 + 1
```
````

☝️ Example input cell. Execution count is not mandatory, but a renderer of `jupyter:code` would know what to do with it.

````markdown
```{jupyter:output}
:execution_count: 1
:output_type: execute_result

{ "text/plain" : "2" }
```
````

☝️ Corresponding output cell, the result of the execution. `output_type` is required. Mime bundle is raw JSON so that it can be used as-is by renderers without any processing.

````markdown
```{jupyter:code}
:execution_count: 2

print(1 + 1)
```
````

☝️ Similar input cell, but using a print statement instead of a mime bundle.

````markdown
```{jupyter:output}
:output_type: stream

2
```
````

☝️ Corresponding output cell, the result of the previous one. `output_type` is required, but this is a stream. Execution count is never included in stream output.
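To make the mapping from the existing notebook JSON concrete, here is a minimal Python sketch of how an nbformat-style code cell might be serialized into the directives sketched above. The function name `cell_to_directives` is hypothetical, the directive syntax is only the one shown in this comment (not a settled spec), and it assumes `source` and stream `text` are plain strings rather than lists of lines.

```python
import json

FENCE = "`" * 3  # built programmatically so this example does not nest literal fences


def cell_to_directives(cell):
    """Serialize an nbformat-style code cell dict into the sketched directive text.

    Assumption: only `stream`, `execute_result` and `display_data` outputs are handled.
    """
    lines = [FENCE + "{jupyter:code}"]
    if cell.get("execution_count") is not None:
        lines.append(f":execution_count: {cell['execution_count']}")
    lines += ["", cell["source"], FENCE]
    blocks = ["\n".join(lines)]

    for out in cell.get("outputs", []):
        kind = out["output_type"]
        lines = [FENCE + "{jupyter:output}"]
        # Only execute_result carries an execution count in nbformat.
        if kind == "execute_result":
            lines.append(f":execution_count: {out['execution_count']}")
        lines += [f":output_type: {kind}", ""]
        if kind == "stream":
            lines.append(out["text"].rstrip("\n"))
        elif kind in ("execute_result", "display_data"):
            # The mime bundle is written as raw JSON, as in the example above.
            lines.append(json.dumps(out["data"]))
        else:
            raise ValueError(f"unsupported output type in this sketch: {kind}")
        lines.append(FENCE)
        blocks.append("\n".join(lines))

    return "\n\n".join(blocks)
```

Calling it on a cell with source `1 + 1` and a single `execute_result` output produces one `jupyter:code` block followed by one `jupyter:output` block.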
**GitHub Flavored Markdown**
Another worry is that GFM seems to be moving in the direction of allowing admonitions, but with a slightly different syntax. Have there been any discussions with the folks over at GitHub about possible convergences?
**Highlighting**
One thing that would make the raw textual content of markdown-based notebooks more readable would be to have a nice CodeMirror (6) syntax highlighting mode for Myst that dims the color of yaml frontmatter and shorthand options, so that readers can see the important content more easily.
This is important as notebooks generated by Jupyter user interfaces will have more metadata attached to them (execution count, cell metadata) than what a person would manually type in a markdown document. Proper highlighting in JupyterLab would mitigate this issue.
Thanks for the update @SylvainCorlay
For what it's worth, I would mirror this feeling on GitHub's beta feature: https://github.com/community/community/discussions/16925#discussioncomment-4748880
I would also note, if you want "rich markdown", then https://github.com/jgm/djot (a recent endeavour by the creator of pandoc and member of the commonmark committee) is, I feel, really the best shot at having a truly "rigorous" and standardised syntax. To that end, for admonitions they use https://htmlpreview.github.io/?https://github.com/jgm/djot/blob/master/doc/syntax.html#div, which is essentially what myst has started to adopt:
Hi @SylvainCorlay, this is awesome. From your code suggestions the only immediate questions I have are (1) how multiple outputs to a single cell are represented; and (2) how you split markdown cells.
Example: I have put together a sketch here, which almost parses as-is in MyST, so it might give you some other ideas.
For (1): all of the examples that you posted only have a single output rather than an outputs list. I think there is a bit of a mismatch with the current spec, e.g. `execution_count` on each output, which really exists at the cell level (even if it is stored on the output part of the cell as you have suggested). I am not sure of the solution for this, but calling the directive `{jupyter:outputs}` (with an `s`) and having each output on a line could help?
```{jupyter:outputs}
:execution_count: 2

{ "output_type" : "display_data", "data": ... }
{ "output_type": "error", ...}
```
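As a rough illustration of the "one output per line" idea, the body of such a `{jupyter:outputs}` directive could be read back with a few lines of Python. This is only a sketch: `parse_outputs_body` is a hypothetical helper, and it assumes option lines start with `:` and every other non-empty line is a complete JSON object.

```python
import json


def parse_outputs_body(body):
    """Split a directive body into (options, outputs).

    Assumptions: `:key: value` lines are options (kept as strings),
    and each remaining non-empty line holds exactly one JSON output.
    """
    options, outputs = {}, []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            key, _, value = line[1:].partition(":")
            options[key.strip()] = value.strip()
        else:
            outputs.append(json.loads(line))
    return options, outputs
```

One consequence of this layout is that an output containing a newline must keep it escaped inside the JSON string, which `json.dumps` already does by default.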
For (2): splitting markdown cells, I think this is important to have in the base spec especially if we are going for full reproduction as a serialization format. That needs to encode metadata as well.
We have done this implicitly in the myst notebooks with splitting on code-cells, however it needs to be explicit for markdown-markdown split -- we did that with a "block-break" ([spec](https://myst-tools.org/docs/spec/blocks#specification)) with json metadata. I think this is in-family with your other suggestions.
+++ {"tags": ["tag1"]}
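To show what an explicit markdown-markdown split buys us, here is a small Python sketch that cuts a markdown document into cells at `+++` block breaks and keeps the JSON metadata. The helper name `split_markdown_cells` is made up for illustration, and it assumes a break is any line starting with `+++`, optionally followed by a JSON object.

```python
import json


def split_markdown_cells(text):
    """Split markdown text into cells at `+++` block-break lines.

    Each cell is a dict with `metadata` (parsed from the JSON after `+++`,
    or empty) and `source`. Assumption: content before the first break
    forms a cell with empty metadata.
    """
    cells = []
    metadata, lines = {}, []

    def flush():
        source = "\n".join(lines).strip()
        if source or metadata:
            cells.append({"metadata": metadata, "source": source})

    for line in text.splitlines():
        if line.startswith("+++"):
            flush()
            raw = line[3:].strip()
            metadata, lines = (json.loads(raw) if raw else {}), []
        else:
            lines.append(line)
    flush()
    return cells
```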
Having a way to store the outputs as well as making cell IDs visible would be a big step up. I think that both of those could be optional of course for serialization, and that opens up a lot of workflows and can integrate with existing tools without too much work!
Really enjoyed the workshop this week, and had fun working with @agoose77 @stevejpurves and others! Looking forward to the next steps!
```{jupyter:code}
:execution_count: 1
1 + 1
```

Another discrepancy I would note here, is that I assume this is proposing to store code cell metadata as:

```
:execution_count: 1
:metadata: {"tags": ["tag1"], "other": "value"}
1 + 1
```

which is different to the current way:

```
:tags: ["tag1"]
:other: value
1 + 1
```

or even:

```
---
tags:
- tag1
other: value
---
1 + 1
```
This is better from a "programmatic"/spec sense, since really directive options are intended to be `str` -> `str` mappings and `code-cell` is the outlier here (being basically `str` -> `str`/`list`/`dict`), which would be nice to fix.
However, it is possibly less "user-friendly".
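The trade-off is easier to see side by side. The Python sketch below renders the same cell metadata in the two option styles discussed above; `metadata_to_options` and its `style` argument are hypothetical names, and the "flat" style assumes string values are written bare while lists and dicts are written as JSON.

```python
import json


def metadata_to_options(metadata, style="nested"):
    """Render cell metadata as directive option lines.

    style="nested": a single `:metadata:` option holding one JSON string,
                    so every option stays a str -> str mapping.
    style="flat":   one option per key; non-string values become JSON,
                    which makes option values str/list/dict.
    """
    if style == "nested":
        return [f":metadata: {json.dumps(metadata)}"]
    if style == "flat":
        return [
            f":{key}: {value if isinstance(value, str) else json.dumps(value)}"
            for key, value in metadata.items()
        ]
    raise ValueError(f"unknown style: {style}")
```

With the flat style a reader cannot tell from the text alone whether `value` was a string or some other YAML/JSON scalar, which is part of why the nested form is cleaner for a spec.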
@SylvainCorlay Thanks for bringing some more context into the convo here!
I've been at the workshop the last three days and participated in a bunch of the discussions around the text-based format, both with @SylvainCorlay and, more so today, with the wider group. There is some really positive momentum there, and the state the initial pre-draft proposal has reached by the end of today is really nice.
An important point though is that the proposed syntax, whilst broadly the same (and well aligned with MyST), has moved on from that outlined by @SylvainCorlay above... the latest is different at a detail level, so there is probably limited use in scrutinizing what is on this thread in detail, syntax-wise.
Before I speak to some of the points @SylvainCorlay raised above, I want to communicate what was decided by the group at the end of today's session: probably in the next day an issue will be opened on https://github.com/jupyter/enhancement-proposals communicating the work done and posting a link to the working document, after it has received the final bit of clean-up the group wanted to apply. After that, it's the group's plan to have a draft JEP PR open by the end of next week to formally start the process.
So I'd watch out for those events in order to be able to review the whole proposal and the discussion around it.
To give my opinion on this and speak to @SylvainCorlay's points:
- [...] `md` and `ipynb`, sacrificing readability to some degree but maintaining the portability, the self-contained nature of the notebook and still satisfying a number of use cases and requirements. Educational usage, better version control, flexible loading, streaming are all better served by the format than the `ipynb`.
- [...] `vnd._________` in mimetypes so vendors could use `vendor.core-directive` to borrow the semantic intent of the core directive but still have probably a completely separate custom directive implementation. So :+1: on that one!
- [...] `mystjs` and rendered by `myst-to-react`) so they already work beautifully in `jupyterlab/myst`, there are other gaps though, that are going to be easy to resolve -- watch for an issue on that very soon
- [...] `jupyterlab/myst` for notebooks and for the `md` file types too!
- [...] `mystjs` via some extension/plugin system too.

Overall it's been a great few days and I think we should aim to contribute to the JEP around this as much as we can.
Thanks for the update @stevejpurves all sounds fun 😄
The notion of namespace on directives is interesting and useful.
Just to note this is already part of myst, they are known as domains
GFM admonitions are already supported
Indeed. Parsing them isn't so difficult. It's just that I don't feel they should be "core" myst syntax, given that (a) we already have a defined admonition syntax, and (b) the GFM syntax is disputable in that it changes the semantic meaning of blockquote syntax (something I was literally just talking about with @rowanc1 regarding attributes on paragraphs 😅)
Responding to @rowanc1
> For (1): all of the examples that you posted only have a single output rather than an outputs list

Indeed, we addressed this in discussion on whether to have a single output list, or multiple output directives in sequence.
(On the proposal you wrote, note that `stream` and `display_data` outputs don't have an execution count.)
Responding to @chrisjsewell
> Another discrepancy I would note here, is that I assume this is proposing to store code cell metadata [...]

Indeed, cell tags are just one type of metadata at the moment.
We could move them outside of the main metadata field, but this should be a separate JEP from the textual notebook format, and be done in both the current ipynb format and the new textual format.
Really, my comments were more about the directives for notebooks entirely in markdown: https://jupyterbook.org/en/stable/file-types/myst-notebooks.html. I think we should absolutely namespace them - and have a discussion on output admonitions. (Maybe better define our future common admonitions before using the general Jupyter namespace in case we converge on a slightly different format).
The notion of namespace on directives is interesting and useful.
Just to note this is already part of myst, they are known as domains
There may be multiple calls to `display()` in an input cell, and that's why there are multiple distinct outputs in the output cell ipynb nbformat json.

Each object displayed by `display()` MAY return multiple output representations.

- [...] `obj._repr_mimebundle_()` returns `text/plain`, `text/markdown`, `text/html`, and `application/ld+json`, which output format(s) should the `.myst` notebook contain?
- [...] `application/ld+json` are added as `<script type="application/json">` HTML to the markdown
- [...] `CDATA` in XML formats like XHTML but not HTML5:
- `IPython.display.display` does not have a `_repr_markdown_`, but there is an `IPython.display.Markdown` with a `text/markdown` MIME type.
- `_repr_html_`: return raw HTML as a string, or a tuple (see below).
- `_repr_json_`: return a JSONable dict, or a tuple (see below).
- `_repr_jpeg_`: return raw JPEG data, or a tuple (see below).
- `_repr_png_`: return raw PNG data, or a tuple (see below).
- `_repr_svg_`: return raw SVG data as a string, or a tuple (see below).
- `_repr_latex_`: return LaTeX commands in a string surrounded by "$",
or a tuple (see below).
- `_repr_mimebundle_`: return a full mimebundle containing the mapping
from all mimetypes to data.
Use this for any mime-type not listed above.
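To illustrate the question of which representation a text-based notebook should keep, here is a toy Python sketch. `Report`, `PREFERRED` and `pick_representation` are invented for this example and the priority order is purely an assumption; only the `_repr_mimebundle_(include, exclude)` hook name and signature come from IPython.

```python
class Report:
    """Toy object exposing several representations through the IPython hook."""

    def __init__(self, title):
        self.title = title

    def _repr_mimebundle_(self, include=None, exclude=None):
        # One object, several representations of the same content.
        return {
            "text/plain": f"Report({self.title!r})",
            "text/markdown": f"**{self.title}**",
            "text/html": f"<b>{self.title}</b>",
        }


# Hypothetical priority order for a markdown-based notebook file.
PREFERRED = ["text/markdown", "text/html", "text/plain"]


def pick_representation(bundle, preferred=PREFERRED):
    """Return the (mime type, data) pair that comes first in `preferred`."""
    for mime in preferred:
        if mime in bundle:
            return mime, bundle[mime]
    raise KeyError("no supported representation in bundle")
```

Keeping only one representation makes the file more readable but is lossy; storing the whole bundle as JSON (as in the output directive examples earlier in the thread) keeps everything at the cost of readability.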
Really two practical use cases; from https://github.com/chmp/ipytest/issues/89 :
> Could the usage examples from Example.ipynb be inlined into the README.md?

- https://github.com/chmp/ipytest/blob/main/Example.ipynb
- The last time I tried to email a nb with output to a mailing list, IIRC it was easiest to `pandoc --from=html --to=gfm` than to try and save the input and output cells to Markdown with Jupyter `nbconvert` or `jupytext`. (... Why {base64 output etc} is not included in most non-`.ipynb` notebook representations:
  - HTML5/RDFa is not XHTML, and inlined HTML should have `CDATA` and/or must be escaped, which is what `nbconvert` does when generating HTML from `.ipynb` JSON. From https://stackoverflow.com/questions/3302648/should-i-use-cdata-in-html5 : `<!--//--><![CDATA[//><!-- ... //--><!]]>` )
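For the escaping point, a minimal Python sketch using only the standard library might look like the following. Both helper names are hypothetical, this is not how `nbconvert` is implemented, and using `application/ld+json` as the script type is an assumption on my part.

```python
import html
import json


def escape_text_output(text):
    """Escape plain-text output so it can be inlined into HTML or markdown."""
    return html.escape(text)


def embed_json_ld(data):
    """Wrap JSON-LD data in a script tag without letting it close the tag early.

    `\\/` is a valid JSON escape for `/`, so the payload still parses to the
    same data while never containing a literal `</script>`.
    """
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```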
I wasn't sure where the best place to ping you all was, so I figured I'd just put it here since this issue is probably semi-relevant to the discussions. But ping @chrisjsewell @rowanc1 @stevejpurves @agoose77 @sylvaincorlay and @nthiery
There are a bunch of people meeting in Paris right now to discuss potential foundations, constraints, etc for a markdown-based version of Jupyter Notebooks. I had a quick chat with @sylvaincorlay about this and he said that they'd looked at myst and thought it was very close to what would be needed, with a few differences. We discussed a few potential outcomes, but I think our goal could be to find a compromise in MyST syntax that would be acceptable to serve as a "Canonical Jupyter Notebook Markdown Format". It might be a subset of all the syntax MyST supports, but figuring that out is something that is probably best done via live conversation.
I'm pinging you all just because I know people are thinking and discussing this right now at the Jupyter Formats workshop, so wanted to signal-boost it in case you all wanted to organize a chat. I won't be able to attend because I am still super sick and I have a 6 day old infant 🙃 . But consider yourselves pinged!