machine-learning-apps / mystify

Jupyter backend for textual notebooks in MyST format
MIT License

A quick note on the myst spec #9

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

Thanks for taking a pass at this. Looking at some of the cells, I had a few quick thoughts regarding the MyST spec:

The documentation for the MyST-Notebook format is here:

https://myst-nb.readthedocs.io/en/latest/use/markdown.html

though I'm happy to improve that if anything is confusing. In addition, there is already a MyST Markdown parser (https://myst-parser.readthedocs.io/) whose documentation describes the syntax in more detail. MyST Notebooks are a subset of MyST Markdown. There's also a Python API for the parser, in case that helps with iteration: https://myst-parser.readthedocs.io/en/latest/using/use_api.html

Also note: we've already got a MyST Markdown plugin for VS Code, in case it helps figure out the syntax. It'll do highlighting and some completion for you: https://marketplace.visualstudio.com/items?itemName=ExecutableBookProject.myst-highlight

I should also note: for obvious reasons I'd prefer not to have two different "MyST-based notebook formats" out there. If this approach can be roughly the same as what is in the MyST-NB spec, that would be great. If there are ways that the MyST-NB spec should evolve (e.g. to include some notion of an output block), then we can discuss.


choldgraf commented 4 years ago

Another thought here: I think that using the full MyST parser will also result in a pretty big performance hit, because it runs regexes for a bunch of things that aren't needed in a MyST Notebooks converter. For example, MyST searches for inline syntax like `` {role}`content` `` as well as Markdown syntax like `**highlights**`. When we are scoping to just converting between ipynb and MyST notebooks, we can basically reduce the tokens that MyST needs to:

```` ```{directive} ```` fences, to capture the boundaries of cells,

and

YAML metadata blocks such as `cell: metadata`, to capture the notebook-level and cell-level metadata (and maybe the output bundles as well).

And that's it. We don't need to parse the Markdown **content** of notebooks, because we'd assume that some renderer would do that, not an I/O package.

Restricting the parsing syntax to just those tokens could speed up reading by quite a bit.
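A minimal Python sketch of what such a restricted reader could look like, using only the standard library. It is an illustration under stated assumptions, not MyST-NB's implementation: metadata blocks are assumed to be delimited by `---` lines (notebook-level at the top of the file, cell-level at the top of a cell body), only the hypothetical set `CELL_DIRECTIVES = {"code-cell", "raw-cell"}` is treated as cell boundaries, and YAML is returned as raw text rather than parsed. `read_myst_notebook`, `Cell`, and the helper functions are hypothetical names.

```python
import re
from dataclasses import dataclass

# An opening directive fence: three or more backticks, then {name}, then an
# optional argument (for example, the kernel language of a code cell).
FENCE_OPEN = re.compile(r"^(`{3,})\{([\w-]+)\}(.*)$")

# Assumption: which directive names count as cell boundaries.
CELL_DIRECTIVES = {"code-cell", "raw-cell"}


@dataclass
class Cell:
    kind: str           # "markdown" or a directive name such as "code-cell"
    source: str         # cell body, kept as opaque text
    arg: str = ""       # text after the directive name, e.g. "python"
    metadata: str = ""  # raw YAML between '---' lines; not parsed here


def split_yaml_block(lines):
    """Split a leading '---' ... '---' block off a list of lines."""
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[1:i]), lines[i + 1:]
    return "", lines


def is_closing_fence(line, fence):
    """A closing fence is a backtick-only line at least as long as the opener."""
    s = line.strip()
    return len(s) >= len(fence) and set(s) == {"`"}


def read_myst_notebook(text):
    """Return (notebook_metadata, cells), scanning only for fences and YAML blocks."""
    notebook_meta, lines = split_yaml_block(text.splitlines())
    cells, markdown = [], []

    def flush_markdown():
        body = "\n".join(markdown).strip("\n")
        if body.strip():
            cells.append(Cell("markdown", body))
        markdown.clear()

    i = 0
    while i < len(lines):
        m = FENCE_OPEN.match(lines[i])
        if not m or m.group(2) not in CELL_DIRECTIVES:
            # Markdown is passed through untouched: inline roles, emphasis,
            # and non-cell directives are never tokenized.
            markdown.append(lines[i])
            i += 1
            continue
        flush_markdown()
        fence, name, arg = m.groups()
        j = i + 1
        while j < len(lines) and not is_closing_fence(lines[j], fence):
            j += 1
        cell_meta, body = split_yaml_block(lines[i + 1:j])
        cells.append(Cell(name, "\n".join(body), arg.strip(), cell_meta))
        i = j + 1  # skip the closing fence
    flush_markdown()
    return notebook_meta, cells
```

Because Markdown lines are never tokenized, the work is one cheap check per line, regardless of how much inline syntax the cells contain; turning the raw metadata strings into dictionaries would be left to a YAML library.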