Parse Markdown outputs in notebooks along with the rest of the Markdown on a page in order to programmatically generate MyST content

choldgraf commented 8 months ago

Jupyter notebook cells can produce Markdown output (MIME type `text/markdown`). It'd be useful if MyST could parse this output along with the rest of the Markdown in the notebook. This would allow programmatic generation of notebook content and, together with the `{embed}` directive, would let you stitch together MyST content from any Jupyter kernel.

For example, the MyST-NB Sphinx extension documents this functionality.
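For reference, this is roughly where such output lives in an `.ipynb` file (nbformat v4): a `display_data` or `execute_result` output whose `data` bundle has a `text/markdown` key, sitting next to the `source` of the Markdown cells. The helper below is a hypothetical sketch, not mystmd code. It walks the notebook JSON and collects both kinds of Markdown in page order, which is the text MyST would need to parse together.

```python
def join_text(value):
    """nbformat stores multi-line text either as one string or as a list of lines."""
    if isinstance(value, str):
        return value
    return "".join(value)


def collect_markdown(notebook: dict) -> list:
    """Return Markdown cell sources and text/markdown outputs, in page order."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell["cell_type"] == "markdown":
            chunks.append(join_text(cell["source"]))
        elif cell["cell_type"] == "code":
            for output in cell.get("outputs", []):
                # Only rich outputs carry a MIME bundle; "stream" outputs have "text" instead
                if output["output_type"] in ("display_data", "execute_result"):
                    markdown = output.get("data", {}).get("text/markdown")
                    if markdown is not None:
                        chunks.append(join_text(markdown))
    return chunks


# A tiny notebook, as it would look after json.load() on an .ipynb file
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "md('- **item**')",
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {
                        "text/markdown": "- **item**",
                        "text/plain": "<IPython.core.display.Markdown object>",
                    },
                }
            ],
        },
    ],
}

print("\n\n".join(collect_markdown(nb)))
```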

An example from our docs where this would be useful

In the admonitions docs here:

https://github.com/executablebooks/mystmd/blob/c0f51ee51890158081736722cb01bcf510699786/docs/admonitions.md?plain=1#L34-L98

We have a big list of all the different types of admonitions for demonstration. It's just boilerplate MyST repeated over and over. If we parsed Markdown outputs as MyST, we could replace it with something like:

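``````python
from IPython.display import Markdown

# Build one tab per admonition type instead of repeating the MyST by hand
inner = []
for admonition in ["note", "tip", "warning"]:
    s = """
````{tab-item} %s
```{%s}
This is a %s admonition.
```
````
""" % (admonition, admonition, admonition)
    inner.append(s)

# Wrap all of the tab items in a single tab-set
template = """
`````{tab-set}
%s
`````
""" % "\n".join(inner)

# Returning Markdown(...) produces a text/markdown output, which MyST would then parse
Markdown(template)
``````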

It's simple to write a notebook that produces graphs, tables, etc. based on an account number.
Now I have 5 accounts and want this one notebook to do the same analysis for all of them. An overview would be the balance of each one in a table, but I'd like to be able to switch tabs or expand rows for the different accounts to see the detailed graphs.
Alternatively, I'd like to automatically have one notebook per account. As the code is the same, I don't want to copy and paste it.

For different use cases, different approaches might be best. I'm wondering what might be possible:

- Tabs at the level of the complete notebook, i.e. execute the notebook like a GitHub Actions matrix with different parameters
- Tabs within each output (which might be synchronized when switching?)
- Outputs combined with hidden content, e.g. a table in which I can expand individual rows to see more details
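The tabs-within-an-output option is a natural fit for Markdown outputs. A minimal sketch, assuming MyST's `tab-set`/`tab-item` directives and made-up account numbers and balances: the analysis loops over the accounts once and emits one tab per account instead of being copied and pasted.

```python
# Made-up example data; in practice this would come from the per-account analysis
balances = {"ACC-001": 1250.00, "ACC-002": -42.50, "ACC-003": 980.10}


def account_tab(account: str, balance: float) -> str:
    """One tab-item per account; its body is where per-account graphs/tables would go."""
    return (
        f"````{{tab-item}} {account}\n"
        f"Balance: {balance:,.2f}\n"
        "````\n"
    )


def accounts_tab_set(balances: dict) -> str:
    """Wrap all per-account tabs in one tab-set (longer fence so inner fences nest)."""
    tabs = "\n".join(account_tab(acc, bal) for acc, bal in balances.items())
    return f"`````{{tab-set}}\n{tabs}`````\n"


print(accounts_tab_set(balances))
```

Adding an account then means adding one entry to the data rather than another copy of the analysis.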

choldgraf commented 2 weeks ago

Workaround: Write to temporary text files and then use `{include}`

I discovered this workaround today; it's a bit kludgy but isn't too hacky. It takes advantage of the fact that code cells are executed before a page is parsed by MyST. This means that you can do something like the following:

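````markdown
## Generate content with Jupyter

```{code-cell} python
from pathlib import Path

# Write generated MyST Markdown to a text file, creating the folder if needed
p = Path("../_build/txt/tmp.txt")
p.parent.mkdir(parents=True, exist_ok=True)
_ = p.write_text("- **Testing**\n- Testing two\n- Testing three")
```
````

And then, later in the page, include it with an `{include}` directive that points at the same file (`../_build/txt/tmp.txt`).
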
This will:

1. Generate some MyST Markdown in Jupyter
2. Write it to a `.txt` file
3. And later in the page, in MyST MD, we reference that `.txt` file with an `{include}` statement

So the notebook is executed first, which creates the `.txt` file, and MyST then includes it when it parses the page.
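The workaround generalizes to any generated MyST. As a sketch (the helper names and the `admonitions.md` file name are hypothetical), the admonition tabs from the docs example could be written out the same way and then pulled in with `{include}`:

```python
from pathlib import Path


def write_snippet(text: str, path: str) -> Path:
    """Write generated MyST to `path` so a later {include} directive can pull it in."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)  # the build folder may not exist yet
    p.write_text(text)
    return p


def admonition_tabs(kinds) -> str:
    """Generate one tab-item per admonition kind, wrapped in a single tab-set."""
    items = [
        f"````{{tab-item}} {kind}\n```{{{kind}}}\nThis is a {kind} admonition.\n```\n````\n"
        for kind in kinds
    ]
    return "`````{tab-set}\n" + "\n".join(items) + "`````\n"
```

In a code cell you would call something like `write_snippet(admonition_tabs(["note", "tip", "warning"]), "../_build/txt/admonitions.md")`, then point an `{include}` directive at that path further down the page.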

agoose77 commented 2 weeks ago

@choldgraf we should also make it possible to include files with .myst.json extensions, to support pulling in AST.

choldgraf commented 2 weeks ago

Yeah, I was thinking that too. It also made me wonder if the notebook cell MyST support could be more like the plugin structure, rather than being language-specific.

For example, a cell tag like `output-myst` or `output-myst-ast` could tell MyST to parse the cell's stdout as MyST (similar to what executable plugins do).

Does that make sense? If so I can update the issue body to reflect that suggestion.

agoose77 commented 2 weeks ago

@choldgraf the natural way to do this would be to define a MIME type for MyST AST, and recognise it in our transforms!

choldgraf commented 2 weeks ago

That sounds like a good idea. Though if it were the only way, then each kernel would need to have a package that outputs MyST, right? The benefit of tags and stdout is that anybody in any language could use it without developing anything specific.

agoose77 commented 2 weeks ago

Although we don't need a package for this (certainly with IPython, you can just use `display`), I'm curious to understand your thought process: are you picturing a user printing stringified JSON to stdout? Or MyST markup, i.e. `text/markdown`?
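With IPython, the MIME-type route needs no extra package: `IPython.display.display(bundle, raw=True)` emits a dict keyed by MIME type as-is. A sketch, where `application/vnd.myst.ast+json` is a made-up MIME type used only for illustration, and frontends that don't recognize it fall back to `text/plain`:

```python
import json

# Made-up MIME type, used only for illustration
MYST_AST_MIME = "application/vnd.myst.ast+json"


def myst_ast_bundle(ast: dict, fallback: str) -> dict:
    """Build a raw MIME bundle: MyST AST for MyST, plain text for every other frontend."""
    json.dumps(ast)  # raises TypeError early if the AST isn't JSON-serializable
    return {MYST_AST_MIME: ast, "text/plain": fallback}


ast = {
    "type": "list",
    "ordered": False,
    "children": [
        {"type": "listItem", "children": [{"type": "text", "value": "generated item"}]}
    ],
}
bundle = myst_ast_bundle(ast, "- generated item")
```

In a notebook, `display(bundle, raw=True)` (after `from IPython.display import display`) would record this as a `display_data` output; since the type ends in `+json`, nbformat stores the AST as a JSON object rather than a string.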

choldgraf commented 2 weeks ago

My idea was inspired by the way that you handled "black box" outputs for the executable plugins infrastructure. So below I'll share a Python and an R cell that would generate either MyST MD or MyST AST. At build time, if the tag was identified on a cell, then any text/plain output (or however stdout gets logged) would be parsed differently. Without those tags, the output would just be parsed like any other stdout.

Below I'll use `print`, but I think since Jupyter "returns" the result of the final executed line, it may not be necessary. It'd also be cool if this worked with variables too, so that you could insert generated MyST elsewhere.

Python - MyST MD

```{code-cell} python
:tags: [output-myst-md]
print("- **Bolded** list item")
```


Python - MyST AST[^1]

```{code-cell} python
:tags: [output-myst-ast]
import json

ast = {
    "type": "list",
    "ordered": False,
    "spread": False,
    "children": [
        {
            "type": "listItem",
            "spread": True,
            "children": [
                {
                    "type": "strong",
                    "children": [
                        {"type": "text", "value": "Bolded"},
                    ],
                },
                {"type": "text", "value": " list item"},
            ],
        }
    ],
}
# Print JSON (not the Python repr) so the output can be parsed as MyST AST
print(json.dumps(ast))
```



[^1]: This is also a good demonstration of why I think it's way nicer to be able to parse MyST MD directly and not _only_ MyST AST :-)

fperez commented 2 weeks ago

I love the idea of parsing plain/MyST Markdown proper, as it's very easy to generate from many tools. In my code I very often have something like:

```python
from IPython.display import display, Markdown
md = lambda s: display(Markdown(s))
```

and I use `md(x)` everywhere as a "print Markdown" shorthand to generate legible, pretty reporting with zero fuss.

Having the ability to properly handle that (along perhaps with a new MyST object in `IPython.display`) would be very useful, I think.
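A `MyST` display object doesn't exist in IPython, but a sketch of one only needs IPython's rich display protocol: an object with a `_repr_mimebundle_` method is displayed using the bundle it returns. Here `text/x-myst` is a hypothetical MIME type that a MyST build could look for explicitly, with `text/markdown` as a fallback so the output still renders in other frontends.

```python
class MyST:
    """Hypothetical display object for MyST Markdown (not part of IPython)."""

    MIME = "text/x-myst"  # made-up MIME type, for illustration only

    def __init__(self, source: str):
        self.source = source

    def _repr_mimebundle_(self, include=None, exclude=None):
        # IPython calls this when the object is displayed or is a cell's last expression
        return {
            self.MIME: self.source,
            "text/markdown": self.source,  # fallback for frontends without MyST support
            "text/plain": self.source,
        }
```

Used like the `md` shorthand, `display(MyST(":::{note}\nGenerated by code\n:::"))` would emit all three representations and let each consumer pick the richest one it understands.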

choldgraf commented 1 week ago

I decided to create a separate issue to track generating MyST AST directly from cell outputs, since that might be an easier short-term solution and get us part of the way there:

https://github.com/jupyter-book/mystmd/issues/1633
