mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License
6.65k stars 386 forks

Pandoc-incompatible extra meta-data from RISE Notebooks #997

Open mfhepp opened 2 years ago

mfhepp commented 2 years ago

RISE is a very useful extension to Jupyter Notebook that allows presenting notebooks directly in the form of Reveal.JS slides.

Now, when converting such notebooks to Markdown, cells that directly contain RISE-specific meta-data such as the `slide` and `subslide` slide types result in Markdown that is incompatible with Pandoc's fenced code attribute syntax:

Example:

This cell

 {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7f044916",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "1\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "for i in range(3):\n",
    "    print(i)"
   ]
  },

creates the following Markdown:

```python slideshow={"slide_type": "subslide"}
for i in range(3):
    print(i)
```

However, this extra meta-data `slideshow={"slide_type": "subslide"}` is not Pandoc Markdown syntax, and it would be more useful to translate it into something like:

    for i in range(3):
        print(i)

or, cleaner, as a [Pandoc nested fenced div with attributes](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes), like so:
```{.python }
for i in range(3):
    print(i)
```

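To make the proposal concrete, here is a minimal Python sketch of how a code cell's `slideshow` meta-data could be rendered as Pandoc fenced code attributes. It is not how Jupytext works today; the function names `pandoc_fence_attributes` and `cell_to_pandoc_markdown` are hypothetical, and the sketch assumes attribute values are plain strings without double quotes.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed


def pandoc_fence_attributes(language, metadata):
    """Return a Pandoc attribute block, e.g. {.python slide_type="subslide"}.

    Hypothetical helper: flattens the RISE "slideshow" dict into key="value" pairs.
    """
    parts = ["." + language]
    for key, value in sorted(metadata.get("slideshow", {}).items()):
        # Assumption: values contain no double quotes, so no escaping is done here.
        parts.append('{}="{}"'.format(key, value))
    return "{" + " ".join(parts) + "}"


def cell_to_pandoc_markdown(cell, language="python"):
    """Render a notebook code cell (as a dict) into a Pandoc-style fenced code block."""
    attrs = pandoc_fence_attributes(language, cell.get("metadata", {}))
    source = "".join(cell["source"])
    return "\n".join([FENCE + attrs, source, FENCE])


cell = {
    "cell_type": "code",
    "metadata": {"slideshow": {"slide_type": "subslide"}},
    "source": ["for i in range(3):\n", "    print(i)"],
}
markdown = cell_to_pandoc_markdown(cell)
```

With the cell from the example above, the opening fence line becomes `{.python slide_type="subslide"}` after the three backticks, which Pandoc can parse and expose to filters as a key-value attribute.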
A similar problem exists with RISE Speaker Notes. In the notebook, they look like so:

  {
   "cell_type": "markdown",
   "id": "786500ac",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "These are speaker notes.\n",
    "\n",
    "$c=\\sqrt{a^2 + b^2}$"
   ]
  },

The conversion turns them into

<!-- #region slideshow={"slide_type": "notes"} -->
These are speaker notes.

$c=\sqrt{a^2 + b^2}$
<!-- #endregion -->

While this does not break the layout, the type of the content is not accessible to Pandoc filters (at least not easily).

IMO, it would be better to represent them as

<!-- #region slideshow={"slide_type": "notes"} -->
:::{.slideshow slide_type="notes"}
These are speaker notes.

$c=\sqrt{a^2 + b^2}$
:::
<!-- #endregion -->
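
As an illustration of the representation suggested here, the following Python sketch wraps the source of a Markdown cell in a Pandoc fenced div that carries the slide type. The function name `notes_cell_to_fenced_div` is hypothetical, and the sketch only produces the inner `:::` wrapper; it leaves out the surrounding `#region` comments and assumes the cell is given as a plain dict, as in the notebook JSON.

```python
def notes_cell_to_fenced_div(cell):
    """Wrap a Markdown cell's text in a Pandoc fenced div carrying its slide type.

    Hypothetical helper: cells without "slideshow" meta-data are returned unchanged.
    """
    slide_type = cell.get("metadata", {}).get("slideshow", {}).get("slide_type")
    body = "".join(cell["source"])
    if slide_type is None:
        return body
    # Same opening line as in the proposal: :::{.slideshow slide_type="notes"}
    opening = ':::{{.slideshow slide_type="{}"}}'.format(slide_type)
    return "\n".join([opening, body, ":::"])


notes_cell = {
    "cell_type": "markdown",
    "metadata": {"slideshow": {"slide_type": "notes"}},
    "source": ["These are speaker notes.\n", "\n", "$c=\\sqrt{a^2 + b^2}$"],
}
wrapped = notes_cell_to_fenced_div(notes_cell)
```

A Pandoc filter could then match on the `slideshow` class and read `slide_type` from the div's attributes, instead of parsing HTML comments.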

/CC @tarleb

mfhepp commented 2 years ago

For completeness, here are the Jupytext settings I used:

---
jupyter:
  celltoolbar: Slideshow
  jupytext:
    cell_metadata_filter: all,-trusted
    formats: ipynb,md
    notebook_metadata_filter: all
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.14.0
  kernelspec:
    display_name: aih
    language: python
    name: aih
  language_info:
    codemirror_mode:
      name: ipython
      version: 3
    file_extension: .py
    mimetype: text/x-python
    name: python
    nbconvert_exporter: python
    pygments_lexer: ipython3
    version: 3.10.5
---

mfhepp commented 2 years ago

/CC @damianavila

mfhepp commented 2 years ago

Cross-referencing https://github.com/mwouts/jupytext/issues/66

mwouts commented 2 years ago

Hi @mfhepp, do I understand correctly that you are looking for Pandoc's Markdown representation of notebooks? If so, did you try pairing your notebook with the `md:pandoc` format, rather than with the default `md` format?
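
For reference, a minimal sketch of what such a pairing could look like when driven from Python. It assumes the `jupytext` command line tool and the `pandoc` executable are both installed and on the `PATH`; the helper names and the file name `notebook.ipynb` are placeholders, not something taken from this issue.

```python
import subprocess


def pairing_command(notebook_path, formats="ipynb,md:pandoc"):
    """Build the jupytext command that pairs a notebook with Pandoc Markdown.

    Hypothetical helper: it only assembles the argument list, nothing is executed.
    """
    return ["jupytext", "--set-formats", formats, notebook_path]


def pair_with_pandoc_markdown(notebook_path):
    """Run the pairing command; raises CalledProcessError if jupytext fails."""
    subprocess.run(pairing_command(notebook_path), check=True)


# Usage (needs jupytext and pandoc installed):
# pair_with_pandoc_markdown("notebook.ipynb")
```

The same pairing can also be set in the notebook header by changing `formats: ipynb,md` to `formats: ipynb,md:pandoc`.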