mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License

Support codebraid #545

Open teucer opened 4 years ago

teucer commented 4 years ago

codebraid is a Python program that enables executable code in Pandoc Markdown documents.

It is similar to rmarkdown and claims certain advantages over it.

It would be beneficial to support it as well.

mwouts commented 4 years ago

Hello @teucer, that is an interesting suggestion! And yes, I would be happy to take a PR that integrates codebraid with jupytext.

The first question I'll ask you is: how safe is the round trip from a codebraid document to a Jupyter Notebook? Is the conversion implemented in codebraid? I am asking because codebraid documents seem to be a bit different from notebooks (for instance, multiple languages are allowed, and code may not be executed with Jupyter kernels, etc...)

If the answer is positive, then it should not be too difficult to plug codebraid into jupytext. You could have a look at how the md:myst and md:pandoc formats are implemented, using external tools like myst-parser or pandoc.

teucer commented 4 years ago
timothymillar commented 4 years ago

Codebraid just uses regular pandoc markdown but a user can specify different methods of running a code block using attributes. It provides a "notebook mode" using the attribute .cb.nb which would make a sensible default for conversion between other notebook formats. You can also specify the use of a jupyter kernel in the first code block using jupyter_kernel=....

The codebraid repo doesn't have examples with pandoc divs for cells, but based on some quick testing they work fine (presumably they are passed through to pandoc). So a straightforward codebraid backend could just be a slight variation of the pandoc backend.

e.g. this:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::

becomes this:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python .cb.nb}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python .cb.nb}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::

And if a Jupyter kernel spec is given in the metadata, it could also be set on the first code block:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python .cb.nb jupyter_kernel=python3}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python .cb.nb}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::
timothymillar commented 4 years ago

I have written a small pandoc filter to convert the pandoc AST to a codebraid "notebook". This adds .cb.nb to each code block, and if a Jupyter kernel is defined in the metadata (in the style of jupytext) it adds jupyter_kernel=<name> to the first code block.
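For readers who want a picture of what such a filter does, here is a minimal sketch. It is not the filter mentioned above. It works directly on pandoc's JSON AST using only the standard library, and it assumes the kernel name is stored under `jupyter.kernelspec.name` in the document metadata. The function names `kernel_name`, `add_codebraid_classes` and `main` are hypothetical.

```python
import json
import sys


def kernel_name(meta):
    """Return the kernel name from jupyter.kernelspec.name, or None.

    Assumption: the metadata follows the jupytext/pandoc layout, i.e. a
    'jupyter' MetaMap holding a 'kernelspec' MetaMap with a 'name' entry.
    """
    node = meta.get("jupyter")
    for key in ("kernelspec", "name"):
        if not node or node.get("t") != "MetaMap":
            return None
        node = node["c"].get(key)
    if not node:
        return None
    if node["t"] == "MetaString":
        return node["c"]
    if node["t"] == "MetaInlines":
        # Join plain words; anything that is not a Str is treated as a space.
        return "".join(x["c"] if x["t"] == "Str" else " " for x in node["c"])
    return None


def add_codebraid_classes(doc):
    """Add the cb.nb class to every code block, in place.

    The first code block also receives jupyter_kernel=<name> when a kernel
    is found in the metadata. Only top-level blocks and Div contents are
    visited; code nested in lists or block quotes is left untouched.
    """
    kernel = kernel_name(doc.get("meta", {}))
    first = True

    def visit(blocks):
        nonlocal first
        for block in blocks:
            if block["t"] == "CodeBlock":
                (_ident, classes, keyvals), _code = block["c"]
                if "cb.nb" not in classes:
                    classes.append("cb.nb")
                has_kernel = any(k == "jupyter_kernel" for k, _ in keyvals)
                if first and kernel and not has_kernel:
                    keyvals.append(["jupyter_kernel", kernel])
                first = False
            elif block["t"] == "Div":
                visit(block["c"][1])

    visit(doc["blocks"])
    return doc


def main():
    """Filter entry point: pandoc sends the AST on stdin and reads stdout."""
    json.dump(add_codebraid_classes(json.load(sys.stdin)), sys.stdout)
```

To use this as a filter, one would call `main()` under an `if __name__ == "__main__":` guard and pass the script to `pandoc --filter`.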

I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.

Converting directly from an ipynb using the script obviously creates the same verbose pandoc-markdown output as jupytext (with the additional code block data). This output can also be converted back to an ipynb using pandoc.

mwouts commented 4 years ago

That's great, thank you @timothymillar!

The next step will be to add the codebraid format to Jupytext. For this, you will have to tell me how to convert the codebraid document to a Jupyter notebook, and back. For the pandoc format we use md_to_notebook and notebook_to_md in jupytext/pandoc.py. Do you think you could implement similar functions? And do you think these functions and the filter should belong to jupytext, or rather to codebraid (cc @gpoore)?
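As a rough illustration of what a pair of "to codebraid" and "back" functions has to guarantee, here is a text-level sketch of the round trip. It is an assumption-laden stand-in, not jupytext or codebraid code. It rewrites the attribute block of each opening fence in pandoc markdown with a regular expression, and the names `pandoc_md_to_codebraid` and `codebraid_to_pandoc_md` are hypothetical. A real implementation would more likely work on the pandoc AST.

```python
import re

# An opening code fence followed by a pandoc attribute block in braces.
# Attribute values containing spaces or braces are not handled here.
FENCE = re.compile(
    r"^(?P<open>[ \t]*`{3,}[ \t]*\{)(?P<attrs>[^}\n]*)\}[ \t]*$",
    re.MULTILINE,
)


def pandoc_md_to_codebraid(text, kernel=None):
    """Add .cb.nb to every fenced block; set the kernel on the first one."""
    first = True

    def repl(match):
        nonlocal first
        tokens = match.group("attrs").split()
        if ".cb.nb" not in tokens:
            # Place the class right after the language class.
            tokens.insert(1, ".cb.nb")
        has_kernel = any(t.startswith("jupyter_kernel=") for t in tokens)
        if first and kernel and not has_kernel:
            tokens.append("jupyter_kernel=" + kernel)
        first = False
        return match.group("open") + " ".join(tokens) + "}"

    return FENCE.sub(repl, text)


def codebraid_to_pandoc_md(text):
    """Drop the codebraid-specific attributes added by the function above."""

    def repl(match):
        tokens = [
            t
            for t in match.group("attrs").split()
            if t != ".cb.nb" and not t.startswith("jupyter_kernel=")
        ]
        return match.group("open") + " ".join(tokens) + "}"

    return FENCE.sub(repl, text)
```

With these two functions, converting pandoc markdown to the codebraid flavour and back returns the original text, provided the attribute blocks are separated by single spaces. That identity is the kind of property a mirror test would check.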

> I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.

That's a good start! The next step will be to test on our collection of test notebooks, see e.g. https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/tests/test_mirror.py#L133-L139

timothymillar commented 4 years ago

@mwouts I forgot to link the related codebraid issue.

I personally think integration with codebraid could be really nice, something like codebraid notebook <pandoc-compatible-file> .... But it may be the antithesis of what @gpoore is aiming for.

I'm not sure that a specific codebraid backend for jupytext would add much in terms of Jupytext's goals. The pandoc backend is already compatible with codebraid; it simply lacks the codebraid-specific classes that tell codebraid to run code blocks and/or echo their results. So the pandoc output can be built with codebraid, but the result will be identical to building with pandoc.

I initially thought it would be a big convenience to be able to convert a script directly to pandoc markdown with codebraid classes (my main use case), but this can be achieved with a one-liner (using the filter):

jupytext script.py --to pandoc -o - | pandoc --filter cbnb.filter.py --to markdown ...

Currently it seems to be possible to convert a 'regular' pandoc/codebraid markdown file (i.e. one with no divs) to jupytext via GitHub-flavored markdown, but this loses metadata and may not work well with more complex examples:

pandoc codebraid.md --to gfm | jupytext - --to ipynb ...

If anything, an additional 'simplified' pandoc backend might be useful for jupytext users. This would allow for conversion to/from pandoc markdown without the cell divs (::: {.cell .code}), which are not commonly used. It would improve jupytext interoperability with many pandoc-based tools, including codebraid.
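To make the 'simplified' idea concrete, here is a small sketch of the export direction only: unwrapping cell divs in the pandoc JSON AST so that their contents appear as ordinary top-level blocks. This is an illustration, not an existing jupytext feature, and `unwrap_cell_divs` is a hypothetical name.

```python
def unwrap_cell_divs(blocks):
    """Return a new block list with every Div of class 'cell' unwrapped.

    Other blocks, including Divs without the 'cell' class, are kept as
    they are and their contents are not inspected.
    """
    flat = []
    for block in blocks:
        if block["t"] == "Div" and "cell" in block["c"][0][1]:
            # A Div is [attr, blocks]; keep the inner blocks, drop the wrapper.
            flat.extend(unwrap_cell_divs(block["c"][1]))
        else:
            flat.append(block)
    return flat
```

Unwrapping discards the cell boundaries and any cell-level attributes, so the reverse direction would need a rule for rebuilding cells, for example one code cell per code block with the text in between grouped into markdown cells.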

@teucer seemed to be suggesting a more specific translation of cell-level metadata between formats, so I'd be interested to hear more detail on that.