teucer opened this issue 4 years ago
Hello @teucer, that is an interesting suggestion! And yes, I would be happy to take a PR that integrates `codebraid` with `jupytext`.
The first question I'll ask you is: how safe is the round trip from a `codebraid` document to a Jupyter Notebook? Is the conversion implemented in `codebraid`? I am asking because `codebraid` documents seem to be a bit different from notebooks (for instance, multiple languages are allowed, code may not be executed with Jupyter kernels, etc.).
If the answer is positive, then it should not be too difficult to plug `codebraid` into `jupytext`: you could have a look at how the `md:myst` or `md:pandoc` formats are implemented, using external tools like `pandoc` or `myst-parser`.
For example, a codebraid code block can carry attributes such as `{.python .cb.run copy=part1+part2 session=copied show=code+stdout:raw example=true}`. One could use a pandoc filter to do the conversion. The cell-level metadata can be leveraged for the "key=value" pairs.
Codebraid just uses regular pandoc markdown, but a user can specify different methods of running a code block using attributes.
It provides a "notebook mode" using the attribute `.cb.nb`, which would make a sensible default for conversion between other notebook formats.
You can also specify the use of a Jupyter kernel in the first code block using `jupyter_kernel=...`.
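To make the "key=value" idea concrete, here is a minimal sketch, assuming cell-level metadata is available as a flat Python dict of strings. The helper `codebraid_attributes` is hypothetical (it is not part of jupytext or codebraid); it renders a pandoc attribute block with `.cb.nb` as the default codebraid command and adds `jupyter_kernel=...` only when a kernel name is passed, which per the comment above belongs on the first code block.

```python
def codebraid_attributes(language, metadata=None, command="nb", kernel=None):
    """Render a pandoc attribute block such as {.python .cb.nb jupyter_kernel=python3}.

    Hypothetical helper: `metadata` is assumed to be a flat dict of cell-level
    metadata whose values can be written as pandoc key=value pairs.
    """
    # Language class first, then the codebraid command class (.cb.nb, .cb.run, ...)
    parts = [f".{language}", f".cb.{command}"]
    if kernel is not None:
        # Only meant for the first code block of the document
        parts.append(f"jupyter_kernel={kernel}")
    for key, value in (metadata or {}).items():
        text = str(value)
        # Quote values with whitespace so pandoc reads them as one attribute
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return "{" + " ".join(parts) + "}"
```

For instance, `codebraid_attributes("python", {"session": "copied"}, command="run")` returns `{.python .cb.run session=copied}`. Values that themselves contain double quotes are not handled in this sketch.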
The codebraid repo doesn't have examples with pandoc divs for cells, but based on some quick testing they work fine (presumably they are passed through to pandoc). So a straightforward codebraid backend could just be a slight variation of the pandoc backend.
e.g. this:
````markdown
::: {.cell .markdown}
# A quick insight at world population
:::
::: {.cell .code}
``` {.python}
import pandas as pd
import wbdata as wb
pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
```
:::
::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::
::: {.cell .code}
``` {.python}
wb.search_indicators('Population, total') # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
```
:::
````
becomes this:
````markdown
::: {.cell .markdown}
# A quick insight at world population
:::
::: {.cell .code}
``` {.python .cb.nb}
import pandas as pd
import wbdata as wb
pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
```
:::
::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::
::: {.cell .code}
``` {.python .cb.nb}
wb.search_indicators('Population, total') # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
```
:::
````
And if a Jupyter kernel spec is specified in the metadata, it could also be specified in the first code block:
````markdown
::: {.cell .markdown}
# A quick insight at world population
:::
::: {.cell .code}
``` {.python .cb.nb jupyter_kernel=python3}
import pandas as pd
import wbdata as wb
pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
```
:::
::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::
::: {.cell .code}
``` {.python .cb.nb}
wb.search_indicators('Population, total') # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
```
:::
````
I have written a small pandoc filter to convert the pandoc AST to a codebraid "notebook".
This adds `.cb.nb` to each code block and, if a Jupyter kernel is defined in the metadata (in the style of jupytext), it adds `jupyter_kernel=<name>` to the first code block.
I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.
Converting directly from an ipynb using the script obviously creates the same verbose pandoc-markdown output as jupytext (with the additional code block data). This output can also be converted back to an ipynb using pandoc.
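A filter along those lines could look like the following minimal sketch. It is not the actual `cbnb.filter.py` (its code is not shown in this thread); it works directly on pandoc's JSON AST using only the standard library, and it assumes the kernel name sits at `jupyter.kernelspec.name` in the document metadata, which is the layout of jupytext's YAML header. Code blocks nested in containers other than fenced divs (block quotes, lists) are left alone.

```python
import json
import sys


def meta_text(value):
    """Flatten a pandoc MetaString or MetaInlines value to plain text."""
    if not isinstance(value, dict):
        return None
    if value.get("t") == "MetaString":
        return value["c"]
    if value.get("t") == "MetaInlines":
        words = []
        for inline in value["c"]:
            if inline["t"] == "Str":
                words.append(inline["c"])
            elif inline["t"] == "Space":
                words.append(" ")
        return "".join(words)
    return None


def kernel_name(meta):
    """Return meta.jupyter.kernelspec.name if present (assumed jupytext layout)."""
    node = meta.get("jupyter")
    for key in ("kernelspec", "name"):
        if not isinstance(node, dict) or node.get("t") != "MetaMap":
            return None
        node = node["c"].get(key)
    return meta_text(node)


def mark_code_blocks(blocks, state):
    """Add the cb.nb class to each CodeBlock, recursing into Divs (cells)."""
    for block in blocks:
        if block["t"] == "CodeBlock":
            # CodeBlock content is [[identifier, classes, key_values], code]
            (_ident, classes, key_values), _code = block["c"]
            if "cb.nb" not in classes:
                classes.append("cb.nb")
            if state["kernel"] and not state["seen_code"]:
                key_values.append(["jupyter_kernel", state["kernel"]])
            state["seen_code"] = True
        elif block["t"] == "Div":
            # Div content is [attr, blocks]; jupytext cells are Divs
            mark_code_blocks(block["c"][1], state)


def codebraid_notebook(doc):
    """Mutate and return a pandoc JSON document in codebraid notebook mode."""
    state = {"kernel": kernel_name(doc.get("meta", {})), "seen_code": False}
    mark_code_blocks(doc["blocks"], state)
    return doc


# pandoc runs JSON filters with the target format as the first argument;
# the argv check keeps a plain `python` run from blocking on stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(codebraid_notebook(json.load(sys.stdin)), sys.stdout)
```

It would be invoked like any JSON filter, e.g. `pandoc notebook.md --filter cb_nb_filter.py --to markdown`, where `cb_nb_filter.py` is a hypothetical file name.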
That's great, thank you @timothymillar!
The next step will be to add the `codebraid` format to Jupytext. For this, you will have to tell me how to convert the codebraid document to a Jupyter notebook, and back. For the `pandoc` format we use `md_to_notebook` and `notebook_to_md` in `jupytext/pandoc.py`; do you think you could implement similar functions? Do you think these functions and the filter should belong to `jupytext`, or rather to `codebraid` (cc @gpoore)?
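As a rough illustration of what such a pair of functions has to do, here is a pure-Python sketch that converts between a minimal notebook dict and the cell-div layout shown earlier in the thread. The names `notebook_to_codebraid_md` and `codebraid_md_to_notebook` are hypothetical; unlike jupytext's pandoc functions, this does not call pandoc and only handles that exact layout. It ignores outputs and cell metadata and keeps only the `name` and `language` of the kernelspec, which is the kind of loss the round-trip question is about.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can sit inside a markdown fence

# One cell div: ::: {.cell .<type>} ... :::
CELL = re.compile(r"^::: \{\.cell \.(\w+)\}\n(.*?)\n:::$", re.M | re.S)
# A codebraid code block inside a code cell, with an optional kernel attribute
CODE = re.compile(
    r"`{3} \{\.([^\s}]+) \.cb\.nb(?: jupyter_kernel=([^\s}]+))?\}\n(.*)\n`{3}$", re.S
)


def notebook_to_codebraid_md(nb):
    """Write a minimal notebook dict as pandoc markdown with codebraid classes."""
    kernelspec = nb["metadata"].get("kernelspec", {})
    language = kernelspec.get("language", "python")
    kernel = kernelspec.get("name")
    chunks, first_code = [], True
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            attrs = f".{language} .cb.nb"
            if first_code and kernel:
                # The kernel is only declared on the first code block
                attrs += f" jupyter_kernel={kernel}"
            first_code = False
            body = f"{FENCE} {{{attrs}}}\n{cell['source']}\n{FENCE}"
        else:
            body = cell["source"]
        chunks.append(f"::: {{.cell .{cell['cell_type']}}}\n{body}\n:::")
    return "\n\n".join(chunks) + "\n"


def codebraid_md_to_notebook(text):
    """Parse the layout written above back into a minimal notebook dict."""
    cells, metadata = [], {}
    for cell_type, body in CELL.findall(text):
        if cell_type != "code":
            cells.append({"cell_type": cell_type, "source": body})
            continue
        match = CODE.match(body)
        if match is None:
            raise ValueError(f"unsupported code cell: {body!r}")
        language, kernel, source = match.groups()
        if kernel:
            metadata["kernelspec"] = {"name": kernel, "language": language}
        cells.append({"cell_type": "code", "source": source})
    return {"cells": cells, "metadata": metadata}
```

For notebooks of this minimal shape, `codebraid_md_to_notebook(notebook_to_codebraid_md(nb)) == nb`; a real implementation would also have to carry outputs and metadata, or decide explicitly to drop them.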
> I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.
That's a good start! The next step will be to test on our collection of test notebooks, see e.g. https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/tests/test_mirror.py#L133-L139
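A check in that spirit could compare only what a text format is expected to keep. The following is a minimal sketch with hypothetical helper names (`comparable`, `assert_round_trip`); it is not the code of `tests/test_mirror.py`. It uses `json.dumps`/`json.loads` as a stand-in format so that it runs on its own; a codebraid writer and reader would be passed in their place.

```python
import json


def comparable(nb):
    """Reduce a notebook dict to what a text format should preserve."""
    return [(cell["cell_type"], cell["source"]) for cell in nb["cells"]]


def assert_round_trip(nb, writer, reader):
    """Write nb, read it back, and check that cells survive unchanged.

    Outputs, execution counts and cell metadata are deliberately ignored.
    """
    text = writer(nb)
    restored = reader(text)
    assert comparable(restored) == comparable(nb), "cells changed in the round trip"
    # Writing the restored notebook again should reproduce the same text
    assert writer(restored) == text, "the text form is not stable"
    return text


if __name__ == "__main__":
    notebook = {
        "cells": [
            {"cell_type": "markdown", "source": "# Title", "metadata": {}},
            {"cell_type": "code", "source": "x = 1", "metadata": {},
             "outputs": [], "execution_count": 1},
        ],
        "metadata": {},
    }
    # json is only a stand-in for a real notebook text format here
    assert_round_trip(notebook, json.dumps, json.loads)
    print("round trip ok")
```

With pytest, the same helper could be parametrized over each notebook in the test collection.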
@mwouts I forgot to link the related codebraid issue.
I personally think integration with codebraid could be really nice, something like `codebraid notebook <pandoc-compatible-file> ...`.
But it may be the antithesis of what @gpoore is aiming for.
I'm not sure that a specific codebraid backend for jupytext would add much in terms of jupytext's goals. The pandoc backend is compatible with codebraid; it simply lacks the codebraid-specific classes that tell codebraid to run code blocks and/or echo their results. So the pandoc output can be built with codebraid, but the result will be identical to building it with pandoc.
I initially thought it would be a big convenience to be able to convert a script directly to pandoc markdown with codebraid classes (my main use case), but this can be achieved with a one-liner (using the filter):
jupytext script.py --to pandoc -o - | pandoc --filter cbnb.filter.py --to markdown ...
Currently it seems to be possible to convert a 'regular' pandoc/codebraid markdown file (i.e. no divs) to jupytext via GitHub-flavored markdown, but this loses metadata and may not work well with more complex examples:
pandoc codebraid.md --to gfm | jupytext - --to ipynb ...
If anything, an additional 'simplified' pandoc backend might be useful for jupytext users.
This would allow for conversion to/from pandoc markdown without the cell divs (`::: {.cell .code}`), which are not commonly used.
It would improve jupytext interop with many pandoc-based tools, including codebraid.
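A rough idea of the writing side of such a 'simplified' backend: the following sketch uses a hypothetical `strip_cell_divs` helper (not an existing jupytext option) that removes the `::: {.cell ...}` wrappers from jupytext's pandoc output and keeps a blank line between former cells. It assumes cell divs are not nested in other fenced divs and that no cell contains a bare `:::` line; other fenced divs are left untouched.

```python
import re

# Opening line of a jupytext cell div, e.g. "::: {.cell .markdown}"
CELL_OPEN = re.compile(r"^:::\s*\{\.cell\b[^}]*\}\s*$")
# A bare closing fence
DIV_CLOSE = re.compile(r"^:::\s*$")


def strip_cell_divs(text):
    """Remove ::: {.cell ...} wrappers, keeping the cells' content."""
    out, in_cell, need_blank = [], False, False
    for line in text.splitlines():
        if not in_cell and CELL_OPEN.match(line):
            in_cell = True
            continue
        if in_cell and DIV_CLOSE.match(line):
            in_cell, need_blank = False, True
            continue
        # Separate former cells by a blank line if the input had none
        if need_blank and line.strip() and out and out[-1].strip():
            out.append("")
        if line.strip():
            need_blank = False
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading such files back is the harder half: once the wrappers are gone, consecutive markdown paragraphs no longer record which cell they belonged to, so a reader would have to fall back on a heuristic such as splitting cells at code blocks.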
@teucer seemed to be suggesting a more specific translation of cell-level metadata between formats, so I'd be interested to hear more detail on that.
codebraid is a Python program that enables executable code in Pandoc Markdown documents.
It is similar to rmarkdown and claims certain advantages over it.
It would be beneficial to support it as well.