Open teucer opened 4 years ago
A filter to convert to/from jupytext and Jupyter notebooks could be nice. I've thought about the possibility occasionally, but haven't actually attempted anything.
I can't invest any significant time in this, given my other projects and commitments. I'm happy to answer questions about how Codebraid works to help in the creation of filters, and might have time to help with converting a basic subset of Codebraid features to/from jupytext/Jupyter.
There are a few questions that would be useful to consider in thinking about this:
If you only use `.cb.nb`, with a Jupyter kernel, for a single language, with a single session, without any special display settings, then it is possible to convert to/from jupytext/Jupyter without losing anything. Writing filters to handle that case would probably be easy, and could be worthwhile. This probably wouldn't have any particular advantages compared to other Jupyter-notebook-as-Markdown formats. But it could be useful if you're using Codebraid for lots of additional documents and want to have everything in the same format.
While I am aware of all the limitations of jupyter compared to codebraid, I think having an interactive environment where you can execute chunks and see the result would be very powerful (basically compilation vs. REPL).
Right now I can only think of jupyter and its popularity. Leveraging it, even for a subset of codebraid, would add a lot of value.
I'm also interested in codebraid+jupytext but I'm not sure I have quite the same use-case as @teucer
My use-case is leveraging jupytext's command-line tools to convert scripts (.py, .R, .sh etc) to markdown. I want to prototype small workflows in plain scripts, then convert them to a markdown file and execute the code in notebook mode for documentation purposes.
Previously I had used Pweave for this, but it seems to be unmaintained (for now), and I like that codebraid can work without jupyter kernels.
I have implemented one-way conversion from jupytext's markdown* to a codebraid 'notebook' using a pandoc filter that inserts the `.cb.nb` class into every code block.
It also checks for jupytext-style metadata specifying a jupyter kernel and (if present) specifies the kernel in the first code block.
*EDIT: the filter works on pandoc ast (of course) so it can convert from any format supported by pandoc including .ipynb. Pandoc can then be used to convert back to a notebook.
The filter is here
It can be used like so:
pandoc <file> --filter cbnb.filter.py --to markdown ...
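The linked filter is not reproduced in this thread. As a rough illustration of the idea, the sketch below walks a Pandoc JSON AST (plain Python dicts and lists) and appends the `cb.nb` class to every code block. The function names `iter_code_block_attrs` and `add_cb_nb` are hypothetical, the kernel is passed in by the caller rather than read from jupytext metadata, and the `jupyter_kernel` key is an assumption about how the kernel is named on the first code block.

```python
CB_CLASS = "cb.nb"  # Pandoc stores the `.cb.nb` class without the leading dot


def iter_code_block_attrs(node):
    """Yield the [id, classes, key-values] attribute list of every CodeBlock."""
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            # A CodeBlock's content is [attr, text]; attr is what we want.
            yield node["c"][0]
        else:
            for value in node.values():
                yield from iter_code_block_attrs(value)
    elif isinstance(node, list):
        for child in node:
            yield from iter_code_block_attrs(child)


def add_cb_nb(doc, kernel=None):
    """Add the notebook class to every code block in a Pandoc JSON document.

    If `kernel` is given, it is attached to the first code block only
    (assumed key name: `jupyter_kernel`).
    """
    first = True
    for _identifier, classes, keyvals in iter_code_block_attrs(doc["blocks"]):
        if CB_CLASS not in classes:
            classes.append(CB_CLASS)
        if first and kernel is not None:
            keyvals.append(["jupyter_kernel", kernel])
        first = False
    return doc
```

To act as a real filter, a script built on this would read the JSON document from stdin and write the modified document to stdout, which is what `pandoc --filter` expects.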
@gpoore would you be interested in a PR to add something like this?
Hello @gpoore, thank you for considering this. I'll give my viewpoint on a few questions; hopefully that can help:
What do you gain by jupytext/Jupyter conversion?
As you mention, there are two direct benefits:
If we work on this, an indirect benefit will be a well-tested round trip between `.ipynb` and `codebraid` documents: when we add a new format to Jupytext, we add a series of tests and example notebooks to make sure that no information is lost on round trips.
Codebraid has lots of features that can't be converted to a Jupyter notebook
I agree. We will have to determine how to represent these features in a notebook, even if they are not active there (until someone develops a Jupyter extension to activate them). Let me also note that we have had the same challenge for the R Markdown and MyST Markdown formats. In practice, my target is that the same document should run, as much as possible, both in Jupyter and in the natural renderer (for instance, in R Markdown we have to comment out Jupyter magic commands because `knitr` can't run them).
If you want to go further, I propose that someone with a good knowledge of `codebraid` writes two Python functions, one for converting a `codebraid` document to a Jupyter notebook (using `new_notebook`, `new_code_cell`, etc. from `nbformat.v4.nbbase`), and the other for doing the reverse. With that I could easily test the round trip at a larger scale, and then propose a version of Jupytext extended to `codebraid`. What do you think?
@timothymillar Thanks for linking to your filter!
I'm not sure at this point if I want to include filters within Codebraid itself. I've looked into Jupyter conversion some more, and it appears that filters can only handle a very limited set of cases.
However, I think it could be worthwhile to start collecting Codebraid-related filters. I can create a Codebraid wiki page for filters and link to your filter from there for now. Then perhaps at some point it may be worth creating a separate repo for filters.
I've done some research and experimentation, and have a proposal for how to proceed with this.
There is a subset of Codebraid features that maps exactly to a Jupyter notebook, so I think it makes sense to start with support for that and then gradually add support for other things (in some form) from there. So to start with, implementation is simpler at the cost of some documents failing to convert.
Since Pandoc already has ipynb support and Codebraid is based on manipulating the Pandoc abstract syntax tree (AST), I think the easiest way forward is to convert a Codebraid Pandoc Markdown document into Pandoc's Markdown representation of a Jupyter notebook. Then Pandoc can create the actual notebook from there. Converting from Jupyter notebook to Codebraid is just the reverse, again with Pandoc's Markdown notebook as an intermediary.
I've done a few experiments, and this looks like a straightforward process for the subset of Codebraid features that maps exactly to a Jupyter notebook. I can add support for this as I have time, and then that will provide a starting point for working on the more difficult cases as they are needed. It's possible that many people who want to convert back and forth won't need the features that are especially difficult to translate.
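As a rough sketch of the intermediate step described above: Pandoc's notebook representation, as far as I understand its ipynb support, is a sequence of Divs carrying the classes `cell` plus `code` or `markdown`. The code below regroups a flat list of Pandoc JSON AST blocks into such Divs. The helper names `div` and `to_notebook_cells` are hypothetical, and this version simply drops Codebraid options instead of failing or preserving them, so it only covers the subset that maps exactly.

```python
def div(classes, blocks):
    """Build a Pandoc JSON Div with the given classes around `blocks`."""
    return {"t": "Div", "c": [["", classes, []], blocks]}


def to_notebook_cells(blocks, cb_class="cb.nb"):
    """Group top-level blocks into notebook-style cell Divs.

    Code blocks carrying `cb_class` become code cells; consecutive runs
    of all other blocks are collected into markdown cells.
    """
    cells, prose = [], []
    for block in blocks:
        is_cb_code = (
            block.get("t") == "CodeBlock" and cb_class in block["c"][0][1]
        )
        if is_cb_code:
            if prose:
                cells.append(div(["cell", "markdown"], prose))
                prose = []
            (identifier, classes, _keyvals), text = block["c"]
            # Keep only the language class; Codebraid options are dropped
            # here, which a real converter would have to handle explicitly.
            kept = [c for c in classes if c != cb_class]
            plain = {"t": "CodeBlock", "c": [[identifier, kept, []], text]}
            cells.append(div(["cell", "code"], [plain]))
        else:
            prose.append(block)
    if prose:
        cells.append(div(["cell", "markdown"], prose))
    return cells
```

The resulting blocks could then be handed to Pandoc to write the actual notebook, with the reverse direction unwrapping the Divs and restoring the Codebraid class.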
@mwouts I can plan on adding options to the `codebraid` executable that will convert a Codebraid document into Pandoc's Markdown representation of a notebook and also to ipynb, plus options to convert from those formats back to Codebraid. For jupytext, would you prefer the capability to use the relevant part of Codebraid as a library rather than using the `codebraid` executable via subprocess? If so, are there any particular features you need or that are useful? Either way, Codebraid would run Pandoc as a subprocess to handle some of the conversions.
Hi @gpoore, this is great news!
I think I'd prefer to use `codebraid` as a library (as we do for the `md:myst` format) rather than calling it via subprocess (as we do for the `md:pandoc` format), as in my experience it is faster and easier to debug.
To integrate `codebraid` into `jupytext`, I'll need two functions, one to convert `codebraid` text (a string) to `nbformat` notebooks (I mean, what you get when you do `nbformat.read("notebook.ipynb", as_version=4)`), and the other to do the opposite.
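To make the requested interface concrete, here is a minimal sketch of such a function pair. Everything in it is an assumption for illustration: the names `codebraid_to_notebook` and `notebook_to_codebraid` are hypothetical, only fenced blocks of the form `{.lang .cb.nb}` are recognized, a single language is assumed, and plain dicts in the nbformat 4 layout stand in for what `new_notebook` and `new_code_cell` would build.

```python
import re

FENCE = "`" * 3
FENCE_OPEN = re.compile(r"^`{3}\s*\{\.(\w+)\s+\.cb\.nb\}\s*$")


def _cell(cell_type, source):
    """Build a dict shaped like an nbformat 4 cell."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        cell.update(execution_count=None, outputs=[])
    return cell


def codebraid_to_notebook(text):
    """Split text into markdown and code cells (single language assumed)."""
    cells, buffer, in_code, language = [], [], False, None

    def flush(cell_type):
        source = "\n".join(buffer).strip("\n")
        if source or cell_type == "code":
            cells.append(_cell(cell_type, source))
        buffer.clear()

    for line in text.split("\n"):
        match = FENCE_OPEN.match(line)
        if not in_code and match:
            flush("markdown")
            in_code, language = True, match.group(1)
        elif in_code and line.strip() == FENCE:
            flush("code")
            in_code = False
        else:
            buffer.append(line)
    flush("markdown")
    metadata = {"language_info": {"name": language}} if language else {}
    return {"cells": cells, "metadata": metadata,
            "nbformat": 4, "nbformat_minor": 4}


def notebook_to_codebraid(notebook):
    """Write cells back out, one blank line between consecutive cells."""
    info = notebook["metadata"].get("language_info", {})
    language = info.get("name", "python")
    parts = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            header = f"{FENCE}{{.{language} .cb.nb}}"
            parts.append(f"{header}\n{cell['source']}\n{FENCE}")
        else:
            parts.append(cell["source"])
    return "\n\n".join(parts) + "\n"
```

The round trip is only exact for text that is already normalized (cells separated by one blank line, a single trailing newline), which is the kind of property round-trip tests would need to pin down.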
I would also have use of the following:
- `codebraid`'s format version number (maybe you want it to be equal to codebraid's version number?), and the minimal version number that the installed version of `codebraid` can read. For our `md` format this is for instance: https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/jupytext/formats.py#L73-L85
- A way to recognize a `codebraid` document, like `matches_mystnb` for the `myst` format at https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/jupytext/formats.py#L289-L290
- Sample `codebraid` documents to activate the round-trip tests on them
Not having followed the entire discussion, I would still like to share my use-case:
For multiple Python-related classes, I am using RISE to present, in an interactive fashion, aspects of the language and libraries. RISE has the great advantage of allowing ad-hoc variations of code ("What would happen if we changed this to ....?"), which adds a lot of value for the audience, IMO.
For preparation and reviewing, I provide lecture materials in multiple formats, namely
All this is being produced from a version-controlled, authoritative Markdown document per unit using a sophisticated Pandoc workflow with filters etc.
Now, Jupytext does a pretty good job of creating and keeping in sync a Markdown and an `*.ipynb` version of the main document, but, by design (and to the best of my knowledge), it strips off all output cells and there is no way to preserve them.
But for lecture notes, you typically want to show the code snippets and their effects.
Despite quite some research, I do not have an ideal solution; going from the `*.ipynb` document via Pandoc to Markdown is painful, as e.g. references (`@foo2022a`) are escaped and end up inside fenced divs etc.; one would have to use a complicated filter to clean up.
It is a lot easier starting from the Jupytext-provided Markdown document, except for a few quirks and the missing output cells. IMO, a straightforward solution would be for Jupytext to add CodeBraid classes to Pandoc code blocks, as follows:
Jupytext Output:
```python
for i in range(3):
    print(i)
```
**With CodeBraid classes:**
```{.python .cb.nb}
for i in range(3):
    print(i)
```
The same could be done for `Bash` and other supported types of code blocks.
From an implementation point of view, this could be handled by
- a Pandoc filter that simply adds the CodeBraid classes to code blocks, or
- by Jupytext as an option for Markdown output, which I would prefer.
The actual execution of CodeBraid could be left to the workflow that processes the Markdown representation; IMO, there is no need to hard-wire the two components.
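As a small illustration of what such an option might do at the text level, the sketch below rewrites plain language fences into fences carrying Codebraid classes. The function name `add_codebraid_classes` is hypothetical, the `.cb.nb` class is assumed from earlier in this thread, and the regular expression only handles simple three-backtick fences with a bare language name.

```python
import re

# Matches an opening fence with a bare language name, e.g. three backticks
# followed by "python". Closing fences have no language and are left alone.
PLAIN_FENCE = re.compile(r"^`{3}[ \t]*(\w+)[ \t]*$", re.MULTILINE)


def add_codebraid_classes(markdown, cb_class=".cb.nb"):
    """Turn fences like `python` into `{.python .cb.nb}` attribute fences."""
    def repl(match):
        return "`" * 3 + "{." + match.group(1) + " " + cb_class + "}"
    return PLAIN_FENCE.sub(repl, markdown)
```

Working on the Pandoc AST instead, as in the filter option, avoids the edge cases of matching fences with a regular expression.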
As for interactivity: I would not spend too much effort on this, because CodeBraid and Jupyter cover very different parts of the spectrum. For a highly interactive class, changing and running code snippets as we go is much better in Jupyter Notebook than in any more advanced Markdown workflow (I would also not use Quarto here).
Apologies if this is lengthy and a bit off topic; I **really** appreciate your efforts on the CodeBraid and Jupytext sides; they help tremendously in doing better in teaching and research! Hence, a huge thank you to @gpoore and @mwouts!
It would be beneficial to integrate with jupytext and hence jupyter notebooks. Jupyter can indeed become an interactive editor for codebraid.
To be able to do so, a seamless conversion from codebraid to ipynb is required. I believe a pandoc filter could achieve that.
@gpoore Would you be interested in such an approach?