Integration with jupytext

teucer commented 4 years ago

It would be beneficial to integrate with jupytext and hence jupyter notebooks. Jupyter could then serve as an interactive editor for codebraid.

To be able to do so, a seamless conversion from codebraid to ipynb is required. I believe a pandoc filter could achieve that.

@gpoore Would you be interested in such an approach?

gpoore commented 4 years ago

A filter to convert to/from jupytext and Jupyter notebooks could be nice. I've thought about the possibility occasionally, but haven't actually attempted anything.

I can't invest any significant time in this, given my other projects and commitments. I'm happy to answer questions about how Codebraid works to help in the creation of filters, and might have time to help with converting a basic subset of Codebraid features to/from jupytext/Jupyter.

There are a few questions that would be useful to consider in thinking about this:

What do you gain by jupytext/Jupyter conversion?
- I'm guessing that the main thing is the interactive notebook editor. Is there anything else?
- If it is just for the notebook editor, it may also be worth thinking about ways to enhance Codebraid to make editing work better.
What do you lose by jupytext/Jupyter conversion?
- If all you are using is code blocks with .cb.nb, with a Jupyter kernel, for a single language, with a single session, without any special display settings, then it is possible to convert to/from jupytext/Jupyter without losing anything. Writing filters to handle that case would probably be easy, and could be worthwhile. This probably wouldn't have any particular advantages compared to other Jupyter-notebook-as-Markdown formats. But it could be useful if you're using Codebraid for lots of additional documents and want to have everything in the same format.
- Codebraid has lots of features that can't be converted to a Jupyter notebook. I've listed some of the things that are difficult below. You could figure out a way to convert to/from Jupyter without losing anything, but to do that, you would probably have to put a lot of things in specially marked code blocks (or something similar), since there isn't really a way to render them within a Jupyter notebook. At this point, the value of the notebook as an editor may start to become limited. Here are some things that are difficult to represent in a notebook:
  - Codebraid can run code with the built-in system, and that can't necessarily be replaced with a Jupyter kernel.
  - Within a single document, Codebraid can use multiple Jupyter kernels and multiple separate sessions per kernel. You would probably just have to choose one session for one language to be executable interactively, and would have to convert all other executable code into specially marked static code blocks.
  - Codebraid can run inline code, within paragraphs. I think you can get a limited subset of these capabilities by using notebook extensions.
  - Codebraid can show code and its output in separate locations in a document, with either appearing first.
  - Codebraid allows customization of what a Jupyter kernel displays. I suppose there might be a way to use Jupyter cell tags to handle this...that might save the display information after you convert back to Codebraid, but probably wouldn't be able to affect the display in the notebook.

teucer commented 4 years ago

While I am aware of all the limitations of jupyter compared to codebraid, I think having an interactive environment where you can execute chunks and see the result would be very powerful (basically compilation vs. REPL).

Right now I can only think of jupyter and its popularity. Leveraging it, even for a subset of codebraid, would add a lot of value.

timothymillar commented 4 years ago

I'm also interested in codebraid+jupytext, but I'm not sure I have quite the same use-case as @teucer.

My use-case is leveraging jupytext's command-line tools to convert scripts (.py, .R, .sh, etc.) to markdown. I want to prototype small workflows in plain scripts, then convert them to a markdown file and execute the code in notebook mode for documentation purposes.

Previously I had used Pweave for this, but it seems to be unmaintained (for now), and I like that codebraid can work without jupyter kernels.

I have implemented one-way conversion ~~from jupytext's markdown~~* to a codebraid 'notebook' using a pandoc filter that inserts the .cb.nb class into every code block. It also checks for jupytext-style metadata specifying a jupyter kernel and (if present) specifies the kernel in the first code block.

*EDIT: the filter works on the pandoc AST (of course), so it can convert from any format supported by pandoc, including .ipynb. Pandoc can then be used to convert back to a notebook.

The filter is here

It can be used like so:

pandoc <file> --filter cbnb.filter.py --to markdown ...
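
For readers who want to see the general shape of such a filter, here is a minimal sketch. It is not the filter linked above. It works directly on pandoc's JSON AST with the standard library only. The names `iter_code_blocks` and `mark_code_blocks` and the `kernel` parameter are illustrative, `jupyter_kernel` is assumed to be the Codebraid keyword for selecting a kernel, and reading the kernel from jupytext metadata is left out.

```python
import json
import sys

CB_CLASS = "cb.nb"  # pandoc parses the Markdown attribute `.cb.nb` into this class name


def iter_code_blocks(node):
    """Yield every CodeBlock node found anywhere below `node` in a pandoc JSON AST."""
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            yield node
        else:
            for value in node.values():
                yield from iter_code_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_code_blocks(item)


def mark_code_blocks(doc, kernel=None):
    """Add the Codebraid notebook class to every code block in `doc` (modified in place)."""
    for index, block in enumerate(iter_code_blocks(doc["blocks"])):
        # A CodeBlock's content is [[identifier, classes, key-value pairs], code].
        (_ident, classes, keyvals), _code = block["c"]
        if CB_CLASS not in classes:
            classes.append(CB_CLASS)
        # Assumption: the kernel is declared once, on the first code block.
        if index == 0 and kernel is not None:
            keyvals.append(["jupyter_kernel", kernel])
    return doc


def main():
    """Entry point for use as a pandoc JSON filter: AST in on stdin, AST out on stdout."""
    json.dump(mark_code_blocks(json.load(sys.stdin)), sys.stdout)
```

Saved as a script that calls `main()` at the bottom, a filter of this kind would be passed to pandoc with `--filter`, as in the command above.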

@gpoore would you be interested in a PR to add something like this?

mwouts commented 4 years ago

Hello @gpoore, thank you for considering this. I'll give my point of view on a few questions; hopefully that helps:

What do you gain by jupytext/Jupyter conversion?

As you mention, there are two direct benefits:

- Open/edit codebraid documents as notebooks in Jupyter
- Convert codebraid documents from/to other formats (notebooks & scripts)

If we work on this, an indirect benefit will be solid verification of the round trip between .ipynb and codebraid documents: when we add a new format to Jupytext, we add a series of tests and example notebooks to make sure that no information is lost on round trips.

Codebraid has lots of features that can't be converted to a Jupyter notebook

I agree. We will have to determine how to represent these features in a notebook, even if they are not active there (until someone develops a Jupyter extension to activate them). Let me also note that we have had the same challenge for the R Markdown and MyST Markdown formats. In practice, my goal is that the same document should run, as much as possible, both in Jupyter and in its natural renderer (for instance, in R Markdown we have to comment out Jupyter magic commands because knitr can't run them).

If you want to go further, I propose that someone with a good knowledge of codebraid writes two Python functions, one for converting a codebraid document to a Jupyter notebook (using new_notebook, new_code_cell etc from nbformat.v4.nbbase), and the other for doing the reverse. With that I could easily test the round trip at a larger scale, and then propose a version of Jupytext extended to codebraid. What do you think?
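
To make the request concrete, here is a rough sketch of what such a pair of functions could look like for the simplest case only. It builds plain dictionaries that follow the nbformat v4 JSON layout rather than calling the `nbformat` helpers, so that it stays self-contained. The function names are hypothetical, and it assumes a single language and code blocks fenced exactly as ```` ```{.python .cb.nb} ````; everything else in Codebraid is out of scope.

```python
import re

# One fenced block of the form ```{.python .cb.nb} ... ``` (assumed exact spelling).
FENCE = re.compile(r"^```\{\.python \.cb\.nb\}\n(.*?)\n```$", re.DOTALL | re.MULTILINE)
OPEN_FENCE = "```{.python .cb.nb}"


def codebraid_to_notebook(text):
    """Convert Codebraid Markdown text into a notebook dict in nbformat v4 layout."""
    cells = []
    # re.split with one capturing group alternates: markdown, code, markdown, ...
    for index, part in enumerate(FENCE.split(text)):
        if index % 2 == 1:
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": part,
            })
        elif part.strip():
            cells.append({
                "cell_type": "markdown",
                "metadata": {},
                "source": part.strip("\n"),
            })
    # Minor version 4 is used so that cells do not need an "id" field.
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


def notebook_to_codebraid(notebook):
    """Convert a notebook dict back into Codebraid Markdown text."""
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            chunks.append(f"{OPEN_FENCE}\n{cell['source']}\n```")
        else:
            chunks.append(cell["source"])
    return "\n\n".join(chunks) + "\n"
```

For a document made of Markdown paragraphs and `.cb.nb` blocks separated by single blank lines, converting to a notebook and back returns the original text, which is the property the round-trip tests would check.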

gpoore commented 4 years ago

@timothymillar Thanks for linking to your filter!

I'm not sure at this point if I want to include filters within Codebraid itself. I've looked into Jupyter conversion some more, and it appears that filters can only handle a very limited set of cases.

However, I think it could be worthwhile to start collecting Codebraid-related filters. I can create a Codebraid wiki page for filters and link to your filter from there for now. Then perhaps at some point it may be worth creating a separate repo for filters.

gpoore commented 4 years ago

I've done some research and experimentation, and have a proposal for how to proceed with this.

There is a subset of Codebraid features that maps exactly to a Jupyter notebook, so I think it makes sense to start with support for that and then gradually add support for other things (in some form) from there. So to start with, implementation is simpler at the cost of some documents failing to convert.

Since Pandoc already has ipynb support and Codebraid is based on manipulating the Pandoc abstract syntax tree (AST), I think the easiest way forward is to convert a Codebraid Pandoc Markdown document into Pandoc's Markdown representation of a Jupyter notebook. Then Pandoc can create the actual notebook from there. Converting from Jupyter notebook to Codebraid is just the reverse, again with Pandoc's Markdown notebook as an intermediary.
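
As a rough illustration of the Codebraid-to-notebook direction at the AST level: Pandoc's Markdown representation of a notebook uses fenced divs with the classes `cell` and `code` or `markdown`, so the conversion amounts to grouping top-level blocks into such divs. The sketch below is not Codebraid's implementation. The function names are made up for the example, only top-level `.cb.nb` code blocks are treated as code cells, and other Codebraid attributes are passed through untouched.

```python
CB_CLASS = "cb.nb"  # class name that pandoc produces for the attribute `.cb.nb`


def make_div(classes, blocks):
    """Build a pandoc JSON Div node with the given classes and child blocks."""
    return {"t": "Div", "c": [["", classes, []], blocks]}


def to_cell_divs(blocks):
    """Group top-level pandoc JSON blocks into notebook cell divs."""
    cells = []
    pending = []  # consecutive non-code blocks that will form one markdown cell

    def flush():
        if pending:
            cells.append(make_div(["cell", "markdown"], list(pending)))
            pending.clear()

    for block in blocks:
        if block["t"] == "CodeBlock" and CB_CLASS in block["c"][0][1]:
            flush()
            (ident, classes, keyvals), code = block["c"]
            # Drop the Codebraid class so only the language class remains.
            plain = [name for name in classes if name != CB_CLASS]
            code_block = {"t": "CodeBlock", "c": [[ident, plain, keyvals], code]}
            cells.append(make_div(["cell", "code"], [code_block]))
        else:
            pending.append(block)
    flush()
    return cells
```

The resulting block list could replace the `blocks` entry of the document's JSON AST, which Pandoc can then read with `--from json` and write with `--to ipynb`.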

I've done a few experiments, and this looks like a straightforward process for the subset of Codebraid features that maps exactly to a Jupyter notebook. I can add support for this as I have time, and then that will provide a starting point for working on the more difficult cases as they are needed. It's possible that many people who want to convert back and forth won't need the features that are especially difficult to translate.

@mwouts I can plan on adding options to the codebraid executable that will convert a Codebraid document into Pandoc's Markdown representation of a notebook and also to ipynb, plus options to convert from those formats back to Codebraid. For jupytext, would you prefer the capability to use the relevant part of Codebraid as a library rather than using the codebraid executable via subprocess? If so, are there any particular features you need or that are useful? Either way, Codebraid would run Pandoc as a subprocess to handle some of the conversions.

mwouts commented 4 years ago

Hi @gpoore, this is great news!

I think I'd prefer to use codebraid as a library (as we do for the md:myst format) rather than calling it via subprocess (as we do for the md:pandoc format), as in my experience it is faster and easier to debug.

To integrate codebraid into jupytext, I'll need two functions, one to convert codebraid text (a string) to nbformat notebooks (I mean, what you get when you do nbformat.read("notebook.ipynb", as_version=4)), and the other to do the opposite.

The following would also be useful:

- codebraid's format version number (maybe you want it to be equal to codebraid's version number?), and the minimal version number that the installed version of codebraid can read. For our md format this is for instance: https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/jupytext/formats.py#L73-L85
- a function that can tell me if a string is a codebraid document, like matches_mystnb for the myst format at https://github.com/mwouts/jupytext/blob/bc1b15935e096c280b6630f45e65c331f04f7d9c/jupytext/formats.py#L289-L290
- one or more sample codebraid documents to activate the round-trip tests on them
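
A detection function of the kind requested could be as small as the sketch below. Everything in it is an assumption made for illustration: the name `matches_codebraid`, the version constants (placeholders, not real Codebraid version numbers), and the heuristic itself, which only looks for fenced code blocks whose attribute list contains a `.cb.` class and ignores inline code.

```python
import re

# Placeholder values, not real Codebraid format version numbers.
CODEBRAID_FORMAT_VERSION = "1.0"
MIN_READABLE_FORMAT_VERSION = "1.0"

# An opening fence line whose attribute block contains a `.cb.` class,
# for example ```{.python .cb.nb}
CB_FENCE = re.compile(r"^(`{3,}|~{3,})\s*\{[^}]*\.cb\.\w+[^}]*\}\s*$", re.MULTILINE)


def matches_codebraid(text):
    """Return True if `text` contains at least one Codebraid-style fenced code block."""
    return CB_FENCE.search(text) is not None
```

For example, `matches_codebraid("```{.python .cb.nb}\nprint(1)\n```\n")` is true, while a document that only has plain ```` ```python ```` fences is not matched.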

mfhepp commented 2 years ago

Not having followed the entire discussion, I still would like to share my use-case:

For multiple Python-related classes, I am using RISE to present, in an interactive fashion, aspects of the language and libraries. RISE has the great advantage of allowing ad-hoc variations of code ("What would happen if we changed this to ....?"), which adds a lot of value for the audience, IMO.

For preparation and reviewing, I provide lecture materials in multiple formats, namely

- a PDF document,
- an HTML version for use inside a Learning Management System, and
- PDF versions of the slides, using a Beamer template.

All this is being produced from a version-controlled, authoritative Markdown document per unit using a sophisticated Pandoc workflow with filters etc.

Now, Jupytext does a pretty good job of creating and keeping in sync a Markdown and an *.ipynb version of the main document, but, by design (and to the best of my knowledge), strips off all output cells and there is no way to preserve them.

But for lecture notes, you typically want to show the code snippets and their effects.

Despite quite some research, I do not have an ideal solution; starting from the *.ipynb document via Pandoc to Markdown is painful, as e.g. references (@foo2022a) are escaped and inside fenced divs etc.; one would have to use a complicated filter to clean up.

It is a lot easier starting from the Jupytext-provided Markdown document, except for a few quirks and the missing output cells. IMO, a straightforward solution would be for Jupytext to add CodeBraid classes to Pandoc code blocks, as follows:

Jupytext Output:

```python
for i in range(3):
    print(i)
```

**With CodeBraid classes:**

```{.python .cb.nb}
for i in range(3):
    print(i)
```

The same could be done for `Bash` and other supported types of code blocks.

From an implementation point of view, this could be handled by

- a Pandoc filter that simply adds the CodeBraid classes to code blocks, or
- by Jupytext as an option for Markdown output, which I would prefer.

The actual execution of CodeBraid could be left to the workflow that processes the Markdown representation; IMO, there is no need to hard-wire the two components.
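
A text-level version of this idea, working on the Markdown that Jupytext writes rather than on the Pandoc AST, might look like the sketch below. It is only an illustration: the function name, the default language list, and the choice of `cb.nb` as the class are assumptions, and it expects a closing fence to use the same number of backticks as the opening one.

```python
import re

# An opening fence that carries only a bare language name, e.g. ```python
OPEN_FENCE = re.compile(r"^(`{3,})\s*(\w+)\s*$")


def add_codebraid_classes(markdown, languages=("python", "bash"), cb_class="cb.nb"):
    """Rewrite opening fences such as ```python into ```{.python .cb.nb}."""
    out = []
    fence = None  # the backtick run that opened the current code block, if any
    for line in markdown.splitlines():
        if fence is None:
            match = OPEN_FENCE.match(line)
            if match and match.group(2) in languages:
                fence = match.group(1)
                line = f"{fence}{{.{match.group(2)} .{cb_class}}}"
            elif line.startswith("```"):
                # Some other fenced block: remember it so its content is left alone.
                fence = re.match(r"`+", line).group(0)
        elif line.strip() == fence:
            fence = None  # closing fence of the current block
        out.append(line)
    return "\n".join(out) + "\n"
```

Tracking the open fence keeps the function from rewriting lines that merely look like fences inside another code block, for instance a Markdown example quoted within a longer fence.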

As for interactivity: I would not spend too much effort on this, because CodeBraid and Jupyter cover very different parts of the spectrum. For a highly interactive class, changing and running code snippets as we go is much better in Jupyter Notebook than in any more advanced Markdown workflow (I would not use Quarto here either).

Apologies if this is lengthy and a bit off topic; I **really** appreciate your efforts on the CodeBraid and Jupytext sides; they help tremendously in doing better in teaching and research! Hence, a huge thank you to @gpoore and @mwouts!

gpoore / codebraid

Integration with jupytext #32