gpoore / codebraid

Live code in Pandoc Markdown
BSD 3-Clause "New" or "Revised" License

Codebraid – live code in Pandoc Markdown

Codebraid is a Python program that enables executable code in Pandoc Markdown documents. Using Codebraid can be as simple as adding a class to your code blocks' attributes, and then running `codebraid` rather than `pandoc` to convert your document from Markdown to another format. `codebraid` supports almost all of `pandoc`'s options and passes them to `pandoc` internally. See Codebraid Preview for VS Code for editor support. See the Codebraid website for additional examples and documentation.

Codebraid provides two options for executing code. It includes a built-in code execution system that currently supports Python 3.7+, Julia, Rust, R, Bash, JavaScript, GAP and SageMath. Code can also be executed using Jupyter kernels, with support for rich output like plots.

Development: https://github.com/gpoore/codebraid

Citing Codebraid: "Codebraid: Live Code in Pandoc Markdown", Geoffrey M. Poore, Proceedings of the 18th Python in Science Conference, 2019, 54-61.

View example HTML output, or see the Markdown source or raw HTML (the Python and Rust examples demonstrate more advanced features at the end):

Simple example

Markdown source test.md:

``````markdown
```{.python .cb-run}
var = 'Hello from Python!'
var += ' $2^8 = {}$'.format(2**8)
```

```{.python .cb-run}
print(var)
```
``````

Run `codebraid` (to save the output, add something like `-o test_out.md`, and
add `--overwrite` if it already exists):

```shell
codebraid pandoc --from markdown --to markdown test.md
```

Output:

```markdown
Hello from Python! $2^8 = 256$
```

As this example illustrates, variables persist between code blocks; by default, code is executed within a single session. Code output is also cached by default so that code is only re-executed when modified.
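The session behavior described above can be pictured with a minimal sketch in plain Python. This is an illustration only, not Codebraid's implementation: the `chunks` list, the session name `other`, and the use of `exec` are assumptions made for the example. Code chunks that share a session are run together and therefore share variables, while a separate session starts from a clean namespace.

```python
import contextlib
import io
from collections import defaultdict

# (session name, code) pairs in document order; the names are hypothetical.
chunks = [
    ("default", "var = 'Hello from Python!'\n"),
    ("default", "var += ' $2^8 = {}$'.format(2**8)\n"),
    ("default", "print(var)\n"),
    ("other", "print('var' in globals())\n"),
]

# Collect the code for each session, preserving chunk order.
sessions = defaultdict(list)
for session, code in chunks:
    sessions[session].append(code)

# Run each session's code as one program in its own namespace.
outputs = {}
for session, parts in sessions.items():
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec("".join(parts), {})
    outputs[session] = buffer.getvalue()

print(outputs["default"], end="")  # Hello from Python! $2^8 = 256$
print(outputs["other"], end="")    # False
```

The `other` session prints `False` because `var` was only ever defined in the `default` session.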

Features

Comparison with Jupyter, knitr, and Pweave

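|                                             | Codebraid | Jupyter Notebook | knitr | Pweave |
|---------------------------------------------|-----------|------------------|-------|--------|
| multiple programming languages per document |           | ✓*               | ✓†    | ✓*     |
| multiple independent sessions per language  |           |                  |       |        |
| inline code execution within paragraphs     |           |                  |       |        |
| no out-of-order code execution              |           |                  | ✓‡    |        |
| no markdown preprocessor or custom syntax   |           |                  |       |        |
| minimal diffs for easy version control      |           |                  |       |        |
| insert code output anywhere in a document   |           |                  |       |        |
| can divide code into incomplete snippets    |           |                  |       |        |
| support for literate programming            |           |                  |       |        |
| compatible with any text editor             |           |                  |       |        |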

\* One primary language from the Jupyter kernel. The IPython kernel supports additional languages via `%%script` magics. There is no continuity between `%%script` cells, because each cell is executed in a separate process. Some magics, such as those provided by PyJulia and rpy2, provide more advanced capabilities.

† knitr only provides continuity between code chunks for R, and more recently Python and Julia. Code chunks in other languages are executed individually in separate processes.

‡ Out-of-order execution is possible with R Markdown notebooks.


The table above summarizes Codebraid features in comparison with Jupyter notebooks (without extensions), knitr (R Markdown), and Pweave, emphasizing Codebraid's unique features. Here are some additional points to consider:

Jupyter notebooks — Notebooks have a dedicated, browser-based graphical user interface. Jupyter kernels typically allow the code in a cell to be executed without re-executing any preceding code, providing superior interactivity. Codebraid has advantages for projects that are more focused on creating a document than on exploratory programming.

knitr — R Markdown documents have a dedicated user interface in RStudio. knitr provides superior support for R, as well as significant Python and Julia support that includes R integration. Codebraid offers continuity between code chunks for all supported languages, as well as multiple independent sessions per language. It also provides unique options for displaying code and its output.

More about key features

Easy debugging — By default, stderr is shown automatically in the document whenever there is an error, right next to the code that caused it. It is also possible to monitor code output in real time during execution via --live-output.

Simple language support — Codebraid supports Jupyter kernels. It also has a built-in system for executing code. Adding support for a new language with this system can take only a few minutes. Just create a config file that tells Codebraid which program to run, which file extension to use, and how to write to stdout and stderr. See languages/ for examples.

No preprocessor — Unlike many approaches to making code in Markdown executable, Codebraid is not a preprocessor. Rather, Codebraid acts on the abstract syntax tree (AST) that Pandoc generates when parsing a document. Preprocessors often fail to disable commented-out code blocks because the preprocessor doesn't recognize Markdown comments. Preprocessors can also fail due to the finer points of Markdown parsing. None of this is an issue for Codebraid, because Pandoc does the Markdown parsing.
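The commented-out code problem can be shown with a short sketch. The regular expression below is a hypothetical, deliberately naive preprocessor written for this example. It does not represent any particular tool. Because it scans raw text, it cannot tell that the first block sits inside an HTML comment.

```python
import re

FENCE = "`" * 3  # three backticks, built here so the example is easy to embed

document = "\n".join([
    "<!--",
    FENCE + "{.python .cb-run}",
    "print('disabled')",
    FENCE,
    "-->",
    "",
    FENCE + "{.python .cb-run}",
    "print('active')",
    FENCE,
    "",
])

# Naive preprocessor: treat every fenced block with the .cb-run class as live.
pattern = re.compile(
    r"^`{3}\{\.python \.cb-run\}\n(.*?)^`{3}$",
    flags=re.MULTILINE | re.DOTALL,
)

naive_blocks = pattern.findall(document)
print(naive_blocks)  # ["print('disabled')\n", "print('active')\n"]
```

Both blocks are returned, including the one that was meant to be disabled. A tool that works on the parsed document instead only sees code blocks that the Markdown parser recognizes as code blocks.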

No custom syntax — Codebraid introduces no additional Markdown syntax. Making a code block or inline code executable uses Pandoc's existing syntax for defining code attributes.

Installation and requirements

Installation: `pip3 install codebraid` or `pip install codebraid`

Manual installation: `python3 setup.py install` or `python setup.py install`

Requirements:

Converting a document

Simply run `codebraid pandoc <normal pandoc options>`. Codebraid currently supports Pandoc Markdown (`--from markdown`) and CommonMark with Pandoc extensions (`--from commonmark_x`) as input formats.

Note that `--overwrite` is required to overwrite existing files. If you are using a defaults file, `--from`, `--to`, and `--output` must be given explicitly and cannot be inherited from the defaults file. If you are using a defaults file and converting to a standalone Pandoc Markdown document, `--standalone` should be given explicitly rather than being inherited from the defaults file.

`codebraid` should typically be run in the same directory as the document, so that the default working directory for code is the document directory.
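When conversion is scripted, the points above can be combined in a small wrapper. The following is a sketch under stated assumptions: `build_command` and `convert` are hypothetical helper names, not part of Codebraid, and only command-line options mentioned in this document are used. The wrapper passes the required options explicitly and runs `codebraid` from the document's own directory.

```python
from pathlib import Path
import shutil
import subprocess


def build_command(source, output=None, to_format="markdown", overwrite=False):
    """Assemble a `codebraid pandoc` command line as a list of arguments."""
    cmd = ["codebraid", "pandoc", "--from", "markdown", "--to", to_format]
    if output is not None:
        cmd += ["--output", str(output)]
    if overwrite:
        # Needed whenever the output file already exists.
        cmd.append("--overwrite")
    # Only the file name is passed, because the command runs in its directory.
    cmd.append(Path(source).name)
    return cmd


def convert(source, **kwargs):
    """Run codebraid with the document directory as the working directory."""
    source = Path(source).resolve()
    if shutil.which("codebraid") is None:
        raise RuntimeError("codebraid is not on PATH")
    return subprocess.run(build_command(source, **kwargs), cwd=source.parent, check=True)


print(build_command("docs/test.md", output="test_out.md", overwrite=True))
```

Calling `convert("docs/test.md", output="test_out.md", overwrite=True)` would then execute the printed command inside `docs/`, so relative paths used by code in the document resolve against the document directory.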

If you are converting from Pandoc Markdown to Pandoc Markdown with `--standalone` (basically using `codebraid` to preprocess Markdown documents), note that the following YAML metadata fields and command-line options are ignored in that situation:

This is typically what you want. Usually, "includes" and a table of contents are desired in a final output format like HTML or PDF, not in a Pandoc Markdown file. In the rare cases where "includes" and a table of contents are needed in Markdown documents, this can be accomplished by piping the output of `codebraid` through `pandoc`.

Additional non-Pandoc command-line options

Caching

By default, code output is cached, and code is only re-executed when it is modified. The default cache location is a `_codebraid` directory in the working directory (the directory where `codebraid` is run, typically the document directory). This can be modified using `--cache-dir`. Multiple documents can share a single cache location. A cache directory can be synced between different operating systems (such as Windows and Linux) while retaining full functionality so long as documents are in equivalent locations under the user's home directory (as resolved by `os.path.expanduser()`).

When multiple documents share the same cache location, each document will automatically clean up its own unused, outdated files. However, if a document is deleted or renamed, it may leave behind unused files in the cache, so it may be worth manually deleting and regenerating the cache in those circumstances. Future cache enhancements should be able to detect all unused files, making this unnecessary.

If you are working with external data that changes, you should run `codebraid` with `--no-cache` or delete the cache as necessary to prevent the cache from becoming out of sync with your data. Future releases will allow external dependencies to be specified so that caching will work correctly in these situations.
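The idea that code is only re-executed when it is modified can be sketched with a cache keyed by a hash of the code. This is a conceptual illustration only. The dictionary, the `execute` placeholder, and the hashing scheme are assumptions for the example and do not describe Codebraid's actual cache format.

```python
import hashlib

cache = {}      # stands in for the on-disk cache directory
run_count = 0   # counts how often code is actually executed


def execute(code):
    """Placeholder for real code execution."""
    global run_count
    run_count += 1
    return f"output of {len(code)} characters of code"


def cached_run(code):
    """Execute code only if this exact text has not been seen before."""
    key = hashlib.sha256(code.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = execute(code)
    return cache[key]


cached_run("print(open('data.csv').read())")
cached_run("print(open('data.csv').read())")   # unchanged: served from the cache
cached_run("print(open('data2.csv').read())")  # modified: executed again
print(run_count)  # 2
```

A key derived only from the code text cannot notice that the contents of `data.csv` changed, which is why the document recommends `--no-cache` or clearing the cache when external data changes.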

YAML metadata

Some document-wide settings can be given in the Markdown YAML metadata. Codebraid settings must be under either a `codebraid` or `codebraid_` key in the metadata. Pandoc will ignore `codebraid_` so it will not be available to filters; this distinction should not typically be important.

To use Jupyter kernels automatically for all sessions, simply set `jupyter: true`. For example,

```yaml
---
codebraid:
  jupyter: true
---
```

It is also possible to set a default kernel and/or default timeout. For example,

```yaml
---
codebraid:
  jupyter:
    kernel: python3
    timeout: 120
---
```

A Jupyter kernel and/or timeout can still be set in the first code chunk for a given session, and will override the document-wide default.
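The precedence described above (document-wide defaults, overridden by settings in the first code chunk of a session) can be sketched as a dictionary merge. The function names and the `kernel` and `timeout` keys used for the first-chunk options are hypothetical and chosen to mirror the metadata example. The metadata is written as an already-parsed Python dictionary to keep the sketch free of third-party YAML libraries.

```python
def codebraid_settings(metadata):
    """Return settings from either the `codebraid` or `codebraid_` key."""
    return metadata.get("codebraid") or metadata.get("codebraid_") or {}


def resolve_jupyter(metadata, first_chunk_options):
    """Merge document-wide Jupyter defaults with first-chunk overrides."""
    jupyter = codebraid_settings(metadata).get("jupyter", {})
    # `jupyter: true` enables kernels but carries no defaults of its own.
    defaults = jupyter if isinstance(jupyter, dict) else {}
    resolved = dict(defaults)
    resolved.update(first_chunk_options)  # first-chunk settings win
    return resolved


metadata = {"codebraid": {"jupyter": {"kernel": "python3", "timeout": 120}}}
print(resolve_jupyter(metadata, {"timeout": 300}))
# {'kernel': 'python3', 'timeout': 300}
```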

It is also possible to set `live_output: <bool>` in the metadata. Additional metadata settings will be added in future releases.

Code options

Commands (Classes)

Code is made executable by adding a Codebraid class to its Pandoc attributes. For example, `code`{.python} becomes `code`{.python .cb-run}.

When code is executed, the output will depend on whether the built-in code execution system or a Jupyter kernel is used.

When code is executed with the built-in system, the output is equivalent to collecting all code for each session of each language, saving it to a file, and then executing it (with an added compile step for some languages). For example, running Python code is equivalent to saving it to `file.py` and then running `python file.py`, while running R code is equivalent to saving it to `file.R` and then running `Rscript file.R`. Code is not executed as it would be in an interactive session (like running `python` or `R` at the command prompt). As a result, some output that would be present in an interactive session is absent. For example, in interactive sessions for some languages, simply entering a variable returns a string representation without explicit printing, and plotting opens a separate image window or displays an image inline. Such output is absent in Codebraid unless it is also produced when code is executed as a script rather than in an interactive session. The `.cb-expr` command is provided for when an inline string representation of a variable is desired.
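The difference between script-style and interactive-style execution can be reproduced in Python itself. The sketch below uses the standard `compile` modes `"exec"` (how a script file is run) and `"single"` (how an interactive prompt handles one statement). The helper names are hypothetical, and this only illustrates the behavior described above. It is not how Codebraid executes code.

```python
import contextlib
import io


def run_as_script(source):
    """Execute source the way a script file runs and return its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "file.py", "exec"), {})
    return buffer.getvalue()


def run_interactively(statements):
    """Execute statements one at a time, as an interactive prompt would."""
    buffer = io.StringIO()
    namespace = {}
    with contextlib.redirect_stdout(buffer):
        for statement in statements:
            exec(compile(statement, "<stdin>", "single"), namespace)
    return buffer.getvalue()


print(repr(run_as_script("x = 2**8\nx\n")))        # ''
print(repr(run_interactively(["x = 2**8", "x"])))  # '256\n'
```

In script mode the bare expression `x` produces no output, so an explicit `print(x)` is needed to show the value.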

An option for interactive-style code execution with the built-in system is planned for a future release. In the meantime, many interactive-style features are available between the .cb-expr command and Jupyter kernels.

When code is executed with a Jupyter kernel, the default output will be equivalent to executing it in a Jupyter notebook. Rich output such as plots, images, and LaTeX math will be displayed automatically by default. This can be customized by using the `show` and `hide` options.

All classes for making code executable are listed below. These all have the form `.cb-<command>`. Classes with the form `.cb.<command>` (period rather than hyphen) are supported for Pandoc Markdown (`--from markdown`), but not for `commonmark_x` since it has a more restricted class syntax. The forms shown below (`.cb-<command>`) should be preferred for compatibility across Markdown variants supported by Pandoc.
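For documents that still use the period form, a small helper can rewrite class names to the preferred hyphen form. This is a hypothetical migration aid written for this example. `normalize_class` is not part of Codebraid.

```python
import re


def normalize_class(css_class):
    """Rewrite `.cb.<command>` as `.cb-<command>`; leave other classes alone."""
    return re.sub(r"^\.cb\.(?=\w)", ".cb-", css_class)


print(normalize_class(".cb.run"))   # .cb-run
print(normalize_class(".cb-expr"))  # .cb-expr
print(normalize_class(".python"))   # .python
```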

Keyword arguments

Pandoc code attribute syntax allows keyword arguments of the form `key=value`, with spaces (not commas) separating subsequent keys. `value` can be unquoted if it contains only letters and some symbols; otherwise, double quotation marks `"value"` are required. For example,

```
{.python key1=value1 key2=value2}
```

Codebraid adds support for additional keyword arguments. In some cases, multiple keywords can be used for the same option. This is primarily for Pandoc compatibility.
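To make the attribute syntax concrete, here is a minimal parsing sketch. It is a simplification for illustration, not Pandoc's or Codebraid's parser. `parse_attributes` is a hypothetical helper that handles only `.class` and `key=value` items, and it relies on `shlex` to treat double-quoted values containing spaces as single tokens.

```python
import shlex


def parse_attributes(attr):
    """Split an attribute string into a list of classes and a dict of options."""
    inner = attr.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    classes, options = [], {}
    # shlex splits on spaces but keeps "quoted values" together.
    for token in shlex.split(inner):
        if token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
    return classes, options


print(parse_attributes('{.python .cb-run key1=value1 key2="value 2"}'))
# (['python', 'cb-run'], {'key1': 'value1', 'key2': 'value 2'})
```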

First chunk settings

These are only permitted for the first code chunk in a session (or the first chunk for a language, if a session is not specified and thus the default session is in use).

Execution

Display

Copying

Including external files