gpoore / codebraid

Live code in Pandoc Markdown
BSD 3-Clause "New" or "Revised" License

Working with multiple source files w/ .cb.nb #45

Open rossbar opened 3 years ago

rossbar commented 3 years ago

I recently ran into an issue when working with multiple source files that are then combined into one larger document, e.g. multiple files representing book chapters. If the files are set up to run individually with the notebook executor (i.e. .cb.nb), then execution will fail silently when trying to execute and combine the files into a single document.

Minimal reproducing example

Say you have two source files ch1.md and ch2.md that you want to execute+compile into book.pdf:

Contents of ch1.md:

````markdown
# Ch. 1 - Uniform distribution

A histogram of uniformly-distributed random numbers.

```{.python .cb.nb jupyter_kernel=python3}
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
plt.hist(rng.uniform(size=1000))
```
````

Contents of ch2.md:

````markdown
# Ch 2. - Normal Distribution

A histogram of normally-distributed random numbers.

```{.python .cb.nb jupyter_kernel=python3}
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
plt.hist(rng.standard_normal(size=1000))
```
````

Executing/converting the files individually works as expected:

```bash
$ codebraid pandoc --from markdown --to pdf ch1.md --standalone -o book.pdf
```

However, if you try to compile both documents into a single book, neither document is executed, though no warning or error is given on the command line:

```bash
$ codebraid pandoc --from markdown --to pdf ch1.md ch2.md --standalone -o book.pdf
```

In the latter case, if you look at the output book.pdf you will find an error printed:

SOURCE ERROR in "ch2.md" near line 6:
Some options are only valid for the first code chunk in a session: "jupyter_kernel"

IMO it would be helpful to the user if this error were raised at the command line rather than (or in addition to being) embedded in the output document. In my actual use-case with much larger chapters, it was a very long time before I noticed this in the output book.
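Until errors are reported on the command line, one possible workaround is to search the generated output for the error text before trusting a build. The sketch below assumes the book has also been converted to a text-based format (for example Markdown or HTML rather than PDF) and that embedded errors contain the string `SOURCE ERROR`, as in the message above; whether other Codebraid errors use different wording is not covered here. The function name `find_embedded_errors` is hypothetical.

```python
def find_embedded_errors(text, marker="SOURCE ERROR"):
    """Return (line_number, line) pairs for every line containing the marker.

    The default marker is taken from the error shown in this report; it is an
    assumption that other embedded errors can be found the same way.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


# Example with text shaped like the error embedded in book.pdf.
sample = (
    "A histogram of normally-distributed random numbers.\n"
    'SOURCE ERROR in "ch2.md" near line 6:\n'
    "Some options are only valid for the first code chunk in a session\n"
)
for number, line in find_embedded_errors(sample):
    print(f"line {number}: {line}")
```

A build script could call this on the converted text and stop with a non-zero exit status whenever the returned list is non-empty.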

The error in book.pdf seems to suggest that the problem lies with the "special" metadata jupyter_kernel, which is only supposed to be supplied in the first code cell. This suggests that an author would have to modify source file metadata if they wanted to switch between building individual chapters and the entire book. I hadn't noticed this mentioned in the docs before - if it's not there, then it would be an improvement if this behavior were documented.

Perhaps this can be avoided if .cb.run is used instead of .cb.nb? Is there a preferred way of using codebraid to have flexible outputs w/ multiple source files?

gpoore commented 3 years ago

I need to clarify the documentation on this. By default, when you pass Pandoc multiple files, it treats them all as one. Codebraid does the same thing, so the code from multiple files is treated as all being from one file, and thus all being in one session. Hence the error about first code cell config in the wrong place.

Pandoc has a --file-scope option that treats multiple files as individuals, and then merges the results after parsing. This should cause Codebraid to do the same thing. The test files work with --file-scope. Of course, that means that you can't have shared Markdown between files (things like footnote definitions, etc.). I have an existing way to enable the effects of --file-scope for Codebraid even when it is disabled for Pandoc, but just haven't made it available to users yet...let me know if you need that.
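As a sketch of the suggestion above, the multi-file command from the original report can be rebuilt with `--file-scope` added. The snippet only assembles the argument list so it can be inspected or handed to `subprocess.run` from a build script; the helper name `build_command` is hypothetical, and all other flags are copied from the report.

```python
def build_command(chapters, output="book.pdf", file_scope=True):
    """Assemble the codebraid invocation from the report as an argument list."""
    cmd = ["codebraid", "pandoc", "--from", "markdown", "--to", "pdf"]
    if file_scope:
        # Pandoc option: parse each input file on its own, then merge the results.
        cmd.append("--file-scope")
    cmd += list(chapters)
    cmd += ["--standalone", "-o", output]
    return cmd


# Print the full-book command; pass the list to subprocess.run(cmd, check=True)
# to actually execute it.
print(" ".join(build_command(["ch1.md", "ch2.md"])))
```

With `file_scope=False` the helper reproduces the original failing command, which makes it easy to compare the two builds.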

In terms of better errors: There's #24 for adding exit codes, and I'm referencing that here to remind myself to look into more extensive error messages on the command line as well.

rossbar commented 3 years ago

> Pandoc has a --file-scope option

Thanks, I wasn't aware of this option.

> I have an existing way to enable the effects of --file-scope for Codebraid even when it is disabled for Pandoc, but just haven't made it available to users yet...let me know if you need that.

I'm not sure yet if it's necessary - at this stage it seems there's enough flexibility to put together a sensible workflow without this feature, but I'll keep it in mind as I continue experimenting with multiple files.

> In terms of better errors: There's #24 for adding exit codes, and I'm referencing that here to remind myself to look into more extensive error messages on the command line as well.

:+1: