Next stage in the plugin architecture

We merged a deliberately minimal Preprocessor phase in #532, so it might be time to start thinking of how the concept of a preprocessor can be extended.

After looking through @Byron's example preprocessor (#629) it's pretty obvious that the API for accessing a book's contents less than ideal, and making the Book API more ergonomic will probably go a long way in allowing third parties to create their own plugins.

This is more of a discussion thread so we can throw around ideas and opinions. @Byron, @sorin-davidoi, @azerupi and anyone else who's interested, feel free to chime in with ideas or suggestions :grin:

To get the ball rolling I think a couple things we could do are:

Expose the sections item inside a Book so people can manually walk through a Book's contents
Allow people to provide their own preprocessor programs, similar to how alternate backends work
Possibly consider going from a tree-based structure for representing book chapters to a graph-based structure
- The idea is a nested chapter can have links back to its parent (and maybe siblings?), making things like section numbers and breadcrumbs a lot easier to implement.
- Parsing the SUMMARY.md file may become easier because we can avoid the ugly recursion needed to append items to a chapter.
- To deal with ownership we'll probably want to use something like petgraph instead of rolling ourselves
- Ensuring the entire thing is consistent after doing mutations may be interesting.
- This breaks all current alternate backends (due to different JSON representations) and will probably make serializing/deserializing more tricky

Thanks @Michael-F-Bryan for getting started with this, and for involving us :)!

For me the top-priority would be to allow preprocessors to be standalone programs, using a plugin mechanism similar to what renderers can already do. Otherwise, people would have to provide their own clones of basic mdbook functionality, like termbook did, just because there is no other way to get a preprocessor in.

Right after that, and to my mind a very low hanging delicious fruit, is to allow iterate section more comfortably, while supporting 'standard' error handling. This should read as 'not what this preprocessor example' has to do.

And here is where I would already make the cut, just because this would already solve all the problems I had while working on termbook.

Besides that, I do like the simplicity of it, which also means I like the tree chapters form, reflecting clearly the SUMMARY.md they are coming from. I can imagine that certain preprocessors require more, and maybe a petgraph powered data structure would just be their thing. For renderers, I think it would suffice if it was easy to convert the current tree into a graph, for example, and they can use that to their hearts content to generate their output. For mutable access with a graph, as possibly required by preprocessors, I would probably want to postpone to another round, until there is demand, just because the whole thing gets a lot more complicated then. (I am assuming here that the current tree structure is kept as foundation in any case).

My general driving forces are

keep it simple
wait until there is actual demand with a clear usecase
let's not get ahead of ourselves (keep it simple, phrased differently :D)

I'm going to copy across the conversation from https://github.com/rust-lang-nursery/mdBook/pull/626#issuecomment-370328393. It's quite relevant to this issue.

@Michael-F-Bryan https://github.com/rust-lang-nursery/mdBook/pull/626#issuecomment-370337584

When the work for this issue is done, would you prefer a squashed commit? Or do you want to keep the history?

I usually merge PRs with the "squash and merge" button because it leads to a much cleaner git history. GitHub will do the squashing for us, so you shouldn't need to do anything.

Mathjax is only relevant for the HTML renderer and a mathjax_support option can be found in the HtmlConfig.

This is one of the things I was referring to in https://github.com/rust-lang-nursery/mdBook/pull/629#issuecomment-368221654 when I said:

... I'm stumped on a couple things, for instance some preprocessors will only be relevant when combined with a specific backend ...

I was thinking a user would add the mathjax preprocessor in their book.toml (or we'd add it as part of the default list if the user doesn't specify any preprocessors) and then for each backend we'll run the entire preprocessor pipeline independently. You can then tell each preprocessor which renderer it's being run for (the Renderer trait requires a name() method), and it then becomes the preprocessor's responsibility to change its behaviour (e.g. run or return early) based on that.

Or expressed as code:
// when generating the book

for renderer in &self.renderers {
  let mut book = self.book.clone();
  let name = renderer.name();

  for preprocessor in &self.preprocessors {
    let ctx = /* ... */;
    preprocessor.run(ctx, name, &mut book)?;
  }

  renderer.render(...)?;
}

// in the MathJax preprocessor

const RENDERER_WHITELIST: &[&str] = &["html"];

impl Preprocessor for MathJax {
  fn run(&self, ctx: &PreprocessorContext, renderer: &'static str, book: &mut Book) -> Result<()> {
    if !RENDERER_WHITELIST.contains(renderer) {
      return Ok(());  // there's nothing to do. Bail early.
    }

    /* do usual preprocessing */
  }
}
This is probably going to be the most scalable approach because we aren't using an ad-hoc side-channel for communicating whether the preprocessor should run. It does allow you to couple preprocessors to specific renderers, but in this case I'd say the MathJax preprocessor is already innately coupled to the HTML renderer so it's not necessarily a problem. If a preprocessor doesn't really care which renderer it's being run for (e.g. the current link rewriters), it can always ignore the renderer argument.

@dvberkel sorry for the wall of text! It's a non-trivial problem and I can imagine as we start using preprocessors for more stuff we'll need to find a more generic solution. I'm kinda hesitant to add custom logic to the determine_preprocessor() function (your second point) because that couples it to the HTML renderer and won't work for 3rd party preprocessors.

@dvberkel https://github.com/rust-lang-nursery/mdBook/pull/626#issuecomment-370341238

@Michael-F-Bryan Don't worry about the wall of text. It conveys your point very well. I would rather want to read more, think and do a well-informed job then a rush to a mediocre solution.

Is what you describe already in place? Or is it something that needs to be done?

@Michael-F-Bryan https://github.com/rust-lang-nursery/mdBook/pull/626#issuecomment-370346912

Is what you describe already in place? Or is it something that needs to be done?

It's one of the ideas I've been thinking about, but nothing is implemented yet. I started #631 so we have a place to discuss these sorts of things.

Of itself, introducing this renderer argument to Preprocessor::run() is a fairly trivial change to make. I just want to hear more people's opinions before implementing anything.

For example what do we do if a user wants to say "run the MathJax preprocessor when building an EPUB"? With this system the preprocessor's whitelist would be hard-coded and wouldn't allow that. A preprocessor could add a setting under its table in book.toml (e.g. renderers = ["html", "epub"]), but then we're just swapping one ad-hoc side-channel for another.

Another option is to let mdbook decide whether to run a preprocessor using the previous renderers = ... setting, falling back to some sort of hint from the preprocessor if it's not available.

e.g. the user could manually specify which renderer to run the MathJax preprocessor for:
[preprocessor.mathjax]
renderers = ["html", "epub"]
And if the renderers key isn't provided, we use a fallback, probably by adding a fn hint_renderer_is_supported(&self, render: &str) -> bool method to the Preprocessor trait.

I am curious what the appetite for breaking up the mdbook library into a few smaller crates to allow for easier plugin development?

Currently if you want to build a preprocessor you are required to bring all filesystem portions of mdbook even though your application will only be concerned with reading from stdin, Book and sending that back out via stdout.

It seems silly to have to compile notify, walkdir and tempfile even though my preprocessor is never going to touch the file system.

In my head the overall structure might look something like this

mdbook-core
- would contain the contents of src/book/book.rs except for create_missing and load_book
mdbook-preprocessor
- Would contain Preprocessor, PreprocessorContext and CmdPreprocessor
- Would re-export mdbook-core
mdbook-renderer
- would contain Renderer, RendererContext and CmdRenderer
- Would re-export mdbook-core

By moving these items into their own library crates, you would eliminate the need for these extra dependencies.

Sorry if this isn't the appropriate place to make this comment. I would be happy to participate in a similar conversation elsewhere or open a new issue to discuss this further.

For a blog post I am working I have have set up what I described above as a fork or the current master. You can find it here.

https://github.com/freemasen/mdbook

rust-lang / mdBook

Next stage in the plugin architecture #631