[Discussion] Plug-ins

azerupi commented 8 years ago

Non-core functionality should be isolated into plug-ins. Plug-ins can then be enabled or disabled depending on the user's needs, e.g. syntax highlighting could be a plug-in.

Plug-ins should:

be able to alter the markdown before rendering
be able to alter the outputted format after rendering
have access to the important configuration options
have a configuration of their own
be able to specify which renderers they target
???

How could we design this?

dtolnay commented 7 years ago

Other abilities I would expect from plugins:

serve local javascript (a search plugin)
insert remote javascript (MathJax or Google Analytics)
add rules to book.css

azerupi commented 7 years ago

That is definitely something I would like too! And I have thought about it already. But I am not sure how I can make the plugin interface generic over multiple renderers (html, pdf, ...) and still be able to allow injecting arbitrary dependencies.

If you have any ideas on how we could make this work, I am open to suggestions.

Also, I think it would be nice if plugins could be developed by third parties. But since Rust's ABI is not stable, that would probably require going through a C API.

There are many open questions about this design.

Evrey commented 7 years ago

> Plug-ins should:

LLVM compilers should...

> be able to alter the markdown before rendering

be able to optimise the IR before generating machine code.

> be able to alter the outputted format after rendering

be able to optimise machine code. (Why?)

> have access to the important configuration options

Just give each plugin a read-only map of configuration options or a JSON value or whatever. Read-only as the order in which plugins would run is unclear.

> have a configuration of their own

Also pass a private read-write configuration in addition to the previous one. From there on it would be easy to build a single configuration object where plugins could apply local changes to the global configuration.

> be able to specify which renderers they target

check --target=triple. Back-end plugins might just have an additional name identifying what they generate, e.g. my_plugin.target() == "HTML" or my_plugin.target() == MdBookTargets::HTML, depending on how extensible MdBook will be. An MdBook plugin might have multiple backends available.

> ???

!!!

So, yeah, to me, this list sounds a lot like basic compiler design. And with basic compiler design comes this workflow: Tokenise inputs, generate some sort of IR, modify IR, spit code out. Now, translated to MdBook, this means:

Generate Markdown IR, e.g.:

MdToken::H1[ MdToken::Text("Why "), MdToken::InlineCode("mdbook"), MdToken::Text("?") ],
MdToken::NewLine,
MdToken::Quote( MdToken::Text("Because it is awesome!") ),
MdToken::NewLine,
MdToken::Text("This person is right!"),

Which might be a way to represent this:

# Why `mdbook`?

> Because it is awesome!

This person is right!

Perhaps MdBook already does this; I didn't read through all that code.
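A minimal Rust sketch of such an IR, assuming a hypothetical `MdToken` enum (neither the type nor `to_markdown` exists in mdBook). It builds the token list from the example and turns it back into the Markdown shown above:

```rust
// Hypothetical IR for a Markdown document; these names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum MdToken {
    H1(Vec<MdToken>),
    Text(String),
    InlineCode(String),
    Quote(Box<MdToken>),
    NewLine,
}

/// Turn a token list back into Markdown source.
fn to_markdown(tokens: &[MdToken]) -> String {
    let mut out = String::new();
    for token in tokens {
        match token {
            MdToken::H1(children) => {
                out.push_str("# ");
                out.push_str(&to_markdown(children));
                out.push('\n');
            }
            MdToken::Text(text) => out.push_str(text),
            MdToken::InlineCode(code) => {
                out.push('`');
                out.push_str(code);
                out.push('`');
            }
            MdToken::Quote(inner) => {
                out.push_str("> ");
                out.push_str(&to_markdown(std::slice::from_ref(&**inner)));
                out.push('\n');
            }
            MdToken::NewLine => out.push('\n'),
        }
    }
    out
}

/// The token list from the example above.
fn example() -> Vec<MdToken> {
    vec![
        MdToken::H1(vec![
            MdToken::Text("Why ".into()),
            MdToken::InlineCode("mdbook".into()),
            MdToken::Text("?".into()),
        ]),
        MdToken::NewLine,
        MdToken::Quote(Box::new(MdToken::Text("Because it is awesome!".into()))),
        MdToken::NewLine,
        MdToken::Text("This person is right!".into()),
    ]
}

fn main() {
    print!("{}", to_markdown(&example()));
}
```

A front-end plugin would then receive and return a `Vec<MdToken>` rather than raw text.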

Modify IR. Now, front-end plugins should be able to read and modify this IR. They should also be able to provide "custom tokens", like MdToken::Custom(box my_stuff). Why? Think about my [toc]-request. How would I build such a thing as a plugin?
1. Scan each MdToken::Text for the substring [toc] and split it there.
2. Instead of creating a text token "[toc]", generate a MdToken::Custom(Box<MdTokenExt>), where MdTokenExt is some trait telling MdBook how to compile it.
3. Scan through the above MdToken::Text's scope until the next H1 and store references to all H2 up to H6.
4. If MdToken has no way of inserting HTML metadata, prepend each memorised H2 to H6 with a special anchor token. Otherwise, give them a custom anchor or read their already existing anchor names.
5. Use this gained information to generate the actual HTML TOC if MdBook generates HTML.

Spit code out is the step where each token generates its own HTML and where back-end plugins do their thing using e.g. MdTokenExt. Now, you could either generate the HTML document directly, or create an extra HTML IR. I'd recommend generating HTML directly, unless you really need to modify HTML because there is no way to insert and modify HTML metadata using the MdBook IR. A back-end example might be a "new tab sanitiser" that adds rel="noopener noreferrer" or whatever it was to all target="_blank" links.
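The [toc] steps could be sketched roughly as follows. This is an assumption-laden illustration, not mdBook code: `MdToken`, `MdTokenExt`, `Toc` and `expand_toc` are hypothetical, and the sketch simplifies by collecting every H2 in the token list (no scoping to the next H1, no H3 to H6, no anchors, no HTML escaping).

```rust
use std::fmt::Debug;

// Hypothetical extension trait: tells the renderer how to emit a custom token.
trait MdTokenExt: Debug {
    fn render_html(&self) -> String;
}

// Hypothetical, heavily reduced IR.
#[derive(Debug)]
enum MdToken {
    Text(String),
    H2(String),
    Custom(Box<dyn MdTokenExt>),
}

#[derive(Debug)]
struct Toc {
    headings: Vec<String>,
}

impl MdTokenExt for Toc {
    fn render_html(&self) -> String {
        let items: String = self
            .headings
            .iter()
            .map(|h| format!("<li>{}</li>", h))
            .collect();
        format!("<ul>{}</ul>", items)
    }
}

/// Split text tokens at `[toc]` and put a custom token where the placeholder was.
fn expand_toc(tokens: Vec<MdToken>) -> Vec<MdToken> {
    // Remember all H2 titles first (simplified: whole document, H2 only).
    let headings: Vec<String> = tokens
        .iter()
        .filter_map(|t| match t {
            MdToken::H2(title) => Some(title.clone()),
            _ => None,
        })
        .collect();

    let mut out = Vec::new();
    for token in tokens {
        match token {
            MdToken::Text(text) if text.contains("[toc]") => {
                let mut parts = text.split("[toc]").peekable();
                while let Some(part) = parts.next() {
                    if !part.is_empty() {
                        out.push(MdToken::Text(part.to_string()));
                    }
                    // Every gap between two parts was a `[toc]` placeholder.
                    if parts.peek().is_some() {
                        out.push(MdToken::Custom(Box::new(Toc {
                            headings: headings.clone(),
                        })));
                    }
                }
            }
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let tokens = vec![
        MdToken::Text("Intro [toc] end".into()),
        MdToken::H2("Setup".into()),
        MdToken::H2("Usage".into()),
    ];
    for token in expand_toc(tokens) {
        match token {
            MdToken::Custom(ext) => println!("{}", ext.render_html()),
            other => println!("{:?}", other),
        }
    }
}
```

Because the custom token carries its own `render_html`, the core renderer would not need to know anything about tables of contents.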

Now, about supporting multiple targets, like generating PDF, HTML, Markdown, or PNG,...

Plugins should be able to check the current targets and should notify MdBook about what to do if a target is not supported. Should the plugin fail or should it ignore unsupported targets? Should a failed plugin be ignored and skipped or should MdBook fail as well? Those are questions one has to answer. Or just make the exact behaviour configurable and pick a sensible default behaviour (e.g. fail, ignore).

Now, why multiple targets? If one wants to generate PDF and HTML at the same time, you can actually share the Markdown IR with all its extensions, saving parsing and modification time. Now, to share extensions, plugins are either required to not fail the target check, or MdTokenExt should take a target identifier parameter to dynamically choose the correct implementation.
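One way the target check and a configurable failure policy might look, as a sketch under assumptions: the `Plugin` trait, `Target`, `OnUnsupported` and `run_plugins` are all hypothetical names, and the example plugin is the "new tab sanitiser" mentioned earlier, done with plain string replacement.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Html,
    Pdf,
}

// Hypothetical policy: what to do when a plugin does not support the target.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OnUnsupported {
    Fail,
    Ignore,
}

// Hypothetical plugin interface.
trait Plugin {
    fn name(&self) -> &str;
    fn supports(&self, target: Target) -> bool;
    fn run(&self, target: Target, input: &str) -> String;
}

struct NoOpener;

impl Plugin for NoOpener {
    fn name(&self) -> &str {
        "noopener"
    }
    fn supports(&self, target: Target) -> bool {
        target == Target::Html
    }
    fn run(&self, _target: Target, input: &str) -> String {
        input.replace(
            "target=\"_blank\"",
            "target=\"_blank\" rel=\"noopener noreferrer\"",
        )
    }
}

/// Run every plugin in order; unsupported targets are skipped or abort the build.
fn run_plugins(
    plugins: &[&dyn Plugin],
    target: Target,
    policy: OnUnsupported,
    input: &str,
) -> Result<String, String> {
    let mut doc = input.to_string();
    for plugin in plugins {
        if plugin.supports(target) {
            doc = plugin.run(target, &doc);
        } else if policy == OnUnsupported::Fail {
            return Err(format!(
                "plugin `{}` does not support {:?}",
                plugin.name(),
                target
            ));
        }
    }
    Ok(doc)
}

fn main() {
    let plugin = NoOpener;
    let html = "<a target=\"_blank\" href=\"x\">x</a>";
    println!("{:?}", run_plugins(&[&plugin], Target::Html, OnUnsupported::Fail, html));
    println!("{:?}", run_plugins(&[&plugin], Target::Pdf, OnUnsupported::Ignore, "text"));
    println!("{:?}", run_plugins(&[&plugin], Target::Pdf, OnUnsupported::Fail, "text"));
}
```

Passing the target into `run` is what would let one plugin instance be shared across several renderers.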

azerupi commented 7 years ago

Thanks for that great reply! :blush:

> Generate Markdown IR [...] Perhaps MdBook already does this, didn't read through all that code.

Not mdBook, but I use pulldown-cmark to parse the markdown and it generates an iterator over events.

Parsing before sending to the plugin makes some kinds of plugins more difficult though. For example, a MathJax plugin would allow you to insert math into your book like this:

# Some chapter

A paragraph with some text

$$\int x \; dx$$

Another paragraph

If we parse this as markdown first, backslashes disappear, * characters collide with italic markup, etc. Ideally, we would want the plug-in to either modify the source so that parsing doesn't mess up the math, or replace the math with a placeholder and put it back in later.
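The placeholder idea might look something like this. It is only a sketch: `protect_math`, `restore_math` and the `MATHPLACEHOLDER{n}END` marker are made up for illustration, and the code does not skip code blocks or inline code, which a real pre-processor would have to do.

```rust
/// Replace every `$$...$$` span with a placeholder.
/// Returns the protected text plus the math that was taken out.
fn protect_math(source: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut saved = Vec::new();
    let mut rest = source;
    while let Some(start) = rest.find("$$") {
        let after_open = &rest[start + 2..];
        match after_open.find("$$") {
            Some(len) => {
                out.push_str(&rest[..start]);
                out.push_str(&format!("MATHPLACEHOLDER{}END", saved.len()));
                saved.push(after_open[..len].to_string());
                rest = &after_open[len + 2..];
            }
            // Unmatched opener: leave the remainder untouched.
            None => break,
        }
    }
    out.push_str(rest);
    (out, saved)
}

/// Put the saved math back after the Markdown parser has run.
fn restore_math(rendered: &str, saved: &[String]) -> String {
    let mut out = rendered.to_string();
    for (i, math) in saved.iter().enumerate() {
        out = out.replace(
            &format!("MATHPLACEHOLDER{}END", i),
            &format!("$${}$$", math),
        );
    }
    out
}

fn main() {
    let source = "A paragraph with some text\n\n$$\\int x \\; dx$$\n\nAnother paragraph";
    let (protected, saved) = protect_math(source);
    println!("{}", protected);
    println!("{}", restore_math(&protected, &saved));
}
```

The placeholder contains only letters and digits, so the Markdown parser should pass it through unchanged.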

> Back-end plugins might just have an additional name identifying what they generate, e.g. my_plugin.target() == "HTML"

Yes, I was thinking along the same lines. I thought I would give every plugin and renderer a unique identifier so that plugins could tell what renderers they support.

Evrey commented 7 years ago

Hmm... you don't actually need plugins to be able to read the raw markdown text to support stuff like MathJax. Just by looking at your example code, I'd guess that $$ stuff $$ is MathJax "stuff" delimited by two $$ tokens? Okay, then, together with my [toc] example, we'd be able to extend the plugin design like this:

In addition to modifying Markdown IR or pulldown-cmark events, plugins could specify these three special things:

A special substring match, e.g. [toc] and [TOC].
A single-line special thingy, e.g. § blubb blubb blubb where a line starting with a § character would be handled by a plugin.
A multi-line special thingy, where plugins define the delimiters like $$ (left) and $$ (right) to embed MathJax.
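These three kinds of match could be modelled as data that a plugin hands to the pre-processor. The sketch below is an assumption, not an existing API: `Hook` and `first_match` are hypothetical names, and only the first match is returned.

```rust
// Hypothetical: the three kinds of hook a plugin could register.
enum Hook {
    /// An exact substring such as `[toc]`.
    Substring(&'static str),
    /// A line that starts with the given prefix, such as `§`.
    LinePrefix(&'static str),
    /// A span enclosed by a left and a right delimiter, such as `$$ ... $$`.
    Delimited(&'static str, &'static str),
}

/// Return the text captured by the hook's first match, if any.
fn first_match<'a>(hook: &Hook, source: &'a str) -> Option<&'a str> {
    match hook {
        Hook::Substring(needle) => source
            .find(*needle)
            .map(|i| &source[i..i + needle.len()]),
        // Captures the rest of the line after the prefix.
        Hook::LinePrefix(prefix) => source
            .lines()
            .find_map(|line| line.strip_prefix(*prefix)),
        // Captures what sits between the delimiters.
        Hook::Delimited(left, right) => {
            let start = source.find(*left)? + left.len();
            let len = source[start..].find(*right)?;
            Some(&source[start..start + len])
        }
    }
}

fn main() {
    let source = "Intro [toc]\n§ blubb blubb blubb\n$$x^2$$\n";
    let hooks = [
        Hook::Substring("[toc]"),
        Hook::LinePrefix("§"),
        Hook::Delimited("$$", "$$"),
    ];
    for hook in &hooks {
        println!("{:?}", first_match(hook, source));
    }
}
```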

I'm not sure, however, how this would be done using pulldown-cmark. How do you handle MathJax over there? I guess you don't, which might be the reason you suggested source code access for plugins.

How would it work...

Okay, so a MathJax plugin would just catch everything delimited by $$, generate HTML immediately, and make MdBook insert that HTML after the pre-processing step. As long as all MathJax needs to know is between those $$, this is pretty straightforward.

A TOC plugin might use the caught substring [toc] to just memorise its position and to set up "look out for h2...h6 from there" mode. Then, in the front-end phase, the plugin would have to take those H2 to H6, make them inline HTML headings with anchors, and memorise those anchors and their levels and order. Now, I can't replace the [toc] substring here, because I don't yet know the HTML to generate. Therefore, I'd have to either use the back-end to replace the substring, or I'd need a second compiler pass to catch the old text event containing [toc] again.

Neither approach needs to know the whole source code, which is a good thing. However, as soon as a plugin needs information about the whole document, things get rather complicated, quickly. Nothing impossible, though. Thus, the most difficult thing about implementing MdBook plugins would be to implement a decent MdBook pre-processor. Sadly, pre-processing will break source location information, unless you think of clever tricks to map source positions generated by pulldown-cmark to source positions in the non-pre-processed document. And don't forget that a pre-processor has to know about code blocks and inline code.

An AST would definitely be easier, as a second pass wouldn't be required anymore. However, an AST wouldn't solve the backslash and other escape issues, unless you can somehow pre-process the document or define custom tokens.

ivanceras commented 7 years ago

Has the plugin system been resolved? I'm interested in a collaborative effort to plug in an AsciiToSvg transformer.

azerupi commented 7 years ago

Not yet :) Currently I'm rewriting the parser for the SUMMARY.md file to be able to integrate the changes necessary for multi-language books. Plug-ins seem to be the trickiest part, so I will probably tackle that last.

In the meantime, I'm open to discussion and ideas. It would be interesting to figure out what people need access to in order to write useful plugins.

ivanceras commented 7 years ago

The plugin I need might be a markdown-specific plugin; the google/pulldown-cmark project issue about this is still open. The last activity on pulldown-cmark was 4 months ago.

Michael-F-Bryan commented 7 years ago

Following on from https://github.com/azerupi/mdBook/issues/268#issuecomment-301700191, it sounds like you want to be able to define a very explicit pipelining process and allow your users to insert plugins along the way.

Somewhat copying @Evrey's idea in https://github.com/azerupi/mdBook/issues/163#issuecomment-243076345 and relating it to a compiler you'd probably want a pipeline somewhat like this:

Run pre-processing plugins on the input files (e.g. find MathJax expressions and substitute them with whatever MathJax does)
Parse into Markdown IR/book AST
Run post-parsing plugins to manipulate the AST (e.g. generating a TOC)
Render the book to files on disk
Run post-rendering plugins on the contents of the book directory

An idea I had was that a plugin can be any executable on your PATH which accepts input from stdin, does whatever manipulations it needs to do, then dumps the output to stdout. So for example, I could write a TOC plugin which will import the mdbook crate (for serializing to/from MDBook), walk the list of BookItems, adding a short table of contents to any pages with a [toc] placeholder, then dump the AST back to stdout as JSON for the next step in the pipeline.

Then as far as configuration goes, there can be keys for each pluggable step of the pipeline in your config file and you just add the name of your plugin's executable to that.
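The executable side of such a plugin could be as small as the sketch below. To stay dependency-free it assumes the plugin receives a chapter's Markdown as plain text rather than a JSON-serialised AST, and `run_plugin` is a hypothetical name. It is generic over `Read` and `Write` so the same function works on stdin/stdout or on in-memory buffers.

```rust
use std::io::{self, Read, Write};

/// Read the whole input, replace a `[toc]` line with a list of the
/// `## ` headings found in the text, and write the result.
fn run_plugin<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut text = String::new();
    input.read_to_string(&mut text)?;

    let headings: Vec<&str> = text
        .lines()
        .filter_map(|line| line.strip_prefix("## "))
        .collect();
    let toc: String = headings.iter().map(|h| format!("- {}\n", h)).collect();

    output.write_all(text.replace("[toc]\n", &toc).as_bytes())
}

fn main() -> io::Result<()> {
    // In-memory demo; a real executable would pass io::stdin() and
    // io::stdout() to run_plugin instead.
    let input = "# Title\n[toc]\n## One\n## Two\n";
    let mut output = Vec::new();
    run_plugin(input.as_bytes(), &mut output)?;
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```

Chaining plugins would then amount to feeding one plugin's stdout into the next plugin's stdin.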

Michael-F-Bryan commented 7 years ago

@azerupi, I would be keen to help out with a plugin system. Let me know if there's anything I can do or how you want to implement it, and I'd be more than happy to get a start on this.

acheronfail commented 4 years ago

Just adding a potential use-case:

I added some functionality where I wanted to inject some dynamic HTML into the page, so I ended up creating a pre-processor which injects a <script> tag if a certain string is found.

This script waits for the page to be loaded, and then injects some custom behaviour and registers some event listeners.

So, I guess what would have been really nice is a "post-processor" of some sort? Just wanted to throw this out there while we're discussing a Plugin API. :slightly_smiling_face:

Michael-F-Bryan commented 4 years ago

@acheronfail, could you use the output.html.additional-js array in book.toml to inject a piece of javascript which will inject HTML into the page when loaded?

~~I'm surprised this works to be honest. The mdbook crate pulls in the HTML sanitiser, ammonia, so I thought it would remove script tags and the like from the book when the HTML renderer runs.~~

Edit: ignore my comment. It looks like ammonia is only used by the search function to sanitise the previews you see when searching.

acheronfail commented 4 years ago

> @acheronfail, could you use the output.html.additional-js array in book.toml to inject a piece of javascript which will inject HTML into the page when loaded?

That's probably a better option. Although, the benefit I have right now is that my preprocessor checks if the renderer is html, and then injects the scripts/etc. If it's not html, then it just renders a fallback value for the templates in the source...

I guess I'd need both a pre-processor and an entry in the additional-js array:

the pre-processor overwrites the templates in the source if the renderer is not html
the additional js looks for those templates and does its work when the html renderer is used

acheronfail commented 4 years ago

~~To give a clearer context on what I'm doing, I have a file which contains a list of templates:~~

[
  {
    "template": "__HOST__",
    "fallback": "10.10.14.1"
  },
  {
    "template": "__PORT__",
    "fallback": "1234"
  }
]

~~When these templates are found (and the renderer is HTML), they are replaced with <span class="dynamic-template">${fallback}</span>. So, when a hotkey is pressed a window pops up with a list of templates to inputs: the user can place a custom value there. The templates on each page are then updated with the custom value.~~

The use case is this: suppose you have code blocks with commands that you'd like to copy, but you don't want to have to replace the IP address (or anything else) with the one you need each time you copy it. So you put in a template, and set the value when viewing the book in the browser. Now, all instances of __HOST__ (or whatever template you had) are replaced with the value.

~~In the case of a non-HTML renderer, the template is replaced just with the fallback.~~

EDIT: I've created a repository for this here: https://github.com/acheronfail/mdbook-dynamic-templates if anyone is interested.
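The substitution described in this comment could be sketched as below. This is not the code from the linked repository; `Template` and `apply_templates` are hypothetical names, and the renderer is passed in as a plain string.

```rust
// Mirrors one entry of the JSON list above.
struct Template {
    template: &'static str,
    fallback: &'static str,
}

/// Replace each template with a span for the HTML renderer,
/// or with the bare fallback value for any other renderer.
fn apply_templates(source: &str, templates: &[Template], renderer: &str) -> String {
    let mut out = source.to_string();
    for t in templates {
        let replacement = if renderer == "html" {
            format!("<span class=\"dynamic-template\">{}</span>", t.fallback)
        } else {
            t.fallback.to_string()
        };
        out = out.replace(t.template, &replacement);
    }
    out
}

fn main() {
    let templates = [
        Template { template: "__HOST__", fallback: "10.10.14.1" },
        Template { template: "__PORT__", fallback: "1234" },
    ];
    println!("{}", apply_templates("nc __HOST__ __PORT__", &templates, "html"));
    println!("{}", apply_templates("nc __HOST__ __PORT__", &templates, "pdf"));
}
```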

XVilka commented 4 years ago

Would be nice to have the option to run tests on code snippets that are not in Rust.

Michael-F-Bryan commented 4 years ago

> the benefit I have right now is that my preprocessor checks if the renderer is html,

@acheronfail, did you know you can tell your preprocessor to only run against certain renderers? See the renderer = ["html", "epub"] line in the preprocessor's config in the Preprocessor chapter.

Michael-F-Bryan commented 4 years ago

> Would be nice to have the option to run tests on code snippets that are not in Rust.

@XVilka, what did you have in mind? mdbook uses Rust's built-in test harness, but if you're compiling non-Rust code how would you know to test it?

I feel like you could create your own renderer for this though. It's just like mdbook-linkcheck in that the "rendered" product isn't a document on disk, but a set of diagnostics that are shown to the screen and an exit code that's passed to mdbook to indicate failure.

alexander-myltsev commented 8 months ago

Hi. I'm new to Rust. Let me know if there's a better place to ask this question.

I'd like to debug, say, the katex plugin in an IntelliJ IDE. How should I run the mdbook command with mdbook-katex plugged in on a real book so that the IDE can hit the breakpoint?

Also, does Rust provide enough symbol information to trace back into the mdbook source code? If not, how can I arrange a multi-project environment, with minimal changes to both mdbook and mdbook-katex, that the IDE can view?

rust-lang / mdBook

[Discussion] Plug-ins #163