wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.18/markdown/
MIT License

Enable custom plugins #32

Open lucperkins opened 1 year ago

lucperkins commented 1 year ago

I'm a huge fan of this library, especially the MDX support 🎉, but I'd like to be able to provide my own extensions (such as super-fancy code blocks with all the bells and whistles and support for fun things like Mermaid diagrams). As far as I can tell, there isn't currently a way to do that. If I'm wrong and there is a way to do that, please let me know and I'll close this 😄 But if this isn't currently possible, I'd like to know if you consider that a non-goal for the project. If it is a non-goal, I'm happy to close this and explore other options. But if you're open to that, I'm happy to help.

wooorm commented 1 year ago

Thanks!

Extensions for me (and as this extends micromark/mdast/remark/etc, I’d like to stick with those terms) mean extending the syntax of markdown with custom things. E.g., JSX, math with dollars, etc. Plugins mean transforming the AST.

The first is as far as I understand impossible in Rust. The extensibility needed to support extensions is not there in Rust. None of the other rust markdown parsers that I know support them either.

The second can be supported. Here’s a PR on the MDX repo: https://github.com/wooorm/mdxjs-rs/pull/8. Something like that can be made on top of this (hence it’s there). Luckily your examples fall in this plugin category.

I want to develop such features based on needs of people that know the Rust ecosystem, instead of baking it in immediately. So I’d like to hear some ideas and have discussions first :)

lucperkins commented 1 year ago

@wooorm Oh yes, I definitely mean plugins built on top of a provided AST 😄 I've updated the issue title to reflect that. Extensions are officially above my pay grade 🤣

I think it'd be nice to be able to specify plugins without too much extra plumbing. Maybe something like this:

let mut options = Options::default();
options.custom_plugins = vec![
    fancy_code_blocks,
    mermaid_diagrams,
];
let result = to_html_with_options("Big long fancy doc...", &options)?;

Or another option, of course, would be to see if #8 lands and I can just use mdxjs directly (or shamelessly copy that code). But in general I'd definitely prefer to be able to provide plugins for vanilla Markdown rather than via MDX.

comrak, for example, provides a system for modifying the AST and then converting the plugin-modified AST to HTML. And that's basically the mechanism I'd love to see in this lib.
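To make the "modify the AST, then render it" mechanism concrete, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the `Node` enum is a toy stand-in (not the real `markdown::mdast::Node` and not comrak's node type), and `Plugin`, `fancy_code_blocks`, `mermaid_diagrams`, and `to_html_with_plugins` are hypothetical names that mirror the proposal above rather than any existing API.

```rust
// Toy AST for illustration only; NOT the real mdast or comrak node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Root(Vec<Node>),
    Paragraph(String),
    Code { lang: Option<String>, value: String },
    Html(String),
}

// Hypothetical plugin shape: a function that rewrites the tree in place.
type Plugin = fn(&mut Node);

fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

// Hypothetical plugin: give unlabeled code blocks a default language.
fn fancy_code_blocks(node: &mut Node) {
    match node {
        Node::Root(children) => children.iter_mut().for_each(fancy_code_blocks),
        Node::Code { lang, .. } if lang.is_none() => *lang = Some("text".to_string()),
        _ => {}
    }
}

// Hypothetical plugin: replace `mermaid` code blocks with a raw HTML container.
fn mermaid_diagrams(node: &mut Node) {
    let replacement = match node {
        Node::Root(children) => {
            children.iter_mut().for_each(mermaid_diagrams);
            None
        }
        Node::Code { lang: Some(lang), value } if lang.as_str() == "mermaid" => Some(
            Node::Html(format!("<pre class=\"mermaid\">{}</pre>", escape(value.as_str()))),
        ),
        _ => None,
    };
    if let Some(new_node) = replacement {
        *node = new_node;
    }
}

// Render the toy tree to HTML, one block per line.
fn to_html(node: &Node) -> String {
    match node {
        Node::Root(children) => children.iter().map(to_html).collect::<Vec<_>>().join("\n"),
        Node::Paragraph(text) => format!("<p>{}</p>", escape(text)),
        Node::Code { lang: Some(lang), value } => format!(
            "<pre><code class=\"language-{}\">{}</code></pre>",
            escape(lang),
            escape(value)
        ),
        Node::Code { lang: None, value } => format!("<pre><code>{}</code></pre>", escape(value)),
        Node::Html(raw) => raw.clone(),
    }
}

// Run every plugin over the tree in order, then render the result.
fn to_html_with_plugins(mut tree: Node, plugins: &[Plugin]) -> String {
    for plugin in plugins {
        plugin(&mut tree);
    }
    to_html(&tree)
}
```

A caller would build or parse a tree, then call something like `to_html_with_plugins(tree, &[fancy_code_blocks, mermaid_diagrams])`. Plugin order matters in this sketch, because each plugin sees the output of the previous one.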

wooorm commented 1 year ago

> rather than via MDX

MDX also allows the vanilla markdown format. However, it still compiles to a string of JS, not a string of HTML.

> that's basically the mechanism I'd love to see in this lib.

This project here currently is positioned a bit lower than that. You can get a string of HTML out directly (no ASTs), or you can get an AST (which you can then do whatever with yourself).

digitalmoksha commented 1 year ago

> Extensions for me...mean extending the syntax of markdown with custom things. E.g., JSX, math with dollars, etc.
>
> The first is as far as I understand impossible in Rust. The extensibility needed to support extensions is not there in Rust. None of the other rust markdown parsers that I know support them either.

I am totally new to Rust, so I could very well be wrong. But it seems like https://github.com/rlidwka/markdown-it.rs/tree/master/src/plugins supports adding extensions to the language, as opposed to just modifying the AST.

lucperkins commented 1 year ago

@digitalmoksha Oh wow, this may just be perfect for my use case! Thanks!

chriskrycho commented 1 year ago

@wooorm I haven't played with trying to add this to markdown-rs yet (and don't know if/when I'll have time to), but a design that I find quite powerful is the one exposed by pulldown-cmark. That crate's public API for generating HTML expressly operates against the stream of syntax "events" it emits. The internals of how that works are extremely similar to the way that parser::parse(input, options) works in this library, but in pulldown-cmark, the event stream is exposed in the public API and is the input to its public APIs for generating HTML, and the events carry the data which makes up the node. As a result you can walk that iterator and produce a new collection of iterable Events from it. This is super useful if, for example, you want to do inline syntax highlighting using Syntect. You can do that kind of thing by transforming the stream of Events and then handing off the transformed stream to push_html(output_buffer, events_iterator)!
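The event-stream transformation described above can be sketched without depending on pulldown-cmark itself. The `Event` and `Tag` types below are simplified, hypothetical stand-ins that only loosely resemble that crate's types, `push_html` here is a toy renderer rather than the crate's function of the same name, and `highlight` is a placeholder for a real highlighter such as Syntect.

```rust
// Simplified stand-ins for illustration; NOT pulldown-cmark's real types.
#[derive(Debug, Clone, PartialEq)]
enum Tag {
    Paragraph,
    CodeBlock(String), // carries the language label
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(Tag),
    End(Tag),
    Text(String),
    Html(String),
}

fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

// Placeholder for a real syntax highlighter (hypothetical output format).
fn highlight(lang: &str, code: &str) -> String {
    format!(
        "<pre class=\"highlight\" data-lang=\"{}\">{}</pre>\n",
        escape(lang),
        escape(code)
    )
}

// Iterator adapter: swallow the events of each code block and emit a single
// pre-rendered `Html` event in their place; pass everything else through.
fn highlight_code_blocks(events: impl Iterator<Item = Event>) -> impl Iterator<Item = Event> {
    let mut current: Option<(String, String)> = None; // (language, buffered code)
    events.filter_map(move |event| match event {
        Event::Start(Tag::CodeBlock(lang)) => {
            current = Some((lang, String::new()));
            None
        }
        Event::Text(text) => match current.as_mut() {
            Some((_, code)) => {
                code.push_str(&text);
                None
            }
            None => Some(Event::Text(text)),
        },
        Event::End(Tag::CodeBlock(_)) => current
            .take()
            .map(|(lang, code)| Event::Html(highlight(&lang, &code))),
        other => Some(other),
    })
}

// Toy renderer that consumes the (possibly transformed) event stream.
fn push_html(output: &mut String, events: impl Iterator<Item = Event>) {
    for event in events {
        match event {
            Event::Start(Tag::Paragraph) => output.push_str("<p>"),
            Event::End(Tag::Paragraph) => output.push_str("</p>\n"),
            Event::Start(Tag::CodeBlock(lang)) => {
                output.push_str(&format!("<pre><code class=\"language-{}\">", escape(&lang)));
            }
            Event::End(Tag::CodeBlock(_)) => output.push_str("</code></pre>\n"),
            Event::Text(text) => output.push_str(&escape(&text)),
            Event::Html(html) => output.push_str(&html),
        }
    }
}
```

Because the transform is just another iterator, it composes with the renderer as `push_html(&mut out, highlight_code_blocks(events))` and never needs to materialize a tree.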

I recognize that one large difference is that this crate's to_html does not operate on its AST: unlike pulldown-cmark, the Event types here are different from the AST. However, the result is that if someone wants to do that kind of thing, they have to (a) materialize the AST, as the normal to_html does not, and then (b) reimplement to_html on top of it. (Alternatively, only operate with the final, already-processed HTML, but that's a much worse path performance-wise than operating on a Markdown AST because it means you have to parse the HTML!)

I totally get why you don't have the to_html() function use the AST: Why pay for the cost of reifying the AST if you can just skip it and render HTML directly? But if you want to use markdown-rs as a library, not having that exposed makes it much less usable.


Caveat to all of this: I like this library's approach so much that I may end up implementing what I want using its mdast and doing just that—it's far more approachable than trying to implement "standard" footnotes in pulldown-cmark, which I have not managed to find a way to do without rewriting huge swaths of the lex-and-parse stuff (…which I really do not want to do)!

wooorm commented 1 year ago

I think the way to go about it is doing the same as what we do in JS:

  1. https://github.com/wooorm/mdxjs-rs/blob/e90ad3d49ba067f043f83c90e0e144c1f0493ae6/src/mdast_util_to_hast.rs#L82
  2. implement hast_util_to_html, like https://github.com/wooorm/mdxjs-rs/blob/e90ad3d49ba067f043f83c90e0e144c1f0493ae6/src/hast_util_to_swc.rs#L81, but instead a copy of https://github.com/syntax-tree/hast-util-to-html/blob/3c9469abbb1ddd576e93387e37434c0f4d1db6ef/lib/index.js#L24

And to have plugins operate on either mdast or on hast!
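As a rough sketch of that two-tree pipeline (markdown tree, then HTML tree, then string), with plugins allowed at either stage: the `Mdast` and `Hast` enums below are heavily reduced, hypothetical stand-ins for the real mdast and hast node sets, and `shift_headings`, `add_heading_class`, and `render` are illustrative names, not functions from mdxjs-rs or markdown-rs.

```rust
// Reduced stand-in for a markdown AST (mdast); hypothetical.
#[derive(Debug)]
enum Mdast {
    Root(Vec<Mdast>),
    Heading { depth: u8, text: String },
    Paragraph(String),
}

// Reduced stand-in for an HTML AST (hast); hypothetical.
#[derive(Debug)]
enum Hast {
    Root(Vec<Hast>),
    Element { tag: String, attrs: Vec<(String, String)>, children: Vec<Hast> },
    Text(String),
}

fn escape(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

fn element(tag: &str, text: &str) -> Hast {
    Hast::Element {
        tag: tag.to_string(),
        attrs: Vec::new(),
        children: vec![Hast::Text(text.to_string())],
    }
}

// Step 1: map the markdown tree to an HTML tree (the one step that copies).
fn mdast_to_hast(node: &Mdast) -> Hast {
    match node {
        Mdast::Root(children) => Hast::Root(children.iter().map(mdast_to_hast).collect()),
        Mdast::Heading { depth, text } => element(&format!("h{}", depth), text),
        Mdast::Paragraph(text) => element("p", text),
    }
}

// Step 2: serialize the HTML tree to a string.
fn hast_to_html(node: &Hast) -> String {
    match node {
        Hast::Root(children) => children.iter().map(hast_to_html).collect(),
        Hast::Text(text) => escape(text),
        Hast::Element { tag, attrs, children } => {
            let attrs: String = attrs
                .iter()
                .map(|(key, value)| format!(" {}=\"{}\"", key, escape(value)))
                .collect();
            let inner: String = children.iter().map(hast_to_html).collect();
            format!("<{0}{1}>{2}</{0}>", tag, attrs, inner)
        }
    }
}

// Example plugin on the markdown tree: demote every heading by one level.
fn shift_headings(node: &mut Mdast) {
    match node {
        Mdast::Root(children) => children.iter_mut().for_each(shift_headings),
        Mdast::Heading { depth, .. } => *depth = (*depth + 1).min(6),
        Mdast::Paragraph(_) => {}
    }
}

// Example plugin on the HTML tree: add a class to heading elements.
fn add_heading_class(node: &mut Hast) {
    match node {
        Hast::Root(children) => children.iter_mut().for_each(add_heading_class),
        Hast::Element { tag, attrs, children } => {
            if matches!(tag.as_str(), "h1" | "h2" | "h3" | "h4" | "h5" | "h6") {
                attrs.push(("class".to_string(), "heading".to_string()));
            }
            children.iter_mut().for_each(add_heading_class);
        }
        Hast::Text(_) => {}
    }
}

// Full pipeline: mdast plugins, conversion, hast plugins, serialization.
fn render(
    mut tree: Mdast,
    mdast_plugins: &[fn(&mut Mdast)],
    hast_plugins: &[fn(&mut Hast)],
) -> String {
    for plugin in mdast_plugins {
        plugin(&mut tree);
    }
    let mut html_tree = mdast_to_hast(&tree);
    for plugin in hast_plugins {
        plugin(&mut html_tree);
    }
    hast_to_html(&html_tree)
}
```

Splitting the work this way lets a plugin pick the tree that matches its concern: markdown-level rewrites (heading levels, link targets) run before conversion, while HTML-level rewrites (classes, attributes) run after it.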

chriskrycho commented 1 year ago

That definitely seems reasonable as a design, though I will note that it also comes with non-trivial performance overhead (lots of extra copies and allocations for the AST→AST transforms)! It's quite possible that this is already in the "it's more than fast enough" bucket such that it doesn't especially matter, though.

wooorm commented 1 year ago

> non-trivial performance overhead

This seems more strongly worded than what I’d think.

I mean, events are also objects, that are mapped. But they are terrible to work with. ASTs are great to work with. There is one mapping that requires copies: markdown AST -> HTML AST. But other than that, AST transforms do not need extra copies?
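One way to see the "no extra copies" point in Rust terms is a transform that takes the tree by `&mut` and edits nodes where they live. The sketch below uses a toy `Node` type (an assumption for illustration, not this crate's mdast) and a hypothetical helper, `first_text_ptr`, that exposes the address of a string buffer so it can be compared before and after the transform.

```rust
// Toy tree for illustration only; not the crate's real AST.
#[derive(Debug)]
enum Node {
    Root(Vec<Node>),
    Text(String),
}

// An AST -> AST transform that mutates in place: no nodes are cloned and
// no string buffers are reallocated.
fn uppercase_in_place(node: &mut Node) {
    match node {
        Node::Root(children) => children.iter_mut().for_each(uppercase_in_place),
        Node::Text(value) => value.make_ascii_uppercase(),
    }
}

// Hypothetical helper: address of the first text node's heap buffer.
fn first_text_ptr(node: &Node) -> Option<*const u8> {
    match node {
        Node::Root(children) => children.iter().find_map(first_text_ptr),
        Node::Text(value) => Some(value.as_ptr()),
    }
}
```

If `first_text_ptr` returns the same address before and after `uppercase_in_place`, the text was rewritten in its original allocation. Transforms that replace a node with a differently shaped one still allocate for the new node, so this only illustrates the in-place case.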

Having spent 10 years on ASTs for markdown in the JavaScript world, I’m pretty convinced that ASTs and plugins are the way to go about it.

See also https://github.com/wooorm/mdxjs-rs/pull/27