RFC: customizable HTML renderer

matklad commented 1 year ago

As of now, djot.js provides extensibility via filters -- code to modify djot's own AST. This is I think of limited usability: djot is the source format, not the destination format, so most interesting logic should happen during the output.

For example, if I want to render ![](url){.video} as <video>, rather than <img>, with filters I'd have to introduce a verbatim node, which is rather ugly. An even harder case is when I want to replace some tag while keeping default rendering for children.

I think a nice way to solve that would be to allow customizing HTML rendering itself, by essentially letting the user to supply a bunch of overrides for various nodes.

This PR implements a proof of concept, with the call-site looking like this:

const ast = djot.parse(`
    # Example

    ![](http://example.com/video.mp4){.video}

    ![](http://example.com/cat.gif)
`);

const html = djot.renderHTML(ast, {
    overrides: {
        image: (node, renderer) => {
            console.log(node.attributes)
            if ((node.attributes?.["class"] ?? "").indexOf("video") != -1) {
                renderer.renderTag("video", node, { src: node.destination })
                return
            }
            renderer.renderAstNodeDefault(node)
        }
    }
})

jgm commented 1 year ago

Seems like a good idea to me.

By the way, in cmark we handled the "change tag while keeping default rendering of the children" by making giving 'enter' and 'exit' fields to the CMARK_CUSTOM_BLOCK and CMARK_CUSTOM_INLINE nodes. So, in a filter you could add the new open tag to 'enter' and the close tag to 'exit'.

matklad commented 1 year ago

Yeah, I am actually now having second thoughts here...

At least for HTML, it seems like it is possible to more-or-less embed full HTML into djot. Basically, if we add just a single condition to the renderer like "a div with a tag attribtue gets rendered as that tag", than whatevere we can do with the overrides I am suggesting here, we can also do as a filter. We can actually replace our whole HTMLRenderer with a filter which produces a bunch of tagged divs, and then have the actual HTML renderer be a trivial transformation.

matklad commented 1 year ago

I think I am still 80% sure that this is a good idea, pushed something which should be good to go!

matklad commented 1 year ago

To state the obvious: this makes HTMLRenderer a part of the public API. I think we are not at the stage where we worry too much about stability of interfaces, so I don't think this is a too big of a problem.

jgm commented 1 year ago

makes HTMLRenderer a part of the public API

I'm kind of a novice at typescript. Is there a way to designate some methods in the class as public and others as public? That could be a good idea if this becomes part of the API.

matklad commented 1 year ago

Yeah, there's private keyword

EDIT: and stuff is public by default.

jgm commented 1 year ago

I think your Div idea above is probably too special-case. But if we wanted to pursue something like that, we could change the RawBlock and RawInline types as follows:

{ tag: "RawBlock",
  format: string,
  enter: string,
  exit: string,
  children: Block[] }
{ tag: "RawInline",
  format: string,
  enter: string,
  exit: string,
  children: Inline[] }

The renderer would (if format matches) first print the enter verbatim, then render children as usual, then print exit. Simple leaf nodes could simply use one of enter/exit and make children []. That's what we did in cmark.

matklad commented 1 year ago

Hm, would it actually be more general than what we have today? I think the above can be simulated with one raw node for enter, one raw node for exit, and the children being siblings between the two bracketing raw nodes?

matklad commented 1 year ago

Tried to use the API in https://github.com/matklad/matklad.github.io/commit/89651fd4f2e08d300deb8129f3ce8b7571fed7a9?diff=split#diff-3de00bf2df40cd18ccb9a5c819912060859e62b1b2450dd6fa50224afd6e9b35 for my blog.

It definitely gets the job done, but HTMLRenderer API feels a bit to verbose as is for the task. In particular, I think as of now the net line diff for the relevant file is 0, and that is despite the fact that before I specified everything, and now only the overriden bit. The html`<div> arbitrary ${interpolated} markup here</div>` device I was using before does seem to add to succinctness.

jgm commented 1 year ago

Hm, would it actually be more general than what we have today? I think the above can be simulated with one raw node for enter, one raw node for exit, and the children being siblings between the two bracketing raw nodes?

Well, it's more convenient, because it's actually hard to insert multiple nodes with the current filter interface. (That should probably change, though.)

jgm commented 1 year ago

So at this point you don't think this should be merged?

matklad commented 1 year ago

I lean towards merging, and thinking in background about how we can make this better.

Without something like this feature, I’d have to write all html myself, but I would like to re-use djot defaults as a matter of principle.

matklad commented 1 year ago

Ok, so I am recalling an important fact: generating HTML is actually something JS/TS people tend to do fairly frequently, and they have the whole JSX thing to do that even:

https://www.typescriptlang.org/docs/handbook/jsx.html

I wonder if this might be the right approach here...

jgm commented 1 year ago

Are you suggesting we should use jsx in the html module instead of just generating the HTML tags programatically? What would be the motivation?

matklad commented 1 year ago

Are you suggesting we should use jsx in the html module instead of just generating the HTML tags programatically? What would be the motivation?

Not yet suggesting, but yeah, thinking about it :)

Basically, I see the whole "supply user-defined template to override rendering of specific elements" as a super-important feature. Couple of years ago I used that approach for creating slide decks out of asciidoctor (source), and since then believe that that's the best UX for customizing light markup.

As I think this might end up being one of the primary interfaces for people to consume djot, I want too make sure that specifying custom tempplates this way reads nice. Usually I don't really care how few characters to type and such, but in this case I do.

And something like

const visitor = {
    image(image: Image, ctx): Node {
        if (image.attributes?.["class"]?.includes("video")) {
            return <video src={image.destination}></video>
        }
        return ctx.default(image)
    }
}

Does look significantly more appealing than either manual string concatenation or the html`` device I used.

So yeah, that's my primary motivation here -- to make the UX for consumers really slick. Though, we might actually end up with significantly more readale html.tsx as well.

jgm commented 1 year ago

Does look significantly more appealing

Agreed. What would making this work require?

matklad commented 1 year ago

Still looking into this: first time using JSX, so no idea really. But it seems that fundamentally that's just a syncatict transformation, so you don't have to import react and such, and can just define your own functions for constructing nodes.

Eg, the following self-contained thing seems to work for me (I am using deno, rather than node, so some specific spells might be different)

https://gist.github.com/matklad/a84c4857782a66589c6c0f6ec0f8af99

The big question is whether we can load a jsx template from a file and eval it: JSX is a transpilation-level thing. I think that'd actually work with deno (as that I think can load TypesScript and JSX directly), but it's also important to make it work with node. While deno to node is what Djot is to Markdown in terms of overall soundless, the same relation holds for the amount of usage today as well (up to including my blog being powered by two less conventional choices, lol).

matklad commented 1 year ago

Ok, so I think for cases where the user is using djot as a library from type script, this'll basically just work, as TypeScript supports JSX syntax natively.

For using djot as a CLI utility which loads template.js file and evals it, I think we can make it work by including a JS parser. https://github.com/acornjs/acorn-jsx seem like a dependable thing (by https://github.com/marijnh of CodeMirror fame). Shipping JS parser is kinda big thing

An interesting case is using djot.js as a library from JavaScript (eg, some random script tag on a web page). In this case, the user doesn't have transpilation pipeline, and we don't want to eval random stuff. I've looked at what (p)react suggests to do here, and the answer is basically "html`` device, but actually well-written": https://preactjs.com/guide/v10/getting-started#alternatives-to-jsx. Which actually doesn't look to bad.

So, yeah, I think it makes a lot of sense to make the customization interface to be based on JSX APIs, and to provide JSX (transpilationnn) and htm (runtime) interface to it.

A nice side benefit would be that we'll render djot elements to a real HTML tree, which would allow post-processing. This is something I actually needed already for my blog -- I want to extract HTML of the first paragraph as a summary (to put that into RSS) and with the current "appending to string" approach that can't be done cleanly.

jgm commented 1 year ago

OK, I think I understand how this works now.

I think we can separate out two issues:

Whether we should have an intermediate HTML-isomorphic AST between the djot AST and the rendered HTML string. As you note, that can have advantages; it becomes possible to go into this structure and change things. On the other hand, there is also a disadvantage: we have to allocate more memory, which might be expected to affect performance. The customization needs could also be taken care of with an override approach like the one in this PR. Another disadvantage is the need for more code and perhaps additional dependencies. On the other hand, we might gain some security from the sorts of mistakes you can make by putting HTML together by hand.

[EDIT:] Is another advantage that we could write a direct renderer to DOM for browser use? (I don't know how much overhead there is to browsers parsing HTML strings, so perhaps this would not be a significant advantage?)
If the answer to 1 is yes, then should we set things up so that JSX syntax can compile to this HTML-isomorphic AST? You're right that typescript can give us this basically for free, at least for typescript users. I'm less sure about the other uses. I'd really like to keep dependencies to a minimum.

I distinguish these issues because in principle we could do 1 without 2. But obviously 2 makes sense only if 1 does.

matklad commented 1 year ago

I think technically we might also do 2 without 1: as far as understand, there are no requirements for jsx to maintain tree shape, it can return a string directly. Though, because each individual DOM node have to evaluate to something, I think we still won’t be able to fully amortize allocations.

OTOH, constructing a virtual DOM tree is the use-case JS was heavily optimized for, so I am not even sure if the impact of extra allocations would be that big.

Is another advantage that we could write a direct renderer to DOM for browser use?

Super fuzzy on this, but I recall hearing that it’s the opposite: asking the browser to construct a DOM tree by parsing a string is faster than constructing it manually object-by-object, which makes some sense. In object-by-object case, we continuously cross the border between JS and C++, the parsing case is just C++.

matklad commented 1 year ago

Another considiration is our old discussion of stack overflows. The "natural" API for exensibility where a user-supplied callback returns an HTML-DOM node for a given djot node would be prone to stack overflow.

OTOH, the current style I think would not be too horrible to rewrite in non-recursive way.

But exposing non-recursive rendering to the user is going to be hard! We can do that with filters, because we translate Djot -> Djot, so we essentially can sneak in results from "recursive" calls in the same data structure. Here, we do Djot -> HTML, so a non-recursive callback would have to accept type which is Djot on the outised, but HTML inside, which I don't think we can conveniently express. (which is another benefit of "embedd HTML DOM into djot" approach, which gets to re-use non-recursive filters)

matklad commented 1 year ago

Ah, I see that the filter's implementation is actually recursive: https://github.com/jgm/djot.js/blob/e1138eddcf5a741d2dcfec069c929644aa0fa167/src/filter.ts#L122-L124. It seems like it can be made non-recursive though (or we can impose depth limit in the parser :) )

matklad commented 1 year ago

I guess in addition to 1 and 2 it makes sense to ponder null hypothesis: rendering each djot node to HTML string. That would be the minimal API and, given that JS strings internally are already string[], might not be as slow as it would seem.

jgm commented 1 year ago

I'm a bit lost about the "null hypothesis" -- do you mean what the current code is doing? (It is pretty much just rendering djot nodes to strings, which it puts on a buffer to be concatenated later.)

Good point about the natural interface being recursive. This could be a reason just to stick with filters for this kind of customization. (I've added an issue to make the filter logic non-recursive.)

Perhaps we could think about how to make this common case for constructing a filter (just to change the default HTML rendering) more ergonomic. What if you could do something like:

filters: [
  { image: node => if (isVideo(node.destination) {
        return <video src={node.destination}><children/></video>
      } }
]

The desugaring process could split this into an "enter" part {before <children/>) and and "exit" part (after that), and construct a RawInline "html" on the above-mentioned plan, with enter and exit parts.

[EDIT: I guess if we wanted to use jsx, we have to stick with what can be done with createElement. But this still ought to be possible. createElement could be set up to create an object, as in your example, and then we could simply split the list of children on the <children/> node.]

matklad commented 1 year ago

do you mean what the current code is doing?

Not exactly. The current code is

class HTMLRenderer {
  buffer: string[];
  renderAstNode(node: AstNode): void { }
}

the alternative I am talking about is

class HTMLRenderer {
  renderAstNode(node: AstNode): string { }
}

this allows the user to render a child node first, and then paste it into some larger context.

Perhaps we could think about how to make this common case for constructing a filter (just to change the default HTML rendering) more ergonomic.

Uhu. I think the only thing I really don't like at the moment is stuffing html into raw tags. Some structure is getting lost that way, and in general that feels super hacky. It also would be convenient if I could make use of djot's code to render attrbibutes and such for my custom node. Basically, I really want some ability to just override the html tag somehow.

Another idea here: what if we allow the filter to set html_tag field on an arbitrary node? This won't be a part of djot AST, that would be a thing specific to HTML converter.

jgm commented 1 year ago

I see. We'd have to measure performance effects of dropping the buffer and just having renderAstNode put out a string. (renderChildren could still use a buffer internally, returning a concatenated string at the end, so maybe it wouldn't be too bad.)

jgm commented 1 year ago

I don't really like the idea of allowing you to set html_tag on the node itself. Nor does it seem sufficiently general; maybe when you switch from img to video, you'll also need some attributes you didn't have before.

matklad commented 1 year ago

I don't really like the idea of allowing you to set html_tag on the node

Uhu

Nor does it seem sufficiently general; maybe when you switch from img to video, you'll also need some attributes you didn't have before.

This one I'd disagree this: in this case, you switch tag from "image" to "div", set html_tag to "video", and set src and any extra attributes you need manually into the attributes map.

jgm commented 1 year ago

Yes, it's true you could add new attributes to the AST. But this approach still seems far too special-purpose. It's tailor-made for this one case -- rendering an image node as a video -- but I don't think most customizations and extensions are going to have this form. And in this particular case, we should probably just build support for videos into the default HTML renderer.

I'm not sure what the right approach is here, but a version of this PR + exploring the "null hypothesis" (if the performance impact is acceptable) still seems best to me.

matklad commented 1 year ago

https://github.com/jgm/djot.js/pull/14 if my eyes are not deceiving me, just rendering to string seems to be faster. Could you double-check the finding? If it replicates, this makes "just return a string" API significatnly more attractive.

matklad commented 1 year ago

Updated in light of #14. I think I now like this, but let me check how it fairs for my blog use-case

matklad commented 1 year ago

Will come back tomorrow, but I think we can do even better here: instead of (node: T, context: HTMLRenderer): string type it seems we can use (node: T): string | undefined with the semantics that undefined is a (dynamic) fallback to the default rendering. I think default rendering is pretty much the only fundamental reason why we need to pass the HTMLRenderer in, and | undefined solves that. But that's for tomorrow me to iron out!

jgm commented 1 year ago

I like that suggestion.

matklad commented 1 year ago

Yeah, sadly I think it doesn’t work, as we still have pass some sort of a callback to render children.

matklad commented 1 year ago

Ok, the new API seems to work quite OK!

https://github.com/matklad/matklad.github.io/blob/3664bdd7c18f5639a09dfc42805e0bba4d376bfc/djot.ts#L30-L202

I still want to try the html_tag thing, I am still somewhat convinced that it's a general, and also quite nice, solution.

matklad commented 1 year ago

I still want to try the html_tag thing,

Yeah, that turns out to be uglier: coding html in djot is just too indirect, it's too hard to tell from the filter's code what the end end result would be.

jgm commented 1 year ago

I like it. Ready to merge, you think?

matklad commented 1 year ago

Yes, this is ready!

jgm commented 1 year ago

I made one small change in a subsequent commit.

jgm commented 1 year ago

@matklad if you'd be able to replace "TODO document HTMLRenderOptions and overrides" in the README with a small, self-contained example of using the overrides feature, that would be great.

jgm commented 1 year ago

I added a really simple example, but if you think more would be good, feel free.

matklad commented 1 year ago

I’ve realized that if we do https://github.com/jgm/djot/issues/146 (dedicated support for role attribute), we’d get “embed html in djot” for free (presumably, html renderer would interpret role exactly as a tag name).

jgm / djot.js

RFC: customizable HTML renderer #4