Parser/compiler backend for HTML

jimbaker commented 1 year ago

Building on what has already been done with compiler.py:

Tag names and attribute keys/values can be arbitrary lists of chunks/thunks, much like *args. This allows for writing an h{level} tag and an on{event} attribute, respectively.
Enable arbitrary interpolation placement for attributes, e.g. style={style} or {attrs}. These can be mixed together.

This looks something like the following, with x$x substituting such placeholders:

<html>
   <head><title style=x$x x$x onx$x="foox$x">Test $x$</title></head>
   <body>
        <hx$x>Parse me!</hx$x>
    </body>
</html>
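
As a rough illustration of the chunked tag and attribute idea, here is a minimal sketch built on the standard library's html.parser. The ChunkingParser class, its chunks helper, and the convention of using integer indices to stand for interpolations are all assumptions made for this example; they are not taken from compiler.py.

```python
from html.parser import HTMLParser
from itertools import count

PLACEHOLDER = "x$x"  # marker assumed from the example above


class ChunkingParser(HTMLParser):
    """Hypothetical sketch: record start tags as chunked names and attrs."""

    def __init__(self):
        super().__init__()
        self.counter = count()  # numbers interpolations in document order
        self.tags = []

    def chunks(self, text):
        # Static pieces stay as str; each placeholder becomes an int index.
        if text is None:  # attribute given without a value
            return []
        out = []
        pieces = text.split(PLACEHOLDER)
        for i, piece in enumerate(pieces):
            if piece:
                out.append(piece)
            if i < len(pieces) - 1:
                out.append(next(self.counter))
        return out

    def handle_starttag(self, tag, attrs):
        name = self.chunks(tag)
        chunked = [(self.chunks(k), self.chunks(v)) for k, v in attrs]
        self.tags.append((name, chunked))


parser = ChunkingParser()
parser.feed('<title style=x$x x$x onx$x="foox$x">Test</title><hx$x>Parse me!</hx$x>')
parser.close()

for name, attrs in parser.tags:
    print(name, attrs)
```

A tag name such as ['h', 4] then reads as "the static text h followed by interpolation 4", which is the *args-like shape described above.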

Maintain a concrete syntax tree for the AST, much like Python; that is, we use HTMLParser.getpos and HTMLParser.get_starttag_text to track exactly where in the original source tag string each node came from.
Currently we only memoize on the interpolation (argnum, formatspec, conv); we probably should also use exprtext to simplify the construction of the source map (original tag string to generated code, similar to co_lnotab). Such a source map should allow for detailed error reporting when an exception is raised.
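
A minimal sketch of the position tracking, assuming a hypothetical PositionParser subclass; only getpos() and get_starttag_text() are real HTMLParser methods here. It records the line, column offset, and raw source text of each start tag, which is the raw material a source map would need.

```python
from html.parser import HTMLParser


class PositionParser(HTMLParser):
    """Hypothetical sketch: remember where each start tag came from."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        self.positions.append((tag, line, col, self.get_starttag_text()))


source = "<html>\n  <body>\n    <h1 class='a'>Hi</h1>\n  </body>\n</html>"
p = PositionParser()
p.feed(source)
p.close()

for entry in p.positions:
    print(entry)
```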

Simplify compilation, specifically with respect to the match. Currently the compiler has an overly complex scheme for matching tags, attrs, and children, but it cannot support the generalized tags/attrs approach per above. It's doing too much in one big step, so instead use a small step approach, such that each aspect (tags, any attrs, any children) is processed independently.

Orthogonal to the above is support for async templates. Note that as of Python 3.7, it is not required for a function, including a lambda, to be async to wrap an async usage. In particular, this means PEP 530's async comprehensions can be used. Async templates should be compatible with ASGI. Such async templates would also be similar to Jinja's support - https://jinja.palletsprojects.com/en/3.0.x/api/
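
To make the async point concrete, here is a small sketch in which a plain function and a lambda each return an asynchronous generator expression without being declared async themselves. The names fetch_items, render_items, wrap, and collect are hypothetical and exist only for this illustration.

```python
import asyncio


async def fetch_items():
    # Stand-in for an async data source.
    for item in ("a", "b", "c"):
        await asyncio.sleep(0)
        yield item


def render_items():
    # A plain (non-async) function returning an async generator expression.
    return (f"<li>{item}</li>" async for item in fetch_items())


# The same works in a lambda.
wrap = lambda: (f"<li>{item}</li>" async for item in fetch_items())


async def collect(agen):
    # Consuming the result still has to happen inside a coroutine.
    return "".join([chunk async for chunk in agen])


result = asyncio.run(collect(render_items()))
lambda_result = asyncio.run(collect(wrap()))
print(result)
```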

There's a similar idea in generator support, which renders the template incrementally and could be used for rendering a large page. (This is less useful for SSG, where that becomes the responsibility of the web server itself, of course.)
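
Incremental rendering with a generator might look like the following sketch. render_page is a hypothetical function written by hand here, not output of the compiler; the point is only that a consumer can start sending chunks before the whole page exists.

```python
from html import escape


def render_page(title, rows):
    # Each yield hands one chunk to the consumer (for example a web server).
    yield "<html><head><title>"
    yield escape(title)
    yield "</title></head><body><ul>"
    for row in rows:
        yield f"<li>{escape(str(row))}</li>"
    yield "</ul></body></html>"


chunks = render_page("Big & large", range(3))
first = next(chunks)     # available before any row has been rendered
rest = "".join(chunks)   # the remainder, rendered on demand
print(first + rest)
```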

pauleveritt commented 1 year ago

A couple of questions, I'll ask separately. First: could there be a stage perhaps after this where it gets into a JSON-able representation suitable for SQLite? Where interpolation markers could be directly addressed/indexed in a query? And possibly: that representation could be used for rendering a context-bag of data?

pauleveritt commented 1 year ago

These stages you're discussing...could you imagine the latter stages done in a way that would work with Python < 3.13? Meaning: adapt my current htm.py to be an input source to the rest of the processing. (Maybe without source maps.)

pauleveritt commented 1 year ago

The HTML Streaming part is quite interesting, aside from SSGs. It's useful for "Mostly-Static Site Generators" which use some edge function for assembly. Ryan Carniato has talked a good bit about this. It's a topic that really needs low-level framework bits to help.

jimbaker commented 1 year ago

First: could there be a stage perhaps after this where it gets into a JSON-able representation suitable for SQLite? Where interpolation markers could be directly addressed/indexed in a query? And possibly: that representation could be used for rendering a context-bag of data?

So this aspect is handled by the runtime. This is Python code (possibly with Rust extensions, etc.) that is able to manage the interpolations and any associated metadata. Such metadata includes: how do I construct this value? What does it depend on? Does it fan out? etc.

Whereas something like VDOMCompiler in the current compiler.py is simply responsible for building up an object graph (should actually be an expression tree) that corresponds to a given template.

When these are combined, we get the FDOM, which is managed in SQLite.
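
One way to picture interpolations being directly addressable in SQLite is the sketch below. The schema is entirely hypothetical (the thread does not specify how FDOM is stored); it only shows a template's AST kept as JSON text next to a table of interpolations that can be looked up by index.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per template, one row per interpolation.
conn.execute("CREATE TABLE template (id INTEGER PRIMARY KEY, name TEXT, ast TEXT)")
conn.execute(
    """CREATE TABLE interpolation (
           template_id INTEGER REFERENCES template(id),
           argnum INTEGER,
           exprtext TEXT,
           value TEXT,
           PRIMARY KEY (template_id, argnum))"""
)

ast = {"tag": "h1", "children": [{"interpolation": 0}]}
cur = conn.execute(
    "INSERT INTO template (name, ast) VALUES (?, ?)", ("heading", json.dumps(ast))
)
tid = cur.lastrowid
conn.execute(
    "INSERT INTO interpolation VALUES (?, ?, ?, ?)",
    (tid, 0, "title", json.dumps("Hello")),
)

# Address a single interpolation marker directly in a query.
row = conn.execute(
    "SELECT exprtext, value FROM interpolation WHERE template_id = ? AND argnum = ?",
    (tid, 0),
).fetchone()
stored_ast = conn.execute("SELECT ast FROM template WHERE id = ?", (tid,)).fetchone()[0]
print(row, stored_ast)
```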

jimbaker commented 1 year ago

These stages you're discussing...could you imagine the latter stages done in a way that would work with Python < 3.13? Meaning: adapt my current htm.py to be an input source to the rest of the processing. (Maybe without source maps.)

Maybe. If user code is not actually using tag strings (maybe you just support Markdown source files), you can just provide functions that pass in the args to templates that have already been compiled. (Since that's just Python code, and really limited, maybe just needing 3.7 for async generators for example, it could work).

jimbaker commented 1 year ago

The HTML Streaming part is quite interesting, aside from SSGs. It's useful for "Mostly-Static Site Generators" which use some edge function for assembly. Ryan Carniato has talked a good bit about this. It's a topic that really needs low-level framework bits to help.

I have been thinking about this. In this article, it mentions the importance of dependency analysis. Where this would come up for the nearly SSG case is tracking the IDs of DOM objects; and patching them incrementally. FDOM should do that.

The challenge is to minimize reflows. Client frameworks can help here, so long as the VDOM is used later. I don't have enough sense of what's possible now in the SSR + near SSG case, say with React, but it's a good requirement for the FDOM work.

jimbaker commented 1 year ago

New version of the AST parser with arbitrary interpolation support in https://github.com/pauleveritt/fdom/commit/5d5e90061f7c9549d7068b496cff45d3eee0a44d

Concrete syntax support (matching line/col in source) can be added to Tag, using something similar to co_lnotab. In addition, the AST itself could be readily JSON serializable if using Pydantic instead of dataclass. This suggests that FDOM might be a superset, which captures the template and its dependencies, along with build products like a compiled representation.
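
For comparison, plain dataclasses can already be dumped to JSON with dataclasses.asdict; this is a standard-library stand-in, not the Pydantic approach suggested above. The Tag and Interpolation shapes below are assumptions for the sketch (field names borrowed from the discussion) and do not reproduce the classes in the linked commit.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Interpolation:
    # Hypothetical node; field names follow the memoization key mentioned earlier.
    argnum: int
    formatspec: Optional[str] = None
    conv: Optional[str] = None


@dataclass
class Tag:
    # Hypothetical node carrying line/col for concrete syntax support.
    name: str
    children: list = field(default_factory=list)
    line: Optional[int] = None
    col: Optional[int] = None


tree = Tag("h1", ["Hello, ", Interpolation(0)], line=1, col=0)
payload = json.dumps(asdict(tree))  # asdict recurses into nested dataclasses
print(payload)
```

Note that asdict drops the class names, so a round trip back to Tag and Interpolation objects would need a type discriminator or a library that handles one.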

This is likely where we divide the work for the PEP and FDOM itself here.

jimbaker commented 1 year ago

I'm in the process of adding a stream-friendly direct-to-HTML compiler from the AST. (This follows the idea that a FDOM representation will be an augmented AST in some fashion. TBD how well that works.)

One nice aspect - structuring this with state of evaluating interpolations, then yielding them in the __iter__ (a generator-based function) makes codegen far simpler, since it's just that - store evaluated interpolations as local vars, then yield strings based on those interpolations/static text.
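
A toy version of that codegen idea, under stated assumptions: chunks are either static strings or integer indices into a tuple of zero-argument callables, and generate_source is a hypothetical helper. It emits a plain generator function rather than an __iter__ method to keep the sketch short.

```python
def generate_source(chunks):
    """Emit Python source for a generator that renders the given chunks.

    chunks: static str pieces or int indices into args (hypothetical encoding).
    """
    lines = ["def render(*args):"]
    # First evaluate each interpolation once and store it in a local variable.
    for i in sorted({c for c in chunks if isinstance(c, int)}):
        lines.append(f"    v{i} = str(args[{i}]())")
    # Then yield static text and stored interpolation values in order.
    for c in chunks:
        if isinstance(c, int):
            lines.append(f"    yield v{c}")
        else:
            lines.append(f"    yield {c!r}")
    if len(lines) == 1:
        lines.append("    yield from ()")  # keep an empty template valid
    return "\n".join(lines) + "\n"


source = generate_source(["<h1>", 0, "</h1><p>", 1, "</p>"])
namespace = {}
exec(source, namespace)  # compile the generated function
render = namespace["render"]

html = "".join(render(lambda: "Title", lambda: 42))
print(source)
print(html)
```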

pauleveritt commented 1 year ago

I've been tinkering around with the PEP. But what I really want to do....quite badly...is re-do my stack atop this. All the way up to a Sphinx theme.

That part with a generator during interpolation evaluation...will that still work if the template AST is serialized to then deserialized from a persistent cache? I know I've asked variations of this and I guess it hasn't sunk in yet.

jimbaker commented 1 year ago

I've been tinkering around with the PEP. But what I really want to do....quite badly...is re-do my stack atop this. All the way up to a Sphinx theme.

We will get there. First, I'd like to see this be as fast as Jinja, which seems attainable based on my knowledge of how it works. Crossing fingers! The reason I think that's possible is simple: Python is really good in emitting strings; and generators have been sped up because of better frame management; https://docs.python.org/3/whatsnew/3.11.html#cheaper-lazy-python-frames

That part with a generator during interpolation evaluation...will that still work if the template AST is serialized to then deserialized from a persistent cache? I know I've asked variations of this and I guess it hasn't sunk in yet.

Yes, this is absolutely doable. The key thing is that we know exactly when we will interpolate, and what it is. Such results can be attached to the FDOM, which provides the scaffolding; and serialized/deserialized to SQLite (or other persistent store).

There's some subtlety. The FDOM pretty much maps 1-to-1 to the template's AST. But that's a template. So when you render a template, you basically need to copy over the AST to the expanded representation of the FDOM. We should be able to add the copying however to the runtime, so it's not necessary to do directly in generated code. Let's see if that works out as expected.

But first, I want to finish up this example so we can add it to tagstr as a fully worked out example of a high-performance HTML template engine.

jimbaker commented 1 year ago

Another thought here, and it came up because I saw the controversy re removing TypeScript annotations in favor of JavaScript, is what is done in Turbo: https://turbo.hotwired.dev/

The last time I really did much in the way of full stack was 2004-06, when there was this tech in Internet Explorer called HTML Components (HTC), which is basically an early, if super slow, version of custom elements; and even better, XMLHttpRequest, also part of IE but then adopted by other browsers, including its support for async requests (setting async to true). Like so many other people back then, I had independently discovered this capability by trawling the docs for what would allow us to build a dynamic, responsive data-driven dashboard; for me, it worked well with Twisted, so I could sling together some HTML in a response/request, then dynamically update the page with that HTML.

So basically the same thing in Turbo, except things work better now nearly 20 years later.

So SSG for the static parts, along with support for dynamic parts favoring Web Components/HTML/SSE, optimized for PWA, and of course if one needs to do client scripting, PyScript can be used... presumably as part of custom elements.

pauleveritt commented 1 year ago

Yes, Turbo is similar to HTMX which has become the rage in the world of Python, especially Django. Turbo and HTMX are different than HTC: they are server-side rendering while HTCs are a client-side technology. HTMX has a companion language called Hyperscript (no kidding) which lets you write some operations that execute in the browser.

You are right, especially the last paragraph. A templating system that took all of these things (HTMX, WebComponents, view transitions API, PWA, SSE) for granted and was built with them in mind would be a lot more powerful, with better DX and better browser performance.

pauleveritt / fdom

Parser/compiler backend for HTML #9