lezer-parser / html

An HTML parser for Lezer
MIT License

Align HTML parsing with spec to support Vue Templates and Single File Components #4

Closed. bmeurer closed this 1 year ago

bmeurer commented 1 year ago

Vue Templates and Single File Components are basically valid HTML, so for the purpose of supporting them in Chrome DevTools, it's enough to treat *.vue files as HTML documents and make sure that the HTML parser is sufficiently relaxed. These patches add support for relaxed attribute name parsing, and also adjust the parsing of self-closing tags to better support common component template syntax.
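
To make "relaxed attribute name parsing" concrete, here is a minimal sketch of what such a check could look like. It is an assumption about the general idea, not the code from these patches: the regular expression and the helper name `isRelaxedAttributeName` are hypothetical, and the character set loosely follows the HTML syntax rules (no whitespace, control characters, quotes, `>`, `/`, or `=` in a name).

```typescript
// ASSUMPTION: illustrative only; this is not the grammar change from the patches.
// An attribute name is any non-empty run of characters that are not whitespace,
// control characters, quotes, ">", "/", or "=".
const RELAXED_ATTRIBUTE_NAME = /^[^\s\u0000-\u001f\u007f"'>\/=]+$/;

// Hypothetical helper: returns true when `name` would be accepted as an
// attribute name under the relaxed rule above.
function isRelaxedAttributeName(name: string): boolean {
  return RELAXED_ATTRIBUTE_NAME.test(name);
}
```

Under this rule, Vue-style names such as `@click`, `:title`, `#default`, and `v-on:click.prevent` are accepted, while `a=b` or an empty string are not.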

marijnh commented 1 year ago

I do want to keep this parser aligned with how browsers parse stuff, and recognizing <p/> as a self-closing tag is definitely not what a browser does. What kind of template syntax parses HTML like that?

bmeurer commented 1 year ago

Looking at the spec, at least for foreign elements, the / right before > marks the tag as self-closing. For <p/> it'd be a bit of an outlier indeed, but wouldn't hurt either, IMHO.

marijnh commented 1 year ago

The issue I'm trying to avoid is for the mode to encourage people to write broken HTML by treating things like self-closing elements, which are simply not something that exists in this context, as valid. <script/> is a very common mistake, and at least having odd indentation below it might tip people off that something is wrong.

bmeurer commented 1 year ago

Makes sense. If you want to be 100% correct here, you would need to distinguish HTML element tag names from other names.
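
As a rough illustration of what distinguishing names could mean, the sketch below sorts tag names into three buckets. Everything in it is an assumption for illustration: the function `classifyTagName` and its categories are hypothetical, the custom-element check is simplified (lowercase first letter, at least one hyphen, no uppercase letters), and treating a leading capital as a component is a framework convention rather than an HTML rule.

```typescript
type TagNameKind = "custom-element" | "framework-component" | "plain";

// Hypothetical classifier; a simplified approximation, not a spec-complete check.
function classifyTagName(name: string): TagNameKind {
  // Custom element names start with a lowercase ASCII letter, contain a hyphen,
  // and have no uppercase ASCII letters (reserved names are ignored here).
  if (/^[a-z][^A-Z\s\/>]*-[^A-Z\s\/>]*$/.test(name)) return "custom-element";
  // ASSUMPTION: a capitalized name is treated as a framework component.
  if (/^[A-Z]/.test(name)) return "framework-component";
  // Everything else is handled like an ordinary HTML tag name.
  return "plain";
}
```

Note that a browser treats HTML tag names case-insensitively, so a name like `Header` only counts as a component inside a template language that says so.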

The primary case that I'm interested in is Web components and framework components, where you do indeed write stuff like

<Header foo="bar" />
<my-awesome-component icon="x" />

marijnh commented 1 year ago

But that is not HTML... it may be inspired by it, but it really isn't the same language.

Maybe we could make this a dialect, and only enable the self-closing end-of-tag token for that dialect?

bmeurer commented 1 year ago

Alright. How do you feel about the attribute token syntax change? Shall I send that as a separate PR and then we figure out the self-closing tags story independently (as it's not really related and mostly an oddity in the AST)?

marijnh commented 1 year ago

Attached patch adds a dialect "selfClosing" that enables this type of self-closing tag. It's not going to be turned on by default in the lang-html package, but you could reconfigure that language object to enable it. Does that work for you?
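
The effect of such an opt-in dialect can be modelled with a small toy, shown here as a sketch rather than as the Lezer implementation. The names `DialectOptions`, `VOID_ELEMENTS`, and `expectsEndTag` are hypothetical, the regular expression ignores edge cases such as `>` inside attribute values, and the real parser selects dialects through its own configuration rather than a boolean flag like this.

```typescript
// ASSUMPTION: toy model of the "selfClosing" dialect idea, not the Lezer API.
interface DialectOptions {
  selfClosing: boolean;
}

// HTML void elements never take content or an end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Returns true when the start tag opens an element that still needs a
// matching end tag, false when the tag stands on its own.
function expectsEndTag(tag: string, options: DialectOptions): boolean {
  const match = /^<([A-Za-z][^\s\/>]*)[^>]*?(\/?)>$/.exec(tag);
  if (!match) throw new Error(`not a start tag: ${tag}`);
  const name = (match[1] ?? "").toLowerCase();
  const hasTrailingSlash = match[2] === "/";
  if (VOID_ELEMENTS.has(name)) return false;
  // Without the dialect, a trailing "/" is ignored, as a browser would do
  // for HTML elements; with it, "/>" ends the element immediately.
  return !(hasTrailingSlash && options.selfClosing);
}
```

With `{selfClosing: false}`, `<script/>` and `<Header foo="bar" />` are still treated as open tags; with `{selfClosing: true}`, both stand on their own. A void element such as `<br>` never expects an end tag in either mode.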

bmeurer commented 1 year ago

Oh, that's a cool idea! I like it.

bmeurer commented 1 year ago

Thanks! Can you mark a release with these fixes please?

marijnh commented 1 year ago

I've tagged 1.1.0

bmeurer commented 1 year ago

Thanks, here's the follow-up fix for lang-html: https://github.com/codemirror/lang-html/pull/3