lezer-parser / html

An HTML parser for Lezer
MIT License

Align unquoted attribute value syntax with the HTML spec. #6

Closed bmeurer closed 1 year ago

bmeurer commented 1 year ago

According to the HTML spec^1, unquoted attribute values can contain slashes; the spec forbids only ASCII whitespace, quotation characters, equals signs, and < and >. This change aligns the token regex with the HTML specification.
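For illustration, the rule described above can be sketched as a plain regular expression. This is not the token definition from the patch (the Lezer grammar has its own syntax); the names `UNQUOTED_ATTR_VALUE` and `is_valid_unquoted_value` are hypothetical. The character set follows the HTML Living Standard, which lists the grave accent (`) alongside the two quote characters.

```python
import re

# Characters the HTML spec forbids in an unquoted attribute value:
# ASCII whitespace (tab, LF, FF, CR, space), ", ', =, <, >, and `.
# Everything else, including "/", is allowed. The value must be non-empty.
UNQUOTED_ATTR_VALUE = re.compile(r"""[^\t\n\f\r "'=<>`]+""")


def is_valid_unquoted_value(value: str) -> bool:
    """Return True if `value` could appear as an unquoted attribute value."""
    return UNQUOTED_ATTR_VALUE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["/path/to/page", "a b", "a=b", ""]:
        print(repr(candidate), is_valid_unquoted_value(candidate))
```

Under this sketch, a value such as `/path/to/page` is accepted, while values containing a space or an equals sign are rejected.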

Bug: https://crbug.com/1385661

marijnh commented 1 year ago

Any reason you're using hex codes rather than the more readable and shorter escapes like \t in there?

bmeurer commented 1 year ago

> Any reason you're using hex codes rather than the more readable and shorter escapes like \t in there?

Was easiest to not make a mistake when going through the HTML spec. I've updated the patch to a minimal delta.
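As a rough illustration of the point under discussion, the hex form and the short-escape form of such a character class describe the same set of characters. The sketch below uses Python regular expressions rather than the Lezer grammar syntax, and the names `HEX_FORM`, `SHORT_FORM`, and `forms_agree` are hypothetical; neither pattern is copied from the patch.

```python
import re

# Hex escapes, mirroring how the HTML spec lists code points
# (U+0009, U+000A, U+000C, U+000D, U+0020).
HEX_FORM = re.compile(r"""[^\x09\x0a\x0c\x0d\x20"'=<>`]+""")

# The same class written with the shorter, more readable escapes.
SHORT_FORM = re.compile(r"""[^\t\n\f\r "'=<>`]+""")


def forms_agree() -> bool:
    """Check that both patterns accept or reject every ASCII character alike."""
    return all(
        bool(HEX_FORM.fullmatch(chr(cp))) == bool(SHORT_FORM.fullmatch(chr(cp)))
        for cp in range(128)
    )


if __name__ == "__main__":
    print(forms_agree())
```

The hex form is easier to check against the spec's code-point list; the short form is easier to read in a grammar file.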

marijnh commented 1 year ago

Thanks, merged.