servo / html5ever

High-performance browser-grade HTML5 parser

Fix parsing of bogus comments after end tags #507

Closed (AcqRel closed this 1 year ago)

AcqRel commented 1 year ago

This fixes a bug in the tokenizer where the tag name of an appropriate end tag was included in a bogus comment that follows it.

For example, this:

<style></style ><!a>

is incorrectly parsed as the following:

<style></style><!--stylea-->

instead of the expected:

<style></style><!--a-->

For this bug to trigger, the end tag must be parsed in one of the raw text states (RCDATA, RAWTEXT, or script data) and must have whitespace or a slash after the tag name. I don't know how the contents of the temporary buffer end up inside the comment, but clearing the temporary buffer when exiting the RawEndTagName state seems to be enough to fix it.
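
To make the failure mode concrete, here is a toy model of a stale scratch buffer leaking into a later token. It is only a sketch and not html5ever's code: `ToyTokenizer`, `raw_end_tag_name`, `bogus_comment`, `tokenize_example`, and the `clear_on_exit` flag are all hypothetical names, and the assumption that a later state prepends leftover buffer contents to the data it reads is one plausible mechanism, not something confirmed here.

```rust
/// Toy tokenizer state holding only what is needed to show the leak.
/// Hypothetical stand-in; not the real html5ever tokenizer.
#[derive(Default)]
struct ToyTokenizer {
    /// Scratch buffer shared by several states (models the "temporary buffer").
    temp_buf: String,
    /// Tokens rendered back to text, so the result is easy to compare.
    out: String,
}

impl ToyTokenizer {
    /// Models consuming the name of an end tag in a raw text state,
    /// e.g. the `style` in `</style >`. The name is accumulated in the
    /// scratch buffer; `clear_on_exit` toggles the fix described above.
    fn raw_end_tag_name(&mut self, name: &str, clear_on_exit: bool) {
        self.temp_buf.push_str(name);
        self.out.push_str(&format!("</{}>", name));
        if clear_on_exit {
            // The fix: do not leave the tag name behind when leaving the state.
            self.temp_buf.clear();
        }
    }

    /// Models a later state that emits a bogus comment. ASSUMPTION: it
    /// prepends whatever is left in the scratch buffer to the comment data.
    fn bogus_comment(&mut self, data: &str) {
        let leftover = std::mem::take(&mut self.temp_buf);
        self.out.push_str(&format!("<!--{}{}-->", leftover, data));
    }
}

/// Replays the example input `<style></style ><!a>` through the toy model.
fn tokenize_example(clear_on_exit: bool) -> String {
    let mut t = ToyTokenizer::default();
    t.out.push_str("<style>");
    t.raw_end_tag_name("style", clear_on_exit);
    t.bogus_comment("a");
    t.out
}
```

Calling `tokenize_example(false)` reproduces the incorrect serialization shown above, while `tokenize_example(true)` yields the expected one. The point of the sketch is that any state sharing the buffer is safest when it leaves the buffer empty on exit, regardless of which later state ends up reading it.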