tildeio / simple-html-tokenizer

A lightweight JavaScript library for tokenizing non-`<script>` HTML expected to be found in the `<body>` of a document
MIT License
85 stars 33 forks source link

Throw syntax errors for invalid EndTags #73

Open camerondubas opened 4 years ago

camerondubas commented 4 years ago

This PR is in response to this comment PR https://github.com/glimmerjs/glimmer-vm/pull/982#discussion_r340068648.

Currently Simple HTML Tokenizer allows leading whitespace as well as attributes to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases as they are ultimately invalid End Tags

The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says;

...

  1. After the tag name, there may be one or more ASCII whitespace.
  2. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).

However 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when an the first ASCII Alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", both of which aren't really valid in End Tags.


This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:

lifeart commented 4 years ago

up

camerondubas commented 4 years ago

Ok, after some fighting with git/eol characters on Windows I've got a updated working version.

This will now throw the error 'closing tag must only contain tagname' whenever a closing tag contains trailing or leading whitespace, which also covers attributes in closing tags since they would need to be prepended with a whitespace character.

camerondubas commented 4 years ago

Bumping this. Hoping to get a review.

I'm happy to close this PR if it isn't relevant anymore. Thanks!

rwjblue commented 4 years ago

Eeck, sorry @camerondubas, will try to review tomorrow :weary:

jfdnc commented 2 years ago

Just noting here that this appears to address https://github.com/emberjs/ember.js/issues/19703 and https://github.com/glimmerjs/glimmer-vm/issues/1309.

Edit: don't know what the state is here since its been sitting around for a bit, but if there are things to address I'm happy to help out.

camerondubas commented 2 years ago

I believe this is still pending a review. @rwjblue any chance this could get looked at again?