mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.
https://mths.be/he
MIT License
3.43k stars, 255 forks

Edge case: does not decode example string on w3 spec #50

Open youming-lin opened 7 years ago

youming-lin commented 7 years ago

I was testing encode/decode via https://mothereff.in/html-entities while cross-referencing the spec, and I noticed that `he` does not decode certain named references correctly. The W3C spec page gives the example string `I'm &notit; I tell you`, which should be parsed into `I'm ¬it; I tell you`, with a parse error. `he` returns the string undecoded. It appears that `he` cannot parse a legacy named reference when it is followed by one or more alphanumeric characters and then a semicolon (`;`). It decodes correctly when the run of alphanumeric characters ends with any character other than a semicolon.
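To illustrate the expected behaviour, here is a minimal sketch of longest-prefix matching for named references in text content. It is not `he`'s actual code: `NAMED` is a hypothetical, tiny subset of the reference table and `decodeText` is an illustrative name. It ignores numeric references and the different rules the spec applies inside attribute values.

```javascript
// Hypothetical subset of the named character reference table.
// Legacy names such as "not" and "amp" are listed with and without ";".
const NAMED = new Map([
  ['not', '\u00AC'],
  ['not;', '\u00AC'],
  ['notin;', '\u2209'],
  ['amp', '&'],
  ['amp;', '&'],
]);
const MAX_LEN = Math.max(...[...NAMED.keys()].map((k) => k.length));

// Decode named references in text content (not attribute values).
function decodeText(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    if (input[i] !== '&') {
      out += input[i++];
      continue;
    }
    // Try the longest candidate first, then shrink until a name matches.
    let matched = null;
    const limit = Math.min(MAX_LEN, input.length - i - 1);
    for (let len = limit; len > 0; len--) {
      const candidate = input.slice(i + 1, i + 1 + len);
      if (NAMED.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    if (matched === null) {
      out += '&'; // no known name: keep the ampersand literally
      i++;
      continue;
    }
    // A match without a trailing ";" is a parse error in the spec,
    // but the reference is still decoded.
    out += NAMED.get(matched);
    i += 1 + matched.length;
  }
  return out;
}

console.log(decodeText("I'm &notit; I tell you"));
```

With this sketch, `&notit;` matches the legacy name `not` and leaves `it;` as plain text, while `&notin;` still decodes as a whole because the longer name is tried first.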

mathiasbynens commented 7 years ago

Good catch! Thanks for the excellent bug report.

RReverser commented 7 years ago

Got bitten by this too, but I can't see how to fix it in `he`...

David263 commented 5 years ago

Surely this has been fixed by now...

rakend commented 4 years ago

Character 128 (U+0080, just past the end of the ASCII table), which looks like a small square when printed with `alert(String.fromCharCode(128));`, is not being encoded, while the next character, 129, is encoded as .
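As a possible workaround, every code point above 127 can be written as a hexadecimal numeric reference. The sketch below is an assumption about what a uniform encoding could look like, not a description of what `he` outputs; `encodeNonAscii` is a hypothetical helper name.

```javascript
// Replace every code point above U+007F with a hex numeric reference.
// Note: this does not escape "&", "<" or ">"; it only handles non-ASCII.
function encodeNonAscii(input) {
  let out = '';
  for (const ch of input) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? '&#x' + cp.toString(16).toUpperCase() + ';' : ch;
  }
  return out;
}

console.log(encodeNonAscii(String.fromCharCode(128, 129)));
```

Under this approach, characters 128 and 129 are treated identically, since both lie outside the ASCII range.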