decode-map-overrides: Filter keys mapping to themselves

mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.

https://mths.be/he

MIT License

decode-map-overrides: Filter keys mapping to themselves #13

Closed fb55 closed 11 years ago

fb55 commented 11 years ago

Probably as a simplification, the HTML5 tokenizer spec contains several characters that map to themselves. They aren't required for decoding, so this PR removes them.
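
For illustration, the filtering described above could look roughly like the sketch below. It is not the code from this PR. It assumes the override map uses decimal code points as keys and replacement characters as values, and the sample entries and the helper name removeSelfMappings are hypothetical.

```javascript
// Hypothetical sample of an override map (assumed shape: decimal code point -> character).
const overrides = {
  '128': '\u20AC', // 0x80 -> EURO SIGN, a real override
  '129': '\u0081', // 0x81 -> U+0081, maps to itself
  '130': '\u201A', // 0x82 -> SINGLE LOW-9 QUOTATION MARK, a real override
  '141': '\u008D'  // 0x8D -> U+008D, maps to itself
};

// Hypothetical helper: drop every entry whose value is the character
// that the key's code point already represents.
function removeSelfMappings(map) {
  const result = {};
  for (const [key, value] of Object.entries(map)) {
    if (String.fromCodePoint(Number(key)) !== value) {
      result[key] = value;
    }
  }
  return result;
}

const filtered = removeSelfMappings(overrides);
console.log(Object.keys(filtered)); // [ '128', '130' ]
```

Entries such as 129 are redundant for decoding because the decoder would produce the same character without consulting the map.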

mathiasbynens commented 11 years ago

Thanks! Could you please run grunt fetch and amend the commit, adding the modified file in data/?

fb55 commented 11 years ago

gtm?

mathiasbynens commented 11 years ago

In 0fc3556bba9b542a149d6d97a269f961a5dea0ae I’ve added the missing parseError() statement, and some tests. Thanks!

Since we can only remove numbers from data/decode-map-overrides.json as long as they’re still listed in data/invalid-code-points.json (see my inline comment), perhaps it’s easier to perform the optimization you suggested near the bottom of scripts/scrape-spec.js, before the writeJSON calls.
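
One possible reading of that constraint, sketched under assumptions: a self-mapping entry is dropped only when its code point is also present in the invalid code point list, so that the code point is still flagged elsewhere. The function name pruneOverrides, the data shapes, and the sample values are hypothetical; the actual change in scripts/scrape-spec.js may differ.

```javascript
// Hypothetical helper. Assumed shapes:
//   overrides:         { '<decimal code point>': '<replacement character>' }
//   invalidCodePoints: array of numbers
function pruneOverrides(overrides, invalidCodePoints) {
  const invalid = new Set(invalidCodePoints);
  const pruned = {};
  for (const [key, value] of Object.entries(overrides)) {
    const codePoint = Number(key);
    const mapsToItself = String.fromCodePoint(codePoint) === value;
    // Skip an entry only if it is redundant AND still listed as invalid.
    if (mapsToItself && invalid.has(codePoint)) {
      continue;
    }
    pruned[key] = value;
  }
  return pruned;
}

// Hypothetical sample data, not taken from the repository.
const sampleOverrides = { '13': '\r', '128': '\u20AC', '129': '\u0081' };
const sampleInvalid = [129];
const pruned = pruneOverrides(sampleOverrides, sampleInvalid);
console.log(Object.keys(pruned)); // [ '13', '128' ]
```

In this sample, 13 maps to itself but is kept because it is absent from the invalid list, while 129 is dropped because it satisfies both conditions.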

fb55 commented 11 years ago

Nice :)

mathiasbynens commented 11 years ago

There was a bug in my patch, leading to incorrect results. I fixed this in a later commit; see https://github.com/mathiasbynens/he/compare/8bd18e6cdf4071f04c0ed9583f2d96b500db1da3...842c259d3923bf8ddd7a7fe79f77527cb81bbeb7#diff-5.

mathiasbynens commented 11 years ago

Follow-up: I reported @fb55’s findings, and the redundant entries have now been removed from the WHATWG HTML spec.