mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.
https://mths.be/he
MIT License

Add XHTML/XML option #15

Open mathiasbynens opened 10 years ago

mathiasbynens commented 10 years ago

This may not be worth it, but here goes…

E.g. `&#x85;` → U+0085 in XHTML, while in HTML it’s U+2026.

http://www.w3.org/TR/xml/#d0e3895

Entities for these symbols are allowed in XML: http://www.w3.org/TR/xml/#NT-Char
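
To make the difference concrete, here is a minimal sketch of how the two modes could resolve a numeric character reference. It is not he's implementation: `HTML_C1_REMAP` and `resolveCodePoint` are hypothetical names, the `xml` flag is an assumed option, and the table is deliberately partial (the HTML spec remaps more C1 code points than the two shown).

```javascript
// Partial remap table (assumption: only two of the HTML spec's C1 overrides are listed).
const HTML_C1_REMAP = new Map([
  [0x80, 0x20ac], // EURO SIGN
  [0x85, 0x2026], // HORIZONTAL ELLIPSIS
]);

// Hypothetical helper: pick the code point a numeric reference resolves to.
// In XML mode the value is taken literally; in HTML mode C1 values are remapped.
function resolveCodePoint(codePoint, { xml = false } = {}) {
  if (!xml && HTML_C1_REMAP.has(codePoint)) {
    return HTML_C1_REMAP.get(codePoint);
  }
  return codePoint;
}

// Format a code point as U+XXXX for display.
const hex = (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');

console.log(hex(resolveCodePoint(0x85)));                // U+2026
console.log(hex(resolveCodePoint(0x85, { xml: true }))); // U+0085
```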

royfielding commented 3 years ago

It would be nice to have an option (default off) that would exclude the characters not allowed in XML even as an entity reference. These characters cause XML validation to fail. For example,

https://github.com/MylesBorins/xml-sanitizer/

he could either strip the invalid characters (like above) or replace them with a non-entity (like ESC for 0x1B).
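
A rough sketch of what the stripping variant might look like, based on the `Char` production in XML 1.0. This is not an existing he option; `INVALID_XML_CHARS` and `stripInvalidXmlChars` are hypothetical names, and the optional `replacement` argument is an assumption covering the "replace with a non-entity" idea.

```javascript
// Matches any code point outside the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The `u` flag makes the class work on code points, so valid surrogate
// pairs (astral characters) are kept and lone surrogates are matched.
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper, not part of he: drop (or replace) characters that
// XML 1.0 does not allow, even as character references.
function stripInvalidXmlChars(input, replacement = '') {
  // A callback is used so `$` in the replacement is not treated specially.
  return input.replace(INVALID_XML_CHARS, () => replacement);
}

console.log(stripInvalidXmlChars('a\u001Bb'));        // ab
console.log(stripInvalidXmlChars('a\u001Bb', 'ESC')); // aESCb
```

Running this before encoding would keep the encoder itself unchanged, at the cost of a second pass over the string.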

cederberg commented 1 year ago

Another XML vs. HTML issue is how to encode using only named XML entities (i.e. `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`). Don't think this is possible in the API today, but perhaps it should be? Just a very minor issue of course.

Ended up with a few extra lines of JS to work around this:

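    // The five named entities that XML predefines.
    const ENTITIES = ['&amp;', '&quot;', '&apos;', '&lt;', '&gt;'];
    // Decimal form he emits for each entity, e.g. '&amp;' -> '&#38;'.
    const recode = (s) => he.encode(he.decode(s), { decimal: true });
    // Swap those decimal references back to the named XML entities.
    const fix = (s) => ENTITIES.reduce((s, e) => s.replaceAll(recode(e), e), s);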
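
For comparison, a standalone sketch of the same idea that does not go through he at all. It is only an illustration of the desired output, not a proposal for he's API: `XML_NAMED` and `encodeXml` are hypothetical names, and emitting decimal references for non-ASCII characters is an assumption made to match the `decimal: true` output.

```javascript
// The five entities XML predefines, keyed by the character they stand for.
const XML_NAMED = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  "'": '&apos;',
  '"': '&quot;',
};

// Hypothetical helper: named entities for the five XML specials, decimal
// character references for everything outside ASCII. It does not filter
// characters that are invalid in XML (control characters, lone surrogates).
function encodeXml(input) {
  // The `u` flag makes the second alternative match whole code points.
  return input.replace(/[&<>'"]|[^\x00-\x7F]/gu, (ch) =>
    XML_NAMED[ch] || '&#' + ch.codePointAt(0) + ';'
  );
}

console.log(encodeXml('<a href="x">caf\u00E9 & bar</a>'));
// &lt;a href=&quot;x&quot;&gt;caf&#233; &amp; bar&lt;/a&gt;
```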