Open mathiasbynens opened 10 years ago
It would be nice to have an option (off by default) that excludes the characters that are not allowed in XML, even as an entity reference. These characters cause XML validation to fail. For example, see:
https://github.com/MylesBorins/xml-sanitizer/
he could either strip the invalid characters (like above) or replace them with a non-entity placeholder (like ESC for 0x1B).
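To make the request concrete, here is a minimal sketch of such a filter in plain JS, based on the Char production in the XML 1.0 spec. The names INVALID_XML_CHARS and stripInvalidXmlChars are made up for illustration; this is not an existing he option, and it is not necessarily how xml-sanitizer does it.

```javascript
// Matches any code point outside the XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The `u` flag makes the class work on code points, so lone surrogates match
// while well-formed surrogate pairs (astral characters) are left alone.
const INVALID_XML_CHARS =
  /[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper (not part of he): strips the invalid characters by
// default, or swaps each one for a caller-supplied placeholder string.
function stripInvalidXmlChars(input, replacement = '') {
  // A replacer function is used so `$` in the placeholder is taken literally.
  return input.replace(INVALID_XML_CHARS, () => replacement);
}
```

Calling stripInvalidXmlChars(text) drops the offending characters, while stripInvalidXmlChars(text, 'ESC') would cover the "replace with a non-entity" variant, although a real option would probably want a per-character mapping rather than one fixed placeholder.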
Another XML vs. HTML issue is how to encode using only named XML entities (i.e. &amp;, &lt;, &gt;, &apos; and &quot;). Don't think this is possible in the API today, but perhaps it should be? Just a very minor issue of course.
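For reference, a standalone sketch of "named XML entities only" encoding, without he, could look like the following. XML_NAMED_ENTITIES and escapeXmlNamed are hypothetical names, and every other character (non-ASCII included) is passed through untouched, which may or may not be what a real option should do.

```javascript
// The five entities predefined by XML, keyed by the character they stand for.
const XML_NAMED_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

// Hypothetical helper (not part of he): escapes only those five characters.
// Each character is looked up individually in a single pass, so an ampersand
// produced by one replacement is never escaped a second time.
function escapeXmlNamed(input) {
  return input.replace(/[&<>"']/g, (ch) => XML_NAMED_ENTITIES[ch]);
}
```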
Ended up with a few extra lines of JS to work around this:
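const he = require('he');
const ENTITIES = ['&amp;', '&quot;', '&apos;', '&lt;', '&gt;'];
// Normalize all references to decimal form, e.g. '&amp;' becomes '&#38;'.
const recode = (s) => he.encode(he.decode(s), { decimal: true });
// Swap the decimal forms of those five characters back to their named entities.
const fix = (s) => ENTITIES.reduce((s, e) => s.replaceAll(recode(e), e), s);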
This may not be worth it, but here goes…
E.g. &#x85; → U+0085 in XHTML, while in HTML it’s U+2026. http://www.w3.org/TR/xml/#d0e3895
Entities for these symbols are allowed in XML: http://www.w3.org/TR/xml/#NT-Char