salesforce / secure-filters

Anti-XSS Security Filters for EJS and More
BSD 3-Clause "New" or "Revised" License

Properly HTML Encode Characters >= U+10000 #15

Open stash opened 10 years ago

stash commented 10 years ago

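JavaScript strings are UTF-16, so a character with a code point above 65535 (U+FFFF) is stored as a surrogate pair of two 16-bit code units. HTML does not technically allow numeric character references to surrogates, so FACE WITHOUT MOUTH (U+1F636) encoded one code unit at a time as &#55357;&#56886; is invalid, although some browsers seem to tolerate it. Ideally, our HTML escaper would convert this to the single reference &#128566;.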
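The conversion described above can be sketched roughly as follows. This is only an illustration of the standard UTF-16 decoding arithmetic, not the secure-filters implementation; the function name `encodeSurrogatePairs` is hypothetical, and it assumes lone (unpaired) surrogates should be left untouched for some other step to handle.

```javascript
// Hypothetical helper for illustration; not part of secure-filters.
function encodeSurrogatePairs(str) {
  // Match a high surrogate immediately followed by a low surrogate.
  return str.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, function (pair) {
    var hi = pair.charCodeAt(0);
    var lo = pair.charCodeAt(1);
    // Standard UTF-16 decoding: combine the two 10-bit halves, then add 0x10000.
    var codePoint = ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
    // Emit one decimal numeric character reference for the whole code point.
    return '&#' + codePoint + ';';
  });
}

// U+1F636 is stored as the pair D83D DE36 in a JavaScript string.
console.log(encodeSurrogatePairs('\uD83D\uDE36')); // &#128566;
```

Characters in the Basic Multilingual Plane pass through unchanged, so a helper like this would run alongside, not instead of, the existing escaping of characters such as `<` and `&`.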

NB: surrogate pairs are currently whitelisted by the filter (as of #14), so they pass through unescaped.

stash commented 10 years ago

PR #20 partially addresses this by whitelisting code points that require a surrogate pair.