leizongmin / js-xss

Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a Whitelist
http://jsxss.com

whiteList fails when using slashes to separate tag attributes (PR included) #268

Open hensleysecurity opened 1 year ago

Let's say you have whitelisted the img tag. The following will not get filtered (good):

<img src="cat.jpg"/>

And neither will this (good):

<img
src="cat.jpg"/>

However, this will get filtered (bad):

<img/src="cat.jpg"/>

Browsers accept / as an attribute separator, so this input ought to pass through unfiltered as well. As reported in this article, the following characters may be used to separate attributes in an HTML tag:

The problem seems to be that the regexes in spaceIndex() and parseAttr() do not treat a slash as a separator:
https://github.com/leizongmin/js-xss/blob/5711a9c5fac93f3f54541a7b4f7c780ba38adac6/lib/util.js#L30
https://github.com/leizongmin/js-xss/blob/c339c1f777f2f9ba34bb26d5ed67ae2eaede7c2a/lib/parser.js#L169-L170

As a result, getTagName() returns img/src="cat.jpg" (which is obviously not on the whitelist) when it should return img. The attribute parser has the same issue: it comes back with all the attributes in a single string, still joined by /.
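
To make the failure concrete, here is a minimal sketch of the tag-name case. Note that firstSeparatorIndex() and tagNameOf() are hypothetical stand-ins written for this comment, not the library's actual spaceIndex() and getTagName(); they only assume that the tag name is cut at the first character matched by a separator regex.

```javascript
// Hypothetical stand-ins for illustration; NOT the library's actual code.

// Index of the first character matched by `separator`, or -1 if none.
function firstSeparatorIndex(str, separator) {
  const match = separator.exec(str);
  return match ? match.index : -1;
}

// Extract the tag name from a single tag such as '<img src="x"/>'.
function tagNameOf(tag, separator) {
  // Skip "<" (or "</" for a closing tag) and drop the trailing ">".
  const start = tag.startsWith("</") ? 2 : 1;
  const body = tag.slice(start, -1);
  const i = firstSeparatorIndex(body, separator);
  let name = i === -1 ? body : body.slice(0, i);
  // Strip the self-closing slash when no separator was found before it.
  if (name.endsWith("/")) name = name.slice(0, -1);
  return name.toLowerCase();
}

const whitespaceOnly = /\s/;        // separator set that ignores "/"
const whitespaceOrSlash = /[\s\/]/; // separator set that also accepts "/"

console.log(tagNameOf('<img src="cat.jpg"/>', whitespaceOnly));    // img
console.log(tagNameOf('<img/src="cat.jpg"/>', whitespaceOnly));    // img/src="cat.jpg"
console.log(tagNameOf('<img/src="cat.jpg"/>', whitespaceOrSlash)); // img
```

With the whitespace-only separator, the slash-separated tag yields a "tag name" that can never match a whitelist entry. A real fix would presumably need the same change in the attribute parser, which this sketch does not cover.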

The regexes in the code linked above are also redundant twice over: \n (newline) and \t (tab) are already matched by \s (any whitespace character), so listing them separately adds nothing. All of the other whitespace characters in the list above are matched by \s as well.
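
A quick way to check the redundancy claim, as a sketch: the `verbose` pattern below is an assumed shape for a regex that lists \n and \t next to \s, not a copy of the library's source. It is compared against plain \s on a handful of sample characters.

```javascript
// Assumed shape of a redundant pattern (illustrative, not copied from the library).
const verbose = /\s|\n|\t/;
const compact = /\s/;

// A few whitespace characters, plus "/" and "a" as non-whitespace controls.
const samples = [" ", "\n", "\t", "\r", "\f", "\v", "\u00a0", "/", "a"];

// Collect every sample on which the two patterns disagree.
const disagreements = samples.filter(
  (ch) => verbose.test(ch) !== compact.test(ch)
);

console.log(disagreements.length); // 0
console.log(compact.test("/"));    // false
```

The two patterns agree on every sample, and neither matches /, which is why the slash-separated tag slips past the separator check.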

I can provide a PR that will fix the issue.