There is a sanitization failure scenario impacting cases where the onTag function is used to customise how certain tags are handled. The second parameter to this function is the HTML of the opening tag, complete with attributes, and is susceptible to malformed HTML which means it is not sanitized before being passed to the function. If the onTag customisation function then uses this HTML as part of the output then an XSS is possible.
if ((c === '"' || c === "'")) {
if (html.charAt(currentPos - i) === "=") {
quoteStart = c;
continue;
}
}
The tokenizer goes into the 'inside quoted value' state when it encounters a " or ' character that is immediately preceded by a =.
However if we supply this input:
<a target= " href="><script>alert(2)</script>">
The target attribute has a space after the = character, so the tokenizer fails to enter the 'inside quoted value' state. It then continues parsing until it meets href=" at which point if erroneously enters the 'inside quoted value' state and parses the subsequent <script>tags as beloning to that attribute, until the " following those tags.
This allows smuggling the <script> tags or other malicious content into the html variable that is passed as the second parameter to the custom onTag function. When the custom tag function uses that value (presuming it has been correctly sanitized) the malicious payload is injected back in the output:
This is a fairly minimal reproduction of the failure case:
var xss = require('xss');
inputData = `<a target= " href="><script>alert(1)</script>"><span>`;
var O = {
onTag: function(_, E, S) {
if (S.isWhite && "a" === _) {
if (S.isClosing)
return "</span></a>";
return "".concat(E, '<span>')
}
}
};
var html = xss(inputData, O);
console.log(html);
Patch
My suggested fix (bear in mind I'm not a JS person!) is to add a while loop that allows for spaces between = and the quote starting an attribute value. All the tests pass still. I have a PR prepared that also adds a test case for this scenario, but didn't want to submit the PR publicly until I spoke to you.
An alternative approach would try to keep track of whether the last character, ignoring whitespace was an =. However, when I mapped this out it added a lot of complexity.
Tests
All the tests pass still, but I have also added a new test. Let me know if you think more would be sensible.
There is a sanitization failure scenario impacting cases where the
onTag
function is used to customise how certain tags are handled. The second parameter to this function is the HTML of the opening tag, complete with attributes, and is susceptible to malformed HTML which means it is not sanitized before being passed to the function. If theonTag
customisation function then uses this HTML as part of the output then an XSS is possible.Sanitizer logic error
The error is at lines 88-91 of the parser:
The tokenizer goes into the 'inside quoted value' state when it encounters a
"
or'
character that is immediately preceded by a=
.However if we supply this input:
The
target
attribute has a space after the=
character, so the tokenizer fails to enter the 'inside quoted value' state. It then continues parsing until it meetshref="
at which point if erroneously enters the 'inside quoted value' state and parses the subsequent<script>
tags as beloning to that attribute, until the"
following those tags.This allows smuggling the
<script>
tags or other malicious content into thehtml
variable that is passed as the second parameter to the customonTag
function. When the custom tag function uses that value (presuming it has been correctly sanitized) the malicious payload is injected back in the output:Browsers are robust to the
target= "
snippet including a space, so parse the DOM like so:Minimum reproduction
This is a fairly minimal reproduction of the failure case:
Patch
My suggested fix (bear in mind I'm not a JS person!) is to add a while loop that allows for spaces between
=
and the quote starting an attribute value. All the tests pass still. I have a PR prepared that also adds a test case for this scenario, but didn't want to submit the PR publicly until I spoke to you.An alternative approach would try to keep track of whether the last character, ignoring whitespace was an
=
. However, when I mapped this out it added a lot of complexity.Tests
All the tests pass still, but I have also added a new test. Let me know if you think more would be sensible.