leizongmin / js-xss

Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a Whitelist
http://jsxss.com
Other
5.16k stars 633 forks source link

class is wrong separated by attributes in method onTagAttr #245

Closed sh4d0q closed 2 years ago

sh4d0q commented 2 years ago

Hi everyone,

I have some issue, when I tried to check string like this: <span class=\"preference__text--green text--bold\">BP</span>

I got in method onTagAttr in first iteration name tag class value = \"preference__text--green

and second iteration name tag text--bold value ``

Its shouldn't be like that

I expected this only in one iteration as: name tag class value = \"preference__text--green text--bold

I used js-xss in version 1.0.8 and 1.0.10

Please fix

sh4d0q commented 2 years ago

fix for this,

https://github.com/leizongmin/js-xss/pull/246

leizongmin commented 2 years ago

Hi @sh4d0q ,

Firstly, I think add a new jsdom dependency to solve this problem is not a good idea.

Secondly, in Chrome browser, the HTML <span class=\"preference__text--green text--bold\">BP</span> will results a span tag includes two attributes:

So I think you expected the value of attribute class is \"preference__text--green text--bold didn’t match the standard usage.

leizongmin commented 2 years ago

According to the HTML Spec:

Attributes are placed inside the start tag, and consist of a name and a value, separated by an "=" character. The attribute value can remain unquoted if it doesn't contain ASCII whitespace or any of " ' ` = < or >. Otherwise, it has to be quoted using either single or double quotes. The value, along with the "=" character, can be omitted altogether if the value is the empty string.

Unquoted attribute value syntax The attribute name, followed by zero or more ASCII whitespace, followed by a single U+003D EQUALS SIGN character, followed by zero or more ASCII whitespace, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal ASCII whitespace, any U+0022 QUOTATION MARK characters ("), U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN characters (=), U+003C LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN characters (>), or U+0060 GRAVE ACCENT characters (`), and must not be the empty string.

In this case, the attribute class has a unquoted value, and the value ends in a whitespace.

sh4d0q commented 2 years ago

Hi @leizongmin It's doesn't matter how class attribute can be set, by " or /", ' or /' for all cases should be the same behaviour. Every DOMParser handled this case correct. Our html content is came from JSON, where are text with html element link, span etc.

leizongmin commented 2 years ago

Hi @sh4d0q Actually I don't clear understand what is the problem. Here is an example code:

const xss = require("./");
const html = `<span class=\\"preference__text--green text--bold\\">BP</span>`;
xss(html, {
  onTagAttr: function (tag, attr, value) {
    console.log("tag=%s, attr=%s, value=%s", tag, attr, value);
  }
});

It will outputs:

tag=span, attr=class, value=\"preference__text--green
tag=span, attr=text--bold, value=

Where does this result fail to meet expectations?

sh4d0q commented 2 years ago

In this I expected only one output tag=span, attr=class, value=\"preference__text--green text--bold\"

And this is a problem

leizongmin commented 2 years ago

Here is parsed result from htmlparser2 https://astexplorer.net/#/gist/5f2a820a071da6ecbe43b51067f09f1a/7f556d9c28ade60ec01e9633cc70d47df06c0540

    {
      "type": "tag",
      "children": [
        {
          "type": "text",
          "data": "BP"
        }
      ],
      "name": "span",
      "attribs": {
        "class": "\\\"preference__text--green",
        "text--bold\\\"": ""
      }
    }

I means, it's two attributes, one is class, another is text--bold\". The output tag=span, attr=class, value="preference__text--green text--bold" is incorrect.

sh4d0q commented 2 years ago

Hi @leizongmin,

ok, but let's try this (without /"): <span class="preference__text--green text--bold">BP</span>

your lib gives the same result, but shouldn't. The same result you have with ' instead of "

this parser also gives the correct result with this above case: https://astexplorer.net/#/gist/5f2a820a071da6ecbe43b51067f09f1a/7e37cceabe896573621e31d49cbc1f42b65311b5

but let's back with the case with \"

try do something, like that in browser window.document.body.innerHTML = "<span class=\"preference__text--green text--bold\">BP</span>"

How browser handle this case?

leizongmin commented 2 years ago

Hi @sh4d0q , I think we need to distinguish between the two HTML:

Case A (it's two attributes, one is class, another is text--bold\"):

<span class=\"preference__text--green text--bold\">BP</span>

Case B (just one attribute names class):

<span class="preference__text--green text--bold">BP</span>

For case A, the xss parser results two attributes class and text--bold(the correct result should be text--bold\"). But that is another problem. For most cases does not affect normal use. I will fix it in the future.

mdk000 commented 2 years ago

Hi @leizongmin, when can we expect version with fixes?

mrwogu commented 2 years ago

@leizongmin any news here?

mdk000 commented 2 years ago

Hi @leizongmin, are you planning to release version with this fix in the near future? or maybe some other big topics is blocking you? Thanks in advance

leizongmin commented 2 years ago

Hi all, I just released a new version xss@1.0.12 with a fix for this issue.