microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License
3.14k stars 176 forks source link

Sanitizing "Filter bypass based polyglot" #97

Closed micke136 closed 4 years ago

micke136 commented 4 years ago

So I was using UGCPolicy and was running some test to see if it prevented attacks from https://xsses.rocks/sample-page/. I see that the one named "Filter bypass based polyglot" didn't sanitize the script part. What is the reason for it not removing the script part?

buro9 commented 4 years ago

I added a quick set of tests to sanitize_test.go, my hunch is that for the input given it would probably end up escaping it as the input seems malformed (eligible for being HTML escaped) rather than valid but not allowed by policy (eligible for being removed).

The test:

func TestXssesRocks(t *testing.T) {
    // https://xsses.rocks/sample-page/
    p := UGCPolicy()

    tests := []test{
        {
            in:       `'';!--"<XSS>=&{()}`,
            expected: `&#39;&#39;;!--&#34;=&amp;{()}`,
        },
        {
            in:       `<SCRIPT SRC=http://xss.rocks/xss.js></SCRIPT>`,
            expected: ``,
        },
        {
            in: `'">><marquee><img src=x onerror=confirm(1)></marquee>"></plaintext\></|\><plaintext/onmouseover=prompt(1)>
<script>prompt(1)</script>@gmail.com<isindex formaction=javascript:alert(/XSS/) type=submit>'-->"></script>
<script>alert(document.cookie)</script>">
<img/id="confirm&lpar;1)"/alt="/"src="/"onerror=eval(id)>'">
<img src="http://www.shellypalmer.com/wp-content/images/2015/07/hacked-compressor.jpg">`,
            expected: `&#39;&#34;&gt;&gt;<img src="x">&#34;&gt;
&lt;script&gt;prompt(1)&lt;/script&gt;@gmail.com&lt;isindex formaction=javascript:alert(/XSS/) type=submit&gt;&#39;--&gt;&#34;&gt;&lt;/script&gt;
&lt;script&gt;alert(document.cookie)&lt;/script&gt;&#34;&gt;
&lt;img/id=&#34;confirm&amp;lpar;1)&#34;/alt=&#34;/&#34;src=&#34;/&#34;onerror=eval(id)&gt;&#39;&#34;&gt;
&lt;img src=&#34;http://www.shellypalmer.com/wp-content/images/2015/07/hacked-compressor.jpg&#34;&gt;`,
        },
        {
            in:       `<IMG SRC="javascript:alert('XSS');">`,
            expected: ``,
        },
        {
            in:       `<IMG SRC=javascript:alert('XSS')>`,
            expected: ``,
        },
        {
            in:       `<IMG SRC=JaVaScRiPt:alert('XSS')>`,
            expected: ``,
        },
        {
            in:       `<IMG SRC=javascript:alert("XSS")>`,
            expected: ``,
        },
        {
            in:       "<IMG SRC=`javascript:alert(\"RSnake says, 'XSS'\")`>",
            expected: ``,
        },
        {
            in:       `<IMG """><SCRIPT>alert("XSS")</SCRIPT>">`,
            expected: `&#34;&gt;`,
        },
        {
            in:       `<IMG SRC=javascript:alert(String.fromCharCode(88,83,83))>`,
            expected: ``,
        },
    }

    // These tests are run concurrently to enable the race detector to pick up
    // potential issues
    wg := sync.WaitGroup{}
    wg.Add(len(tests))
    for ii, tt := range tests {
        go func(ii int, tt test) {
            out := p.Sanitize(tt.in)
            if out != tt.expected {
                t.Errorf(
                    "test %d failed;\ninput   : %s\noutput  : %s\nexpected: %s",
                    ii,
                    tt.in,
                    out,
                    tt.expected,
                )
            }
            wg.Done()
        }(ii, tt)
    }
    wg.Wait()
}

You can try this out, and yup... it escapes it.

So long as what you do with the output isn't to unescape it, you're good. But this is why the guidance in the readme states that it must be the last thing in a pipeline of processes that process the text.