microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License
3.12k stars 176 forks source link

Double escaping attribute values #143

Closed jtran closed 2 years ago

jtran commented 2 years ago

The values of attributes are getting double escaped. As far as I can tell, the problem was introduced in f0057e2386fef8ce4ee3ede8a56b59d27449d49f. In particular, the double-quote character and non-breaking spaces won't ever make it through sanitizing.

In code, I expect the following test to pass.

func TestQuotesSanitization(t *testing.T) {
    tests := []test{
        {
            in:       `<p title="&quot;"></p>`,
            expected: `<p title="&quot;"></p>`,
        },
        {
            in:       `<p title="&nbsp;"></p>`,
            expected: `<p title="&nbsp;"></p>`,
        },
    }

    p := UGCPolicy()
    p.AllowAttrs("title").OnElements("p")

    // These tests are run concurrently to enable the race detector to pick up
    // potential issues
    wg := sync.WaitGroup{}
    wg.Add(len(tests))
    for ii, tt := range tests {
        go func(ii int, tt test) {
            out := p.Sanitize(tt.in)
            if out != tt.expected {
                t.Errorf(
                    "test %d failed;\ninput   : %s\noutput  : %s\nexpected: %s",
                    ii,
                    tt.in,
                    out,
                    tt.expected,
                )
            }
            wg.Done()
        }(ii, tt)
    }
    wg.Wait()
}

However, I get this output.

=== RUN   TestQuotesSanitization
    sanitize_test.go:3713: test 1 failed;
        input   : <p title="&nbsp;"></p>
        output  : <p title="&amp;nbsp;"></p>
        expected: <p title="&nbsp;"></p>
    sanitize_test.go:3713: test 0 failed;
        input   : <p title="&quot;"></p>
        output  : <p title="&amp;quot;"></p>
        expected: <p title="&quot;"></p>
buro9 commented 2 years ago

Looks like I introduced a bug when attempting to fix HREF santization: https://github.com/microcosm-cc/bluemonday/commit/f0057e2386fef8ce4ee3ede8a56b59d27449d49f

However when I remove the double-sanitizing in that and test the output of the tests I wrote for that code... it doesn't pose a risk.

That is, removing the double escaping still results in sanitized and safe to use output.

As such, I'm going to remove the part of that prior commit that led to the double escaping.

zeripath commented 2 years ago

@buro9 will you be releasing a new version of bluemonday or should we (gitea) just pull a head?


Ah I see you've already released 1.0.19 - Thanks!