microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License
3.12k stars 176 forks source link

Parsing Issue handling self closing html tag #117

Closed sjawaji closed 3 years ago

sjawaji commented 3 years ago

Looks there is some issue parsing self closing html tags. Not sure if this is a known issue.

package main

import (
    "fmt"
    "github.com/microcosm-cc/bluemonday"
)

func main() {
    p := bluemonday.NewPolicy()
    p.AllowElements("p")

    htmlPayload := `<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body><p>body</p></body></html>`

    body := p.Sanitize(htmlPayload)
    fmt.Println(body)
}

Result:

&lt;/head&gt;&lt;body&gt;&lt;p&gt;body&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;

If I remove the self closing title <title/> tag from the payload, then it is working fine and output is as expected.

<p>body</p>
buro9 commented 3 years ago

Garbage in, garbage out applies.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title

Both opening and closing tags are required. Note that leaving off should cause the browser to ignore the rest of the page.

That is indeed what happens, the rest of the page is ignored (as HTML) and thus just escaped to make it safe.

Valid HTML is expected, but if not then bluemonday will make it safe to display within a HTML page (from an XSS perspective).

You should either omit the title tag or close it as per HTML specification.

This behaviour in bluemonday, comes from the go/x/html package but it is correct.