microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License

Was Sanitized Flag #59

Closed by levidurfee 6 years ago

levidurfee commented 6 years ago

Is there a way to check if the input was sanitized? Maybe something like the code below?

// . . .
sanitized := p.Sanitize(unSanitized)
if p.WasSanitized {
    fmt.Println("Needed to be Sanitized")
}
// . . .

If there isn't a way to do this, could someone add it? Or would you recommend comparing the unsanitized input to the sanitized input?

Thanks! :)

buro9 commented 6 years ago

A single policy p can be used to sanitize many pieces of user-supplied content, so a boolean on the policy does not make sense: once a call to p.Sanitize(html) had set it, it would keep returning true for every later call.

It would be better to track this state in your application, alongside the content that you are sanitizing. This is the recommendation.
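One way to keep that state next to the content is to compare the input with the output of each call, as the question suggests. The following is only a minimal sketch: the Sanitizer interface, the Result type, SanitizeAndReport and the stripScript stand-in are all hypothetical names invented for illustration. A bluemonday policy has a Sanitize(string) string method, so it should fit the interface, but the toy stand-in is used here so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"strings"
)

// Sanitizer is a hypothetical interface for this sketch. Anything with a
// Sanitize(string) string method fits, which includes a bluemonday policy.
type Sanitizer interface {
	Sanitize(s string) string
}

// Result keeps the per-call state alongside the content rather than on the policy.
type Result struct {
	HTML    string // sanitized output
	Changed bool   // true if the output differs from the input
}

// SanitizeAndReport sanitizes raw and records whether the sanitizer changed it.
func SanitizeAndReport(p Sanitizer, raw string) Result {
	out := p.Sanitize(raw)
	return Result{HTML: out, Changed: out != raw}
}

// stripScript is a toy stand-in so the sketch runs without third-party
// imports. It is NOT a real sanitizer and must not be used as one.
type stripScript struct{}

func (stripScript) Sanitize(s string) string {
	s = strings.ReplaceAll(s, "<script>", "")
	return strings.ReplaceAll(s, "</script>", "")
}

func main() {
	p := stripScript{}
	inputs := []string{"<b>hello</b>", "<script>alert(1)</script>hi"}
	for _, in := range inputs {
		r := SanitizeAndReport(p, in)
		fmt.Printf("changed=%t html=%q\n", r.Changed, r.HTML)
	}
}
```

A difference only means the text was changed, not that it was malicious: a real sanitizer may also escape characters or normalise harmless markup, so Changed is best treated as a hint.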

As a guide, the way I approach this is to store both the raw (unsanitized) input and the sanitized output in my database. I have a NOT NULL column for the raw input and a NULLable column for html. I INSERT into raw. Only when I need to return the sanitized content to a client (API or web) do I check the html column; if it is still NULL, I read raw, sanitize it, write the result to html, and then return the sanitized string to the client.

I like this approach because, if a vulnerability is discovered in bluemonday, I can simply update my table to set the html column to NULL and everything will be re-sanitized on the next read. It also uses the html column as a cache, which avoids the unnecessary CPU cost of calling p.Sanitize() for something I have already sanitized.

I'm doing the work in my application to know what has been sanitized or not... lazily.
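The lazy flow described above could look roughly like this. It is a sketch under stated assumptions, not the author's actual code: an in-memory map stands in for the database table, a nil pointer stands in for a NULL html column, and the names Row, Store, Rendered, InvalidateAll and countingSanitizer are hypothetical. The toy sanitizer only counts calls so the caching is visible.

```go
package main

import (
	"fmt"
	"strings"
)

// Sanitizer is a hypothetical interface; a bluemonday policy's
// Sanitize(string) string method would fit it.
type Sanitizer interface {
	Sanitize(s string) string
}

// Row mirrors the two columns: Raw is NOT NULL, HTML is nullable (nil = NULL).
type Row struct {
	Raw  string
	HTML *string
}

// Store is an in-memory stand-in for the table. It is not goroutine-safe.
type Store struct {
	rows map[int]*Row
}

func NewStore() *Store { return &Store{rows: make(map[int]*Row)} }

// Insert saves only the raw input; HTML stays NULL until the first read.
func (s *Store) Insert(id int, raw string) { s.rows[id] = &Row{Raw: raw} }

// Rendered returns the cached HTML, sanitizing and caching it on first access.
func (s *Store) Rendered(p Sanitizer, id int) (string, bool) {
	row, ok := s.rows[id]
	if !ok {
		return "", false
	}
	if row.HTML == nil {
		out := p.Sanitize(row.Raw)
		row.HTML = &out
	}
	return *row.HTML, true
}

// InvalidateAll resets every HTML value to NULL so that later reads re-sanitize.
func (s *Store) InvalidateAll() {
	for _, row := range s.rows {
		row.HTML = nil
	}
}

// countingSanitizer is a toy stand-in that counts calls. NOT a real sanitizer.
type countingSanitizer struct{ calls int }

func (c *countingSanitizer) Sanitize(s string) string {
	c.calls++
	return strings.ReplaceAll(s, "<script>", "")
}

func main() {
	p := &countingSanitizer{}
	store := NewStore()
	store.Insert(1, "<script>hello")

	store.Rendered(p, 1) // sanitizes and caches
	store.Rendered(p, 1) // served from the cache
	fmt.Println("calls after two reads:", p.calls)

	store.InvalidateAll() // e.g. after upgrading the sanitizer
	store.Rendered(p, 1)  // sanitizes again
	fmt.Println("calls after invalidation:", p.calls)
}
```

With a real database, InvalidateAll would correspond to a single UPDATE that sets the html column to NULL, and Rendered to a SELECT followed by an UPDATE when html is NULL.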