microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License
3.12k stars, 176 forks

Add function to sanitize to writer directly #110

Closed zeripath closed 3 years ago

zeripath commented 3 years ago

Sometimes writing the sanitizer's output directly to a writer is preferable to collecting it in a buffer first, especially when the input is large.

Signed-off-by: Andrew Thornton <art27@cantab.net>

6543 commented 3 years ago

any update?

buro9 commented 3 years ago

Truly only just got around to spending more than 10 seconds on it.

LGTM... will merge.

How close is this to making it a fully streamable sanitizer with low memory usage? Originally the sanitizer read the whole input and buffered the whole output before writing anything, even though, being built on a tokenizer, it never had to. (It was originally written in a day, and streaming was neither my focus nor my need at the time, as my input was already constrained to 10KB.) This looks like it either takes us all the way there or pretty far along the path.

zeripath commented 3 years ago

@buro9 yeah, I think this probably does make bluemonday fully streaming. That was my intention when I wrote this PR.

I do wonder if we need to think about handling aborted writes, e.g. whether we need to track which elements are still open and close them after a partial write. That is probably the responsibility of a downstream limitwriter, although bluemonday could possibly provide a safe limitwriter that, even when the limit is hit, still closes any open elements.
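
The aborted-write concern can be made concrete with a small sketch of such a downstream writer. Everything here is hypothetical and standard-library only: `limitWriter`, `errLimitReached`, and `writeLimited` are illustrative names, not part of bluemonday. The writer passes through at most a fixed number of bytes and then fails, which leaves the HTML truncated with its elements unclosed.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLimitReached is a hypothetical sentinel error for this sketch.
var errLimitReached = errors.New("limitWriter: byte limit reached")

// limitWriter is a hypothetical downstream writer that accepts at most
// `remaining` bytes and then fails every further write.
type limitWriter struct {
	w         io.Writer
	remaining int
}

func (l *limitWriter) Write(p []byte) (int, error) {
	if l.remaining <= 0 {
		return 0, errLimitReached
	}
	if len(p) > l.remaining {
		// Partial write: pass on what fits, then report the limit.
		n, err := l.w.Write(p[:l.remaining])
		l.remaining -= n
		if err != nil {
			return n, err
		}
		return n, errLimitReached
	}
	n, err := l.w.Write(p)
	l.remaining -= n
	return n, err
}

// writeLimited writes s through a limitWriter and returns what got through.
func writeLimited(s string, limit int) (string, error) {
	var sb strings.Builder
	_, err := io.WriteString(&limitWriter{w: &sb, remaining: limit}, s)
	return sb.String(), err
}

func main() {
	out, err := writeLimited("<p><b>hello</b></p>", 8)
	// The output is cut mid-document: <p> and <b> are never closed.
	fmt.Printf("%q %v\n", out, err)
}
```

A "safe" variant along the lines suggested above would additionally need to track the stack of open elements and reserve room to emit their closing tags once the limit is hit; this sketch deliberately does not attempt that.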