Open davidshrive opened 3 years ago
Hey @davidshrive, this is an unfortunate consequence of the underlying library being intended primarily for sanitizing text to insert within HTML. I think given this method has use cases beyond sanitation then it's unideal to perform entity escapes as you can't be certain that this.foo.strip_html().unescape_html()
is equivalent to the original text, but in the case where you want sanitized output you'd be able to do this.foo.strip_html().escape_html()
.
The change we'd need to the lib used is very simple, add a policy parameter that's referenced here: https://github.com/microcosm-cc/bluemonday/blob/master/sanitize.go#L363 in order to disable the escapes. I'm not sure if the maintainers would be interested in having that addition merged upstream but we can ask. The other more serious problem though is that existing users might be relying on the current behaviour of strip_html()
, so we'd either need to add another optional parameter, add a new method with a different name (yuck), or wait for Benthos v4.
Thanks @Jeffail
text.strip_html().unescape_html()
does indeed solve the problem.
Optional param seems like a sensible solution to me but low priority now I know the above sorts it 👍
Hey Ash,
The
strip_html()
bloblang function seems to be encoding html entities which was unexpected. For example running the following json document:Through this bloblang...
Returns...
Where you can see the quotes (
"
) have been replaced with their encoded values ("
)I've managed to replicate this issue in the latest benthos (3.40.0).
Do you agree this is a bug or is this the intended behaviour?
Thanks