Closed dmitshur closed 10 years ago
The TrimSpace was there to avoid work where it wasn't required.
I pondered on it for a bit, and although whitespace is not significant most of the time in HTML, there are occasions in which whitespace is important and bluemonday doesn't presume anything about where you are using the sanitized HTML.
Example:
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<style type="text/css">
pre {border: 1px solid red;}
</style>
</head>
<body>
<pre>
<div>tall</div>
</pre>
<pre><div>short</div></pre>
</body>
</html>
The two pre areas are rendered differently, and bluemonday shouldn't have forced them to be the same when in this context they are different.
The TrimSpaces() instruction was added to avoid work where it can be avoided (if there is no HTML or character data to process and it's extremely cheap to check, then don't do sanitization). It wasn't there to change the whitespace of the input, so the bug was that the work avoidance modified the output.
I've left the work avoidance aspect in the code, but now preserve the whitespace.
and although whitespace is not significant most of the time in HTML, there are occasions in which whitespace is important
Do you mean "although leading and trailing whitespace is not significant most of the time in HTML, there are occasions in which leading and trailing whitespace is important"? Because TrimSpace only trims leading and trailing whitespace, not inside.
But if I understand correctly, you're showing an example where the inner content of
tag only was processed by bluemonday.Anyway, the change LGTM as I prefer it not touch the leading and trailing whitespace anyway (which became the behavior since I've switched to
SanitizeBytes
). It's something the user could always do explicitly (before calling bluemonday), if they want to.
The TrimSpaces() instruction was added to avoid work where it can be avoided (if there is no HTML or character data to process and it's extremely cheap to check, then don't do sanitization).
I've left the work avoidance aspect in the code
Any reason you've opted to not use bytes.TrimSpace in an equivalent manner in SanitizeBytes?
I'll add a quick change to the work avoidance in the bytes func to make it consistent with the strings method.
Now on holiday for a couple of weeks, I'll only merge and make changes that are security critical in that time.
There are now 3 funcs to perform sanitization.
Their signatures and documentation suggest they differ only in the types for input/output.
However,
func (p *Policy) Sanitize(s string) string
differs from the other two in behavior:This is because Sanitize performs
s = strings.TrimSpace(s)
, while the other two don't do the equivalent action.Is this difference intended? I am guessing it is (and I like the current behavior, it seems appropriate). But then the documentation should reflect it.