Closed GoogleCodeExporter closed 9 years ago
This is the expected output, but obviously, if the font tag is shuttled into
some non-row content abyss, then it could be improved.
The design decision was made that way because CSS is such a large attack
surface that anything we can do to limit attackers' ability to manipulate CSS
leaves us in a more secure position.
Are there tags besides HTML where this is a problem?
If the styling handler detected that it was inside a table, thead, tbody,
tfoot, rowgroup, colgroup, or tr element, and instead inserted the <font>
element inside contained <td> and <th> elements would that help?
So instead of
<table>
<font face="Arial, Geneva, sans-serif" style="color:#000">
<tbody>
<tr>
<th>Column One</th>
<th>Column Two</th>
</tr>
<tr>
<td align="center"><font style="background-color:#fffffe"><font size="2">Size
2</font></font></td>
<td align="center"><font style="background-color:#fffffe"><font size="7">Size
7</font></font></td>
</tr>
</tbody>
</font>
</table>
you would get
<table style="color:#000">
<tbody>
<tr>
<th><font face="Arial, Geneva, sans-serif">Column One</th>
<th><font face="Arial, Geneva, sans-serif">Column Two</th>
</tr>
<tr>
<td align="center"><font style="background-color:#fffffe" face="Arial, Geneva,
sans-serif" size="2">Size 2</font></td>
<td align="center"><font style="background-color:#fffffe" face="Arial, Geneva,
sans-serif" size="7">Size 7</font></td>
</tr>
</tbody>
</table>
Original comment by mikesamuel@gmail.com
on 2 Feb 2013 at 6:25
White-listing a set of fonts from
http://webdesign.about.com/od/fonts/qt/web-safe-fonts.htm could keep the size
of the sanitized output much closer to the size of the input for large tables,
but I'm loathe to do anything that makes some fonts work with tables and others
not.
Original comment by mikesamuel@gmail.com
on 2 Feb 2013 at 11:23
I'm actually not that much of an HTML expert; this is just the problem I saw in
my first test. I assume there will be other constructs that are similarly
problematic, but I'm not in a position to enumerate them.
I still think you could preserve some style attributes without increasing the
attack surface. Currently when you see
style="color: rgb(0, 0, 0); font-family: Arial, Geneva, sans-serif;"
you parse that into some data structures that are subsequently used in a font
tag. I'm suggesting that instead of emitting the font tag, you could remove the
user-supplied style attributes, and replace it with a new style attribute that
you generate from known constructs. So in my example, you would get:
<table style="color:#000; font-family: Arial, Geneva, sans-serif;">
I'm concerned that any other approach is likely to impact rendering in ways
that aren't obvious to end users. It might not be a big deal if someone is
typing markup into a wiki, but if your app allows someone to paste in a chunk
of HTML (copied from a web page, or perhaps even a word processing
application), then users will expect the rendering to look as similar to the
original as possible.
Original comment by danr...@gmail.com
on 4 Feb 2013 at 8:08
> if your app allows someone to paste in a chunk of HTML (copied from a web
page, or perhaps even a word processing application), then users will expect
the rendering to look as similar to the original as possible.
agreed
> instead of emitting the font tag, you could remove the user-supplied style
attributes, and replace it with a new style attribute that you generate from
known constructs
I do something like that already, just not for font names.
http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/tests
/org/owasp/html/StylingPolicyTest.java
http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/main/
org/owasp/html/StylingPolicy.java#320
Only font family, align, and style are put on the <font> tag. The latter two
are easy to whitelist.
http://www.w3.org/TR/CSS21/fonts.html#value-def-family-name says
> Font family names must either be given quoted as strings, or unquoted as a
sequence of one or more identifiers. This means most punctuation characters and
digits at the start of each token must be escaped in unquoted font family names.
I'll see if I can come up with a white-list of generic font names (e.g.
sans-serif), and then any non-generic font name that contains only ASCII alpha
numerics and spaces gets quoted and put in a CSS style tag. Anything with
punctuation like the examples below from the CSS spec I'll either reject or
maybe shove in a <font face>.
font-family: Ahem!, sans-serif;
font-family: test@foo, sans-serif;
font-family: #POUND, sans-serif;
font-family: Hawaii 5-0, sans-serif;
I'll test whether vendor prefixed ones like -webkit-small-control survive
quoting. Allowing untrusted code to spoof OS controls might enable trusted
path violation anyway.
Original comment by mikesamuel@gmail.com
on 5 Feb 2013 at 10:37
http://code.google.com/p/owasp-java-html-sanitizer/source/detail?r=147 fixes
this issue. Let me know if that works for you and I'll cut a push to maven
central.
Original comment by mikesamuel@gmail.com
on 12 Feb 2013 at 7:14
Thanks Mike, this does in fact fix the specific problem I reported.
(I'm still concerned that the fidelity of the transformed HTML will be
insufficient unless a lot more CSS constructs are supported, though.)
Original comment by danr...@gmail.com
on 2 Apr 2013 at 10:26
Great. I'll make sure this is on maven current.
Re CSS, my current (full-time) project involves generating sanitizers (and
other tools) from grammars annotated with schema constraints so hopefully
https://code.google.com/p/noinject/source/browse/mlsrc/test-files/san/css/gramma
r.g will soon serve as the basis for a more flexible way to sanitize CSS. That
grammar is very drafty and written against an obsolete version of the CSS3
spec, but the general shape will probably remain the same.
Original comment by mikesamuel@gmail.com
on 5 Apr 2013 at 4:26
r198 includes a significant rewrite of the CSS sanitizer which recognizes a
larger set of CSS properties and no longer introduces <font> elements so should
work well with tables.
Original comment by mikesamuel@gmail.com
on 24 Jul 2013 at 3:55
Original issue reported on code.google.com by
danr...@gmail.com
on 1 Feb 2013 at 4:47