rjatkins / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy

cssIdentifier regex is incorrect (antisamy.xml) #70

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Looking at antisamy.xml, SVN revision 137:

<regexp name="cssIdentifier" value="[a-zA-Z][a-zA-Z0-9-]+"/>

This implies that CSS identifiers need to be at least two characters in
length; I don't believe this is a requirement in the spec (that is, I
believe single-character identifiers are valid). I think a more correct
regex would be:

<regexp name="cssIdentifier" value="[a-zA-Z][a-zA-Z0-9-]*"/>
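
To illustrate the difference, here is a minimal sketch in Python comparing the two patterns. It assumes the policy regex is applied to the whole value (whole-string matching); AntiSamy itself is Java, so this only demonstrates the regex behavior, not the library's code path. The helper name `is_identifier` is hypothetical.

```python
import re

# The two patterns, copied verbatim from the antisamy.xml snippets above.
OLD_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9-]+")  # current: one letter plus at least one more character
NEW_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9-]*")  # proposed: one letter plus zero or more characters


def is_identifier(pattern, text):
    """Return True if the entire string matches the pattern (assumed whole-string semantics)."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["a", "ab", "nav-bar", "9lives"]:
        print(candidate, is_identifier(OLD_PATTERN, candidate), is_identifier(NEW_PATTERN, candidate))
```

Under that assumption, a single-character identifier such as `a` fails the current pattern and passes the proposed one, while the two patterns agree on every longer input.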

The real spec is actually a lot more complicated than this. From
http://www.w3.org/TR/CSS21/syndata.html#value-def-identifier:

In CSS, identifiers (including element names, classes, and IDs in
selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646
characters U+00A1 and higher, plus the hyphen (-) and the underscore (_);
they cannot start with a digit, or a hyphen followed by a digit.
Identifiers can also contain escaped characters and any ISO 10646 character
as a numeric code (see next item). For instance, the identifier "B&W?" may
be written as "B\&W\?" or "B\26 W\3F".
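
For reference, a rough sketch of what a regex closer to the quoted text might look like, again in Python. This is an approximation built from the paragraph above and my reading of the CSS 2.1 tokenizer, not a proposal for antisamy.xml; all of the names (`NONASCII`, `ESCAPE`, `CSS21_IDENT`, and so on) are made up for illustration.

```python
import re

# Characters U+00A1 and higher, per the prose quoted above.
NONASCII = r"[\u00a1-\U0010ffff]"
# Numeric escape: backslash, 1-6 hex digits, optional single trailing whitespace.
UNICODE_ESCAPE = r"\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?"
# Character escape: backslash followed by anything except a newline or a hex digit.
CHAR_ESCAPE = r"\\[^\r\n\f0-9a-fA-F]"
ESCAPE = f"(?:{UNICODE_ESCAPE}|{CHAR_ESCAPE})"

# First character after the optional hyphen: no digits, no hyphen.
NMSTART = f"(?:[_a-zA-Z]|{NONASCII}|{ESCAPE})"
# Any later character: digits and hyphens allowed.
NMCHAR = f"(?:[_a-zA-Z0-9-]|{NONASCII}|{ESCAPE})"

CSS21_IDENT = re.compile(f"-?{NMSTART}{NMCHAR}*")


def is_css21_identifier(text):
    """Return True if the entire string matches the approximate identifier grammar."""
    return CSS21_IDENT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["a", "-moz-box", "B\\&W\\?", "B\\26 W\\3F", "9lives", "-9x"]:
        print(repr(candidate), is_css21_identifier(candidate))
```

This accepts both escaped spellings of "B&W?" from the spec text and rejects identifiers that start with a digit or with a hyphen followed by a digit, but it is noticeably more complex than the one-line pattern above.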

Original issue reported on code.google.com by danr...@gmail.com on 23 Dec 2009 at 9:08

GoogleCodeExporter commented 9 years ago
The spec is indeed complicated, and not regular-expression friendly. I removed the
two-character minimum but left everything else as is, to avoid complexity and
performance hits. I'm labeling this as "Fixed" because compliance with the spec is
not our primary mission. I'd rather be in step with industry code, and safe and
fast, than pursue any alternative.

Original comment by arshan.d...@gmail.com on 8 Mar 2010 at 5:38