id, htmlId regexes are not correct (antisamy.xml)

Looking at antisamy.xml, SVN revision 137:

Attribute "id" uses: <regexp value="[a-zA-Z0-9_\-\:]+"/>
whereas htmlId is defined as:
<regexp name="htmlId" value="[a-zA-Z0-9-_]+"/>

I believe these two regexes should be the same. Why not use <regexp
name="htmlId"> in the attribute definition for "id"?
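
To make the mismatch concrete, here is a minimal sketch that runs both patterns against a few sample values. It uses Python's `re` module rather than the Java regex engine AntiSamy runs on, and it assumes the whole attribute value must match (hence `fullmatch`); the `accepts` helper and the sample values are only for illustration.

```python
import re

# Patterns copied from the two antisamy.xml excerpts quoted above.
ID_ATTR = re.compile(r"[a-zA-Z0-9_\-\:]+")   # attribute "id"
HTML_ID = re.compile(r"[a-zA-Z0-9-_]+")      # common regexp "htmlId"


def accepts(pattern, value):
    """Hypothetical helper: True if the entire value matches the pattern."""
    return pattern.fullmatch(value) is not None


# Print how each pattern treats each sample value.
for sample in ["main", "ns:item", "sec.2", "9lives"]:
    print(sample, accepts(ID_ATTR, sample), accepts(HTML_ID, sample))
```

Under these assumptions, "ns:item" is accepted by the "id" pattern but rejected by htmlId, while both reject "sec.2" and both accept "9lives".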

I don't believe either of the two regexes is 100% correct. According to
http://www.w3.org/TR/html401/types.html#h-6.2:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed
by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"),
colons (":"), and periods (".").
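
A pattern that follows that wording literally might look like the sketch below. This is my reading of the HTML 4.01 sentence, not something taken from antisamy.xml; the name `HTML401_ID` and the helper are hypothetical, and the example again uses Python's `re` with whole-value matching assumed.

```python
import re

# First character: a letter. Remaining characters: letters, digits,
# hyphens, underscores, colons, and periods (per the HTML 4.01 text above).
HTML401_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")


def is_html401_id(value):
    """Hypothetical helper: True if value is a valid ID/NAME token."""
    return HTML401_ID.fullmatch(value) is not None


# Print the verdict for a mix of valid and invalid tokens.
for sample in ["a", "sec.2:x-y_z", "9lives", "-a", "a b", ""]:
    print(repr(sample), is_html401_id(sample))
```

The same character class could presumably be placed in a `<regexp>` value, but whether the escaping carries over unchanged to the Java engine would need checking.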

In a similar vein, one of the attributes that references an ID is attribute
name="headers", which uses:
<regexp value="[a-zA-Z0-9\s*]*"/>
This is slightly different from the other "id" regexes because it
can be a space-separated list of IDs; however, note that the permissible
non-alphanumerics (hyphen, underscore, colon, period) are not included in
this regex.
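
As a rough illustration of the "headers" case, the sketch below compares the quoted pattern with one possible whitespace-separated list of HTML 4.01 ID tokens. `HEADERS_SKETCH` and `ID_TOKEN` are hypothetical names, not anything defined in antisamy.xml, and the behaviour shown is Python's `re` with whole-value matching assumed. Note that inside a character class the `*` is a literal asterisk, so the quoted pattern also lets "*" through.

```python
import re

# Pattern copied from the "headers" attribute quoted above.
HEADERS_CURRENT = re.compile(r"[a-zA-Z0-9\s*]*")

# Hypothetical alternative: one ID token, then zero or more further
# tokens, each preceded by at least one whitespace character.
ID_TOKEN = r"[A-Za-z][A-Za-z0-9_:.\-]*"
HEADERS_SKETCH = re.compile(rf"{ID_TOKEN}(?:\s+{ID_TOKEN})*")


def accepts(pattern, value):
    """Hypothetical helper: True if the entire value matches the pattern."""
    return pattern.fullmatch(value) is not None


# Print how each pattern treats each sample value.
for sample in ["h1 h2", "cell-1 cell_2", "*", ""]:
    print(repr(sample), accepts(HEADERS_CURRENT, sample),
          accepts(HEADERS_SKETCH, sample))
```

The sketch rejects the empty string and leading or trailing whitespace; whether that is desirable for "headers" is a design choice I have not verified against how AntiSamy normalises attribute values.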

Original issue reported on code.google.com by danr...@gmail.com on 23 Dec 2009 at 8:06
