relaxng / jing-trang

Schema validation and conversion based on RELAX NG
http://www.thaiopensource.com/relaxng/

Discussion about allowed patterns for ID data types (and for NCNames in general) #188

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Let's say I have an XML document with an ID-type attribute whose value is the 
character with hex code 03DB:

 idAttr="ϛ"

which is validated with Jing using the pattern:

 <data type="ID"/>

The validation reports the attribute value as invalid.

The spec for the ID type in RNG schemas, here:

http://relaxng.org/compatibility-20011203.html#id

defines it to be the same as the ID type defined in the XML Schema data types:

http://www.w3.org/TR/xmlschema-2/#ID

which binds it directly to the ID attribute type from an XML 1.0 second 
edition working draft.
That XML 1.0 working draft binds the ID attribute type to the Name production:

www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Name

This particular character has the hex code #x03DB, which indeed does not seem to 
fit anywhere in the BaseChar production:

..... [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0.......
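
The gap can be checked mechanically. The following is a minimal sketch that encodes only the fragment of the production quoted above (the rest of BaseChar is not reproduced here); the class and method names are hypothetical and are not part of Jing:

```java
public class BaseCharFragment {
    // Only the quoted part of the BaseChar production:
    // [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0
    // Hypothetical helper for illustration, not Jing's Naming class.
    static boolean inQuotedFragment(int cp) {
        return (cp >= 0x03D0 && cp <= 0x03D6)
            || cp == 0x03DA
            || cp == 0x03DC
            || cp == 0x03DE
            || cp == 0x03E0;
    }

    public static void main(String[] args) {
        int cp = "\u03DB".codePointAt(0);                 // the attribute value from the report
        System.out.println(Integer.toHexString(cp));      // 3db
        System.out.println(inQuotedFragment(cp));         // false: falls between #x03DA and #x03DC
        System.out.println(inQuotedFragment(0x03DA));     // true: its neighbour is listed
    }
}
```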

The thing is that the final XML 1.0 spec allows this character to appear in 
both name start and name chars:

http://www.w3.org/TR/REC-xml/#NT-NameStartChar

So I'm not sure what could be done about this. Could we relax the validation 
in the "com.thaiopensource.xml.util.Naming" class to be compatible with the 
final XML 1.0 specs? Or would that mean not obeying the Relax NG standard, since 
it points to the XML Schema 1.0 data types standard, which in turn points to 
this XML 1.0 working draft?

Original issue reported on code.google.com by raducor...@gmail.com on 22 Oct 2014 at 7:24

georgebina commented 8 years ago

The XML 1.0 4th edition (Aug 2006) still rejects ϛ (#x03DB) as a start character for names:

http://www.w3.org/TR/2006/REC-xml-20060816/#NT-BaseChar

[...]| #x03DA | #x03DC | [...]

but the 5th edition (Nov 2008) allows it:

http://www.w3.org/TR/xml/#NT-NameStartChar

[...]| [#x37F-#x1FFF] |[...]

FWIW, Xerces also rejects this character in an XML 1.0 name - it accepts it if the XML version is set to 1.1.
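
For reference, here is a rough sketch of a start-character check based on the 5th edition NameStartChar production, with ":" left out so that it matches NCName start characters. The class and method names are hypothetical; this is not the existing Naming API, only an illustration of which ranges a 5th-edition check would cover:

```java
public class NameStartChar5e {
    // Inclusive code point ranges from the XML 1.0 (5th edition)
    // NameStartChar production, without ":" (NCNames exclude the colon).
    private static final int[][] RANGES = {
        {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
    };

    // Hypothetical helper: true if cp may start an NCName under the 5th edition.
    static boolean isNCNameStart(int cp) {
        for (int[] r : RANGES) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNCNameStart(0x03DB)); // true: inside [#x37F-#x1FFF]
        System.out.println(isNCNameStart(0x037E)); // false: the gap between #x37D and #x37F
        System.out.println(isNCNameStart('1'));    // false: digits cannot start a name
    }
}
```

A linear scan over a small range table keeps the sketch easy to compare against the production; whether Jing should adopt such a table is exactly the open question in this issue.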

georgebina commented 8 years ago

@jclark Do you think we should update mod/util/src/main/com/thaiopensource/xml/util/Naming.java to reflect the 5th edition?