relaxng / jing-trang

Schema validation and conversion based on RELAX NG
http://www.thaiopensource.com/relaxng/

conflicting ID-types for attribute "id" #211

Open sthibaul opened 8 years ago

sthibaul commented 8 years ago

Hello,

As reported by Vincent Lefevre in Debian bug report http://bugs.debian.org/834555 :

“ jing yields an error on a valid XML file (neither xmllint, nor Emacs nXML complain).

Consider the following files:

==> tdb.xml <==
<?xml version="1.0" encoding="utf-8"?>

.

==> tdb.rnc <==
default namespace = "http://localhost/"

include "/usr/share/xml/docbook/schema/rng/5.0/docbook.rnc" { start |= notAllowed }

root = element root { attribute xml:id { xsd:ID }, db.para }

start = root

==> tdb.rng <==
<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="http://localhost/" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

Note: I generated tdb.rng with "trang tdb.rnc tdb.rng" and updated the path to docbook.rng to reuse the schemas from the docbook5-xml package.

I get the following error:

zira:~> jing tdb.rng tdb.xml
[warning] /usr/bin/jing: No java runtime was found
/usr/share/xml/docbook/schema/rng/5.0/docbook.rng:83:16: error: conflicting ID-types for attribute "id" from namespace "http://www.w3.org/XML/1998/namespace" of element "root" from namespace "http://localhost/"

while with xmllint from libxml2-utils:

zira:~> xmllint --noout --relaxng tdb.rng tdb.xml
tdb.xml validates

and when I open tdb.xml in Emacs, it is said:

-UUU:----F1 tdb.xml All L1 (nXML Valid) -------------- Using schema ~/tdb.rnc ”

sthibaul commented 8 years ago

Of course github mangled the xml code... Here are the files attached

sthibaul commented 8 years ago

tdb.xml.txt tdb.rnc.txt tdb.rng.txt

sthibaul commented 8 years ago

(I had to append .txt extensions for github to be happy...)

georgebina commented 8 years ago

ID checking is defined in the RELAX NG DTD Compatibility specification. Because it is inherited from DTDs, the ID type of an attribute must be consistent for every element that a RELAX NG pattern can match. The problem usually appears when the same content can also be matched by a wildcard-like pattern (any element with any content, any attribute, etc.): the wildcard-like pattern gives the attribute no ID type, while the more concrete pattern that defines the element and the attribute gives it type ID. If you look in docbook.rng at the indicated line, you will see such a wildcard-like pattern; it matches xml:id with no ID type, whereas your schema defines xml:id as an ID, hence the error. You can turn off ID checking in Jing (look at the available options) or change the schema to avoid the problem. One possibility is to change the "any" pattern to exclude xml:id and match it explicitly as an ID, something like below:

    <define name="db._any.attribute">
      <choice>
        <attribute>
          <a:documentation>Any attribute including in any attribute in any
            namespace.</a:documentation>
          <anyName>
            <except>
              <name>xml:id</name>
            </except>
          </anyName>
        </attribute>
        <attribute name="xml:id">
          <data type="ID"/>
        </attribute>
      </choice>
    </define>

Regards, George

vinc17fr commented 8 years ago

But db._any isn't used anywhere in the XML file. There may be an error in the DocBook 5 schema (for instance, if an XML file uses xml:id somewhere in MathML contents as a descendant of a DocBook element, the user wouldn't get what he may expect), but here xml:id doesn't appear as a descendant of a DocBook element, so that db._any isn't used and there shouldn't be any error.

georgebina commented 8 years ago

The thing is that a "root" element can appear in that area of an instance document, and in that case the processor would not know which ID type to assign to the xml:id attribute. This is a static error, found by analysing the schema, not a runtime error on a specific instance document. I mentioned the options above; one is to turn off ID checking.

Best Regards, George

vinc17fr commented 8 years ago

If a "root" element appears in db._any, then the xml:id attribute would have type text in this context, because this is what the grammar says. Consider the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://localhost/" xml:id="foo">
  <para xmlns="http://docbook.org/ns/docbook" linkend="foo">
    <inlineequation>
      <foo xmlns="http://www.w3.org/1998/Math/MathML">
        <root xmlns="http://localhost/" xml:id="bar"/>
      </foo>
    </inlineequation>
  </para>
</root>

Here, the xml:id="foo" would be of type ID because one has start = element root { attribute xml:id { xsd:ID }, db.para }. However the other "root" element is part of db._any, with db._any = element * - (db:* | html:*) { (db._any.attribute | text | db._any)* } and db._any.attribute = attribute * { text }, so that the type of xml:id="bar" would be text.

That said, the "xml:" namespace is special, as it is standard. https://www.w3.org/XML/1998/namespace says: "The xml:id specification defines a single attribute, xml:id, known to be of type ID independently of any DTD or schema." Note the "independently". So, because of this, xml:id="bar" is of type ID. This is how libxml2 behaves (I've checked, replacing linkend="foo" by linkend="bar"). That's probably why the DocBook 5 schema doesn't exclude xml:* in db._any.
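The behaviour described above can be modelled with a minimal Python sketch. It is not libxml2's implementation: it only assumes that every xml:id is an ID regardless of the schema, and, as a simplification, that linkend is the only IDREF attribute. The function name `dangling_linkends` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the (predeclared) XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The document from this comment, with linkend="foo" replaced by linkend="bar".
DOC = """\
<root xmlns="http://localhost/" xml:id="foo">
  <para xmlns="http://docbook.org/ns/docbook" linkend="bar">
    <inlineequation>
      <foo xmlns="http://www.w3.org/1998/Math/MathML">
        <root xmlns="http://localhost/" xml:id="bar"/>
      </foo>
    </inlineequation>
  </para>
</root>
"""


def dangling_linkends(xml_text):
    """Return the linkend values that match no xml:id anywhere in the document."""
    tree = ET.fromstring(xml_text)
    # Assumption: every xml:id counts as an ID, whatever the schema says.
    ids = {el.get(XML_ID) for el in tree.iter() if el.get(XML_ID) is not None}
    refs = [el.get("linkend") for el in tree.iter() if el.get("linkend") is not None]
    return [ref for ref in refs if ref not in ids]


print(dangling_linkends(DOC))
```

Under that assumption, linkend="bar" resolves to the nested "root" element even though the grammar only types its xml:id as text.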

And note that ID checking is useful, I don't want to turn it off. Currently, jing cannot work with any serious schema that mixes DocBook 5 and another namespace.

georgebina commented 8 years ago

I agree that this is one of the major pain points with Relax NG, but I do not know the best way forward... Ideally, wildcard names like anyName or nsName should not contribute to identifying the ID type, and ID type assignment should be done only using the information from elements/attributes specified with specific names. A similar issue appears for DITA 1.3, which uses Relax NG as its normative schema, and there we need to exclude some element names to get the schemas working. Maybe the best solution would be an update to the DTD compatibility spec http://relaxng.org/compatibility-20011203.html#id to say that anyName and nsName name classes should not be considered when we check whether two element/attribute to ID type mappings compete, and then we can follow by updating Jing accordingly. Maybe @jclark can share some insight on this.
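To make the idea concrete, here is a rough Python sketch of the check and of the relaxation suggested above. It is not Jing's code and it simplifies the spec considerably: name classes are reduced to "one specific name" or "a wildcard with excluded namespaces", and two declarations are taken to compete when both their element and attribute name classes overlap. All class and function names, and the `ignore_wildcards` flag, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Name = Tuple[str, str]  # (namespace URI, local name)


@dataclass(frozen=True)
class NameClass:
    name: Optional[Name] = None                # a specific name, or None for a wildcard
    excluded_ns: FrozenSet[str] = frozenset()  # namespaces a wildcard excludes

    @property
    def is_wildcard(self) -> bool:
        return self.name is None

    def contains(self, n: Name) -> bool:
        if self.name is not None:
            return n == self.name
        return n[0] not in self.excluded_ns


def overlap(a: NameClass, b: NameClass) -> bool:
    """True if some name belongs to both name classes."""
    if not a.is_wildcard:
        return b.contains(a.name)
    if not b.is_wildcard:
        return a.contains(b.name)
    return True  # two wildcards always share names: namespaces are unbounded


@dataclass(frozen=True)
class AttrDecl:
    element: NameClass
    attribute: NameClass
    id_type: Optional[str]  # "ID", "IDREF", "IDREFS", or None


def conflicts(decls, ignore_wildcards=False):
    """Return pairs of competing declarations whose ID types differ."""
    if ignore_wildcards:
        # One possible reading of the proposal: wildcard declarations do not take part.
        decls = [d for d in decls
                 if not (d.element.is_wildcard or d.attribute.is_wildcard)]
    found = []
    for i, d1 in enumerate(decls):
        for d2 in decls[i + 1:]:
            if (overlap(d1.element, d2.element)
                    and overlap(d1.attribute, d2.attribute)
                    and d1.id_type != d2.id_type):
                found.append((d1, d2))
    return found


XML_NS = "http://www.w3.org/XML/1998/namespace"
DECLS = [
    # root = element root { attribute xml:id { xsd:ID }, ... }
    AttrDecl(NameClass(("http://localhost/", "root")), NameClass((XML_NS, "id")), "ID"),
    # db._any = element * - (db:* | html:*) { attribute * { text } ... }
    AttrDecl(NameClass(excluded_ns=frozenset({"http://docbook.org/ns/docbook",
                                              "http://www.w3.org/1999/xhtml"})),
             NameClass(), None),
]

print(len(conflicts(DECLS)))                         # strict rule
print(len(conflicts(DECLS, ignore_wildcards=True)))  # relaxed rule
```

With the strict rule the two declarations from this issue compete and disagree on the ID type; with wildcards left out, only the explicitly named declaration remains and nothing conflicts.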

Regards, George

vinc17fr commented 8 years ago

If this can be useful in tests, here are two standalone examples.

The first example

<?xml version="1.0" encoding="utf-8"?>
<ex1>
  <foo xml:id="a">
    <bar ref="a b">
      <foo xml:id="b"/>
    </bar>
  </foo>
</ex1>

The corresponding schema:

start =
  element ex1 {
    element foo {
      attribute xml:id { xsd:ID }?,
      element bar {
        attribute ref { xsd:IDREFS }?,
        element foo {
          attribute xml:id { text }?
        }
      }
    }
  }

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="ex1">
      <element name="foo">
        <optional>
          <attribute name="xml:id">
            <data type="ID"/>
          </attribute>
        </optional>
        <element name="bar">
          <optional>
            <attribute name="ref">
              <data type="IDREFS"/>
            </attribute>
          </optional>
          <element name="foo">
            <optional>
              <attribute name="xml:id"/>
            </optional>
          </element>
        </element>
      </element>
    </element>
  </start>
</grammar>

For this first example, according to the "xml:" namespace specifications, xml:id is always of type ID (what's inside attribute xml:id { } should be ignored). For this reason, I don't think there is a DTD compatibility issue concerning this example (this will be different in the second example, which I assume is less common).

The second example

<?xml version="1.0" encoding="utf-8"?>
<ex2>
  <foo myid="a">
    <bar ref="a">
      <foo myid="b"/>
    </bar>
  </foo>
</ex2>

The corresponding schema:

start =
  element ex2 {
    element foo {
      attribute myid { xsd:ID }?,
      element bar {
        attribute ref { xsd:IDREFS }?,
        element foo {
          attribute myid { text }?
        }
      }
    }
  }

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="ex2">
      <element name="foo">
        <optional>
          <attribute name="myid">
            <data type="ID"/>
          </attribute>
        </optional>
        <element name="bar">
          <optional>
            <attribute name="ref">
              <data type="IDREFS"/>
            </attribute>
          </optional>
          <element name="foo">
            <optional>
              <attribute name="myid"/>
            </optional>
          </element>
        </element>
      </element>
    </element>
  </start>
</grammar>

Here, I've replaced the standard xml:id by myid. So, the first myid instance myid="a" is of type ID, but not the second myid instance myid="b" (the validation should fail if ref="a" is replaced by ref="b"). Again, libxml2 behaves that way.
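A small Python sketch of the context-dependent typing described here, for illustration only. The schema is not parsed: the element paths that carry an ID or IDREFS attribute are hard-coded from the second example, and the function name `unresolved_refs` is made up.

```python
import xml.etree.ElementTree as ET

# Second example, with the value of ref left as a placeholder.
DOC = '<ex2><foo myid="a"><bar ref="{ref}"><foo myid="b"/></bar></foo></ex2>'


def unresolved_refs(xml_text):
    """Return the tokens of ref attributes that match no ID-typed myid."""
    root = ET.fromstring(xml_text)
    ids, refs = set(), []

    def walk(el, path):
        path = path + (el.tag,)
        # Hand-written stand-in for the schema: only ex2/foo declares myid
        # as xsd:ID; ex2/foo/bar/foo declares it as plain text.
        if path == ("ex2", "foo") and "myid" in el.attrib:
            ids.add(el.attrib["myid"])
        # ex2/foo/bar declares ref as xsd:IDREFS (whitespace-separated tokens).
        if path == ("ex2", "foo", "bar") and "ref" in el.attrib:
            refs.extend(el.attrib["ref"].split())
        for child in el:
            walk(child, path)

    walk(root, ())
    return [ref for ref in refs if ref not in ids]


print(unresolved_refs(DOC.format(ref="a")))
print(unresolved_refs(DOC.format(ref="b")))
```

With ref="a" every reference resolves; with ref="b" the token "b" is reported, because the inner myid="b" is only text in that context.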

georgebina commented 8 years ago

Please note that this functionality is part of the DTD compatibility specification, and that means you cannot have two different declarations for the same attribute in the same element, because you cannot have that in a DTD.

vinc17fr commented 8 years ago

I suggest two possibilities:

  1. In a schema, for xml:id attributes, ignore the specified type and force it to ID as required by the XML specs. This should solve the issue with DocBook 5 and the first example (but not with the second example).
  2. Add an option to ignore the RELAX NG DTD Compatibility, while still checking ID/IDREF/IDREFS as specified in the validity constraints on the XML attribute types. This would solve issues in simple cases, but not in the first example, as one would get an error because the IDREF "b" does not have a matching ID; in practice, such errors would occur when a DocBook 5 document has an IDREF attribute referring to an ID found in MathML or SVG (this is where db._any is involved) contained in the document.

sideshowbarker commented 6 years ago

Is this an outright bug that we ideally should fix in the sources? Or is it more of an enhancement request?

sthibaul commented 6 years ago

Well, it looks like a bug: the tool is saying the tdb.xml file is invalid while it is valid

georgebina commented 6 years ago

The conflicting ID type error is not reported on the XML document; it is a problem reported on the schema, and it is related to the DTD compatibility spec [1]. The DTD compatibility ID checking is controlled by an option [2], so you can disable it. This check does what the DTD compatibility spec says, so it is not a problem in Jing. If the spec is updated, then Jing can follow the updated spec.

[1] https://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html#id

if its attribute parent has any competing attribute elements, then each such competing attribute element has a data or value child specifying a datatype associated with the same ID-type. Two attribute elements <attribute> nc1 p1 </attribute> and <attribute> nc2 p2 </attribute> compete if and only if the containing definitions compete and there is a name n that belongs to both nc1 and nc2. Note that a definition competes with itself.

[2] http://www.thaiopensource.com/relaxng/jing.html

-i Disables checking of ID/IDREF/IDREFS. By default, Jing enforces the constraints imposed by RELAX NG DTD Compatibility with respect to ID/IDREF/IDREFS.

vinc17fr commented 6 years ago

The -i option is not OK, since I still want ID checking. Compare with xmllint --relaxng, for instance.