w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

XSD not allowing <semantics>-element as child of <apply>-element #498

Open mgrub opened 3 months ago

mgrub commented 3 months ago

Dear maintainers,

thank you very much for providing MathML and taking care of it!

I want to semantically annotate operators in a formula, which seems to be a perfect use case for the <semantics>-element. A minimal working example (MWE) for my application is very similar to the last example of subsection 4.2.1.3:

mathml_mwe.xml:

<math xmlns="http://www.w3.org/1998/Math/MathML">
    <apply>
        <semantics>
            <csymbol id="operation">OP</csymbol>
            <annotation>OPERATOR</annotation>
        </semantics>
        <semantics>
            <ci id="input">x</ci>
            <annotation>INPUT</annotation>
        </semantics>
    </apply>
</math>

However, if I validate either the MWE or the mentioned example from the documentation against the XSD, both fail with (something like) this:

failed validating <Element '{http://www.w3.org/1998/Math/MathML}apply' at 0x00000219EFDF6ED0> with XsdGroup(model='sequence', occurs=[1, 1]):

Reason: Unexpected child with tag 'm:semantics' at position 1.

Similar errors arise if only the <csymbol>-element or only the <ci>-element is semantically annotated. It seems that the schema does not allow a <semantics>-child inside an <apply>-element. My questions are therefore: Is this behaviour intended? Am I using the <semantics>-element incorrectly? Could this be a problem with my validation script? Or does the schema need an adjustment?

Best regards,
Maximilian


I am validating using Python and xmlschema as follows:

mathml_validate.py:

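import os
import xmlschema

# Document to validate and the (non-normative) MathML 3 XSD
mathml_object = os.path.abspath("mathml_mwe.xml")
mathml_schema_path = "https://www.w3.org/Math/XMLSchema/mathml3/mathml3.xsd"

# Loading the schema fetches mathml3.xsd and the files it includes
schema = xmlschema.XMLSchema(mathml_schema_path)

try:
    # validate() returns None on success and raises on the first violation
    schema.validate(mathml_object)
    print("Document is valid.")
except xmlschema.XMLSchemaValidationError as e:
    print(f"Schema did not validate successfully. (Error: {e})")

dginev commented 3 months ago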

This may be a schema implementation detail?

I took a look at the .rnc files maintained by the W3C validator project, where semantics is included separately for presentation use and for content use, through mathml3-common.rnc and mathml3-strict-content.rnc respectively.

The relevant snippet for this issue is the use of the semantics-contexp rule in ContExp, which makes semantics available inside apply as long as one validates against the "strict" rnc.

Interestingly, though, the other content-defining schema, mathml3-content.rnc, does not mention semantics itself; it only has a local extension for semantics-ci.

I suspect @davidcarlisle has the answers here.
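To reproduce this comparison of the .rnc files, a plain text search is enough. The sketch below is a hypothetical helper (`find_semantics_mentions` is not part of any MathML tooling) that assumes local copies of the .rnc files in a single directory and lists, per file, the lines containing the string "semantics". Being a simple substring match, it also picks up comments, so an empty list only means the word does not occur anywhere in that file.

```python
from pathlib import Path


def find_semantics_mentions(schema_dir: str) -> dict[str, list[str]]:
    """Map each .rnc file in schema_dir to its lines that mention 'semantics'.

    Hypothetical helper: assumes the .rnc files have been copied locally.
    """
    mentions: dict[str, list[str]] = {}
    for rnc_file in sorted(Path(schema_dir).glob("*.rnc")):
        lines = rnc_file.read_text(encoding="utf-8").splitlines()
        # Substring match: catches rule names such as semantics-contexp or
        # semantics-ci, but also any comment that happens to use the word
        mentions[rnc_file.name] = [line.strip() for line in lines if "semantics" in line]
    return mentions
```

With the files in a (hypothetical) local directory named `schema`, looking up `find_semantics_mentions("schema")["mathml3-content.rnc"]` would show which lines of that file refer to semantics at all.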

davidcarlisle commented 3 months ago

The same issue appears to affect the draft MathML 4 schema at

https://github.com/w3c/mathml-schema

The supplied example is valid against the normative RelaxNG schema but fails validation against the XSD, which is clearly a bug: the transformation from RelaxNG to W3C XSD schemas has failed. I'm travelling today but will check later exactly where it failed. Thanks for the report.

mgrub commented 3 months ago

Dear @dginev and @davidcarlisle,

thank you very much for the immediate response, the very detailed feedback and the confirmation of the unwanted behaviour! Thank you also for the reminder that only the RelaxNG schema is normative; I will change my validation to be based on that.

Best regards,
Maximilian
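Following the suggestion to validate against the normative RelaxNG schema instead, a minimal Python sketch using lxml could look like the one below. It assumes that an XML-syntax (.rng) version of the schema is available locally, for example converted from the .rnc files with a tool such as trang, because lxml's `etree.RelaxNG` parses the XML syntax directly. The function name `validate_with_relaxng` and the file names in the usage note are hypothetical.

```python
from lxml import etree


def validate_with_relaxng(schema_path: str, document_path: str) -> list[str]:
    """Validate an XML document against a RelaxNG schema in XML (.rng) syntax.

    Returns a list of error messages; an empty list means the document is valid.
    """
    # Compile the schema; parsing from a path keeps the base URL, so any
    # <include>/<externalRef> in the schema resolves relative to the file
    relaxng = etree.RelaxNG(etree.parse(schema_path))
    document = etree.parse(document_path)
    if relaxng.validate(document):
        return []
    # error_log holds the problems found during the last validation run
    return [f"line {err.line}: {err.message}" for err in relaxng.error_log]
```

With such a file in place, `validate_with_relaxng("mathml4.rng", "mathml_mwe.xml")` would return an empty list for a valid document and the lxml error messages otherwise. Returning the messages instead of raising keeps all reported errors visible, unlike the first-error exception from `schema.validate()` above.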