sissaschool / xmlschema

XML Schema validator and data conversion library for Python
MIT License

XMLSchema11(...) crashes on XSD, which it has previously validated itself #421

Open PaulKalbitzer opened 2 weeks ago

PaulKalbitzer commented 2 weeks ago

I get an error with XMLSchema11; the problem does not seem to occur with XMLSchema10.

Validating the schema against the meta-schema works without any issue, but building a schema instance from it raises the xmlschema.validators.exceptions.XMLSchemaParseError shown below.

Code used

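from xmlschema import XMLSchema11

# Step 1: validate the XSD file against the XSD 1.1 meta-schema (passes).
XMLSchema11.meta_schema.validate('./NewsML-G2_2.24-spec-All-Core.xsd')

# Step 2: build a schema instance from the same file (raises XMLSchemaParseError).
XMLSchema11('./NewsML-G2_2.24-spec-All-Core.xsd')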

Schema used

https://schemas.liquid-technologies.com/NewsML/Core/2.24/

Error trace

  File "/Users/Tom/project/validate_schemas.py", line 7, in <module>
XMLSchema11('/Users/Tom/project/schemas/newsml_core/2.24/NewsML-G2_2.24-spec-All-Core.xsd')
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/schemas.py", line 522, in __init__
self.maps.build()
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/global_maps.py", line 668, in build
self.lookup_element(qname)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/global_maps.py", line 317, in lookup_element
return cast(XsdElement, self._build_global(obj, qname, self.elements))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/global_maps.py", line 337, in _build_global
global_map[qname] = factory_or_class(elem, schema)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/elements.py", line 128, in __init__
super().__init__(elem, schema, parent)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/xsdbase.py", line 330, in __init__
self.elem = elem
^^^^^^^^^
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/elements.py", line 144, in __setattr__
super().__setattr__(name, value)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/xsdbase.py", line 342, in __setattr__
self._parse()
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/elements.py", line 1275, in _parse
self._parse_type()
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/elements.py", line 273, in _parse_type
self.type = self.schema.xsd_complex_type_class(child, self.schema, self)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/complex_types.py", line 91, in __init__
super().__init__(elem, schema, parent, name)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/xsdbase.py", line 330, in __init__
self.elem = elem
^^^^^^^^^
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/xsdbase.py", line 342, in __setattr__
self._parse()
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/complex_types.py", line 837, in _parse
super()._parse()
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/complex_types.py", line 204, in _parse
self._parse_complex_content_extension(derivation_elem, self.base_type)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/complex_types.py", line 890, in _parse_complex_content_extension
self.parse_error(msg % base_type, elem)
  File "/Users/Tom/project/venv/lib/python3.12/site-packages/xmlschema/validators/xsdbase.py", line 204, in parse_error
raise error
  xmlschema.validators.exceptions.XMLSchemaParseError: base Xsd11ComplexType(name='IntlStringType') is simple or has a simple content:

  Schema component:

  <xs:extension xmlns:xs="http://www.w3.org/2001/XMLSchema" base="IntlStringType">
        <xs:attribute name="role" type="QCodeType" use="optional">
            <xs:annotation>
                <xs:documentation>A refinement of the semantics of the keyword - expressed by a QCode</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        <xs:attribute name="roleuri" type="IRIType" use="optional">
            <xs:annotation>
                <xs:documentation>A refinement of the semantics of the keyword - expressed by a URI</xs:documentation>
            </xs:annotation>
        </xs:attribute>

        <xs:attribute name="confidence" type="Int100Type" use="optional">
            <xs:annotation>
                <xs:documentation>The confidence with which the metadata has been assigned.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        <xs:attribute name="relevance" type="Int100Type" use="optional">
            <xs:annotation>
                <xs:documentation>The relevance of the metadata to the news content to which it was attached.</xs:documentation>
        ...
        ...
    </xs:extension>

  Path: /xs:schema/xs:element[48]/xs:complexType/xs:complexContent/xs:extension

brunato commented 1 week ago

Hi, validating a schema with the meta-schema (a schema built once from the XSD files in the directory xmlschema/schemas/XSD_1.1) only checks the syntax of the XSD definitions and declarations. It does not check the validity of the values, e.g. whether a base value points to a valid type and whether that type is compatible with the type being parsed. So it's normal for a schema to pass meta-schema validation and still be found invalid when you try to build a schema instance from it.
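
To see which of the two stages rejects a given file, the steps can be run separately. The following is only a sketch: `check_schema` is a hypothetical helper, not part of xmlschema, and it assumes nothing beyond the two calls already used in the report (`meta_schema.validate(source)` and the class constructor).

```python
def check_schema(source, schema_class):
    """Run meta-schema validation and schema building as separate stages.

    `schema_class` is assumed to be xmlschema.XMLSchema10 or
    xmlschema.XMLSchema11 (any class with the same interface works).
    Returns a dict mapping each stage to None on success, or to a short
    description of the exception that the stage raised.
    """
    stages = (
        # Stage 1: syntax of the XSD only, checked against the meta-schema.
        ("meta_schema", lambda: schema_class.meta_schema.validate(source)),
        # Stage 2: building resolves references and applies the XSD rules.
        ("build", lambda: schema_class(source)),
    )
    report = {}
    for stage, action in stages:
        try:
            action()
            report[stage] = None
        except Exception as err:  # broad on purpose: this helper only reports
            report[stage] = f"{type(err).__name__}: {err}"
    return report
```

With xmlschema installed, calling `check_schema(path, XMLSchema10)` and `check_schema(path, XMLSchema11)` on the same file puts the results of the two XSD versions side by side.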

The schema that you use is valid with XSD 1.0 but invalid with XSD 1.1. See the comments at the top of the code that raises the exception:

    def _parse_complex_content_extension(self, elem: ElementType, base_type: Any) -> None:
        # Complex content extension with simple base is forbidden XSD 1.1.
        # For the detailed rule refer to XSD 1.1 documentation:
        #   https://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/#sec-cos-ct-extends
        if base_type.is_simple() or base_type.has_simple_content():
            msg = _("base %r is simple or has a simple content")
            self.parse_error(msg % base_type, elem)
            base_type = self.any_type
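
The pattern that the check quoted above rejects can be located in a schema file with the standard library alone. This is a rough sketch, not xmlschema's implementation: `SAMPLE_XSD` is a made-up minimal schema (its `dir` attribute is invented), `find_suspect_extensions` is a hypothetical helper, and it only looks at global types defined in the same document, ignoring namespaces, imports and includes.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical minimal schema: a complexContent extension whose base
# is a complex type with simple content.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="IntlStringType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="dir" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="keyword">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="IntlStringType">
          <xs:attribute name="role" type="xs:string" use="optional"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def local_name(qname):
    """Drop an optional prefix: 'nar:IntlStringType' -> 'IntlStringType'."""
    return qname.rsplit(":", 1)[-1]


def simple_or_simple_content_types(root):
    """Names of global types that are simple or have simple content."""
    names = {st.get("name") for st in root.findall(f"{XS}simpleType")}
    for ct in root.findall(f"{XS}complexType"):
        if ct.find(f"{XS}simpleContent") is not None:
            names.add(ct.get("name"))
    return names


def find_suspect_extensions(xsd_text):
    """List the base of each complexContent extension with such a base type."""
    root = ET.fromstring(xsd_text)
    flagged = simple_or_simple_content_types(root)
    suspects = []
    for complex_content in root.iter(f"{XS}complexContent"):
        ext = complex_content.find(f"{XS}extension")
        if ext is not None and local_name(ext.get("base", "")) in flagged:
            suspects.append(ext.get("base"))
    return suspects


print(find_suspect_extensions(SAMPLE_XSD))
```

Matching on local names is a deliberate simplification; a real check would resolve prefixed names against their namespaces and follow imported schemas.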

There is also a case in the W3C XSD test suite that is valid with XSD 1.0 and invalid with XSD 1.1.

Comparing the corresponding rule 1.4 in XSD 1.0 and XSD 1.1, the latter is stricter.

Best regards