sissaschool / xmlschema

XML Schema validator and data conversion library for Python
MIT License

XMLSchemaException does not catch all exceptions #404

Open nuntius35 opened 1 week ago

nuntius35 commented 1 week ago

Some errors in an XSD file raise an xml.etree.ElementTree.ParseError rather than an XMLSchemaException, although the documentation of XMLSchemaException claims that it catches all exceptions.

Example

File: invalid_schema.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1">
</xsd:schem>

File: validate_schema.py

"""Loads an xsd file"""
import xmlschema

def main():
    """Exception raised by invalid schema is expected to be caught"""
    try:
        xmlschema.XMLSchema11("invalid_schema.xsd")
    except xmlschema.XMLSchemaException as e:
        print(e)

if __name__ == '__main__':
    main()

Run the script with: python validate_schema.py

Expected behaviour: XMLSchema11 raises e.g. XMLSchemaValidatorError and the exception is caught.

Actual behaviour: the exception xml.etree.ElementTree.ParseError is raised and not caught.
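The ParseError can be reproduced with the standard library alone, which suggests it arises while the XML source is being parsed, before any schema-level check runs. The following is only a sketch: well_formedness_error() is a hypothetical helper written for this illustration and is not part of xmlschema.

```python
from xml.etree.ElementTree import ParseError, fromstring

# Same content as invalid_schema.xsd above, kept as bytes so the
# XML declaration can stay in place.
INVALID_XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1">
</xsd:schem>
"""


def well_formedness_error(data):
    """Hypothetical helper: return the ParseError for malformed XML, or None."""
    try:
        fromstring(data)
    except ParseError as err:
        return err
    return None


error = well_formedness_error(INVALID_XSD)
# ParseError derives from SyntaxError, so an
# "except xmlschema.XMLSchemaException" clause cannot match it.
print(type(error).__name__, "-", error)
```

Until the library wraps this error, catching the tuple (xmlschema.XMLSchemaException, ParseError) in validate_schema.py should be enough as a workaround.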

brunato commented 1 week ago

Hi, the documentation says that the base exception "let you catch all the errors generated by the library". This case is borderline because this is a ParseError raised by the ElementTree library.

Anyway, it's worth thinking this through, because other errors are caught and re-raised.

In this case the error is in the syntax of the XML source, so it should raise an XMLResourceError or an exception derived from it. For v4.0, XMLResource will be extended to also support parsing with lxml and custom URL openers. All XML data access in the library is delegated to this class, so giving it its own error type hierarchy could help distinguish XML data access/parsing errors from XML validation errors.

A possible hierarchy for this could be:

from xml.etree.ElementTree import ParseError
from xmlschema import XMLSchemaException

class XMLResourceError(XMLSchemaException):
    """A generic error on an XML resource that lets you catch all the errors generated by an XML resource."""

class XMLResourceOSError(XMLResourceError, OSError):
    """Raised when an error is found accessing an XML resource."""

class XMLResourceParseError(XMLResourceError, ParseError):
    """Raised when an error is found parsing an XML resource."""

class XMLResourceBlocked(XMLResourceError):
    """Raised when an XML resource access is blocked by security settings."""

class XMLResourceForbidden(XMLResourceError):
    """Raised when the parsing of an XML resource is forbidden for safety reasons."""

Deriving XMLResourceParseError from ParseError instead of SyntaxError preserves backward compatibility: code that already catches ParseError keeps working.
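As a rough check of the backward-compatibility point, the sketch below re-raises a parsing error as XMLResourceParseError and catches it with a plain "except ParseError" clause. It makes two assumptions: XMLSchemaException is a local stand-in so the snippet runs without the library installed, and parse_resource() is a hypothetical helper, not how XMLResource is actually implemented.

```python
from xml.etree.ElementTree import ParseError, fromstring


class XMLSchemaException(Exception):
    """Stand-in for xmlschema.XMLSchemaException (assumption for this sketch)."""


class XMLResourceError(XMLSchemaException):
    """A generic error on an XML resource."""


class XMLResourceParseError(XMLResourceError, ParseError):
    """Raised when an error is found parsing an XML resource."""


def parse_resource(text):
    """Hypothetical helper: parse XML text, wrapping syntax errors."""
    try:
        return fromstring(text)
    except ParseError as err:
        wrapped = XMLResourceParseError(str(err))
        # Carry over the details that ElementTree attaches to ParseError.
        wrapped.code = err.code
        wrapped.position = err.position
        raise wrapped from err


try:
    parse_resource("<a></b>")
except ParseError as exc:
    # An existing handler for ParseError still matches the new exception.
    caught = exc

print(type(caught).__name__)
print(isinstance(caught, XMLSchemaException), isinstance(caught, ParseError))
```

With this arrangement the same exception is matched by handlers for the library's base exception and by handlers for ParseError, which is the property the proposed hierarchy relies on.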