sissaschool / xmlschema

XML Schema validator and data conversion library for Python
MIT License

Incorrect hexBinary validation #63

Closed vient closed 6 years ago

vient commented 6 years ago

Consider checking this example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p w:rsidR="00D7310A">
        </w:p>
    </w:body>
</w:document>

using the wml.xsd schema provided in ECMA-376 2nd edition, Part 4. You will get the following error:

xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating '00D7310A' with XsdSingleFacet('length', value=4, fixed=False).

Schema:

    <xsd:length xmlns:xsd="http://www.w3.org/2001/XMLSchema" value="4" />

Instance:

  <w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" w:rsidR="00D7310A">
  </w:p>

After deleting 2 of the 4 octets, the document passes validation. The specification says that the length of a hexBinary value is counted in octets, not characters (https://www.w3.org/TR/xmlschema-2/#rf-length), so xmlschema is wrong here.
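
As a minimal sketch of the difference, using only Python's standard library: the attribute value has 8 characters in its lexical form but decodes to 4 octets, and the octet count is what the length facet should be compared against. The variable names are illustrative and not part of xmlschema.

```python
# Illustrative only: these names are not part of the xmlschema API.
rsid = "00D7310A"

char_count = len(rsid)                  # characters in the lexical form
octet_count = len(bytes.fromhex(rsid))  # octets after hex decoding

# The facet from wml.xsd is <xsd:length value="4"/>, which constrains octets.
facet_length = 4
is_valid = octet_count == facet_length
print(char_count, octet_count, is_valid)
```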

brunato commented 6 years ago

The fix is available in release 0.9.29 (also for base64Binary types). Those types now use separate validation functions (see the hex_length_validator(), hex_min_length_validator() and hex_max_length_validator() methods in facets.py).

Thanks

vient commented 6 years ago

I don't think you can use the same validator for hex and base64. If I understand the documentation correctly, length restrictions apply to the decoded data. For hex the encoded length is indeed 2 * decoded_length, but for base64 the encoded length is decoded_length * 4 / 3 (and the trailing = padding characters may need to be accounted for, so the final formula will be a little more complicated).
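
To illustrate why the two encodings need different formulas, here is a small sketch that encodes payloads of 1 to 6 octets with the standard library and prints the resulting character counts. The helper name encoded_lengths is hypothetical and exists only for this example.

```python
import base64


def encoded_lengths(data: bytes) -> tuple:
    """Return (hex_chars, base64_chars) for the given octets.

    Hypothetical helper for illustration, not part of xmlschema.
    """
    return len(data.hex()), len(base64.b64encode(data))


# Hex always uses 2 characters per octet; padded base64 uses 4 characters
# per started group of 3 octets, so the ratio is not constant.
for n in range(1, 7):
    hex_chars, b64_chars = encoded_lengths(bytes(n))
    print(n, hex_chars, b64_chars)
```

A single divide-by-two rule therefore fits hexBinary but cannot fit base64Binary.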

Also, the constant for base64Binary is named XSD_BASE46_BINARY (46 instead of 64).

brunato commented 6 years ago

Yes, you are right. I confused padded and unpadded data. So the base64 length validator may be written as:

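def base64_length_validator(self, x):
    # Remove all whitespace, not only spaces, before counting characters.
    x = ''.join(x.split())
    # Each group of 4 characters carries 3 octets, minus one per trailing '='.
    # Counting the padding this way also avoids an IndexError on empty input.
    padding = len(x) - len(x.rstrip('='))
    if len(x) // 4 * 3 - padding != self.value:
        yield XMLSchemaValidationError(self, x)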

The same formula can be used for the min and max length validators.
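
As a standalone sketch of that idea (not the code shipped in xmlschema), the octet count can be computed once and compared against each facet. The names base64_octet_length and check_base64_facets are hypothetical, and the input is assumed to be well-formed, padded base64.

```python
import base64


def base64_octet_length(text: str) -> int:
    """Octets encoded by a padded base64 string (hypothetical helper)."""
    text = "".join(text.split())                 # drop any whitespace
    padding = len(text) - len(text.rstrip("="))  # trailing '=' characters
    return len(text) // 4 * 3 - padding


def check_base64_facets(text, length=None, min_length=None, max_length=None):
    """Return the names of the violated facets, using one shared octet count."""
    n = base64_octet_length(text)
    errors = []
    if length is not None and n != length:
        errors.append("length")
    if min_length is not None and n < min_length:
        errors.append("minLength")
    if max_length is not None and n > max_length:
        errors.append("maxLength")
    return errors


# Sanity check: the formula agrees with the standard library encoder.
for size in range(10):
    encoded = base64.b64encode(bytes(size)).decode("ascii")
    assert base64_octet_length(encoded) == size
```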

brunato commented 6 years ago

Should be fixed with release 0.9.30.