The check exists because:
# base64 is too lenient: it accepts 'ZZZ=' as an encoding of 'e', while
# the required XML Schema production requires 'ZQ=='. Define a regular
# expression per section 3.2.16.
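To make the leniency concrete, here is a quick illustration (mine, not from the thread) using Python's standard base64 module; exactly which non-canonical forms are accepted can vary by Python version:

import base64

# Canonical XML Schema encoding of the byte string b'e':
print(base64.b64decode('ZQ=='))  # b'e'

# Python's decoder also accepts a form with nonzero unused trailing bits,
# which the XML Schema production rejects:
print(base64.b64decode('ZR=='))  # also b'e' -- the extra bits are silently dropped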
The proposed workaround is to add an API that lets the user specify a maximum size for base64 literals; only literals up to that size would be validated against the XML Schema production that disallows lenient encodings like 'ZZZ='. Setting this to zero would disable the extra check; setting it to (say) 64 would keep the check for small values while avoiding it for file uploads. The default would be None, meaning the validation would always be performed; applications that use large files would have to intentionally disable the check.
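A minimal sketch of that mechanism (the names, the simplified pattern, and the limit semantics here are illustrative, not PyXB's actual implementation):

import re

# Simplified stand-in for the XML Schema base64Binary production (section 3.2.16).
_Lexical_re = re.compile(r'^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$')

_ValidationLimit = None  # None: always check; 0: never; N: check only values up to N characters

def _check_lexical(xmlt):
    limit = _ValidationLimit
    if limit is None or (limit > 0 and len(xmlt) <= limit):
        if _Lexical_re.match(xmlt) is None:
            raise ValueError('invalid base64Binary lexical value')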
This should be in the next release, whenever that happens.
Hi Peter,
Thanks for providing this feature in the new release!
I expect the intended usage of this feature is to manually add something like:
pyxb.binding.datatypes.base64Binary.XsdValidateLength(-1)
to the binding file generated by the pyxbgen command to disable the validation.
I would really like to automate that step. Aside from writing extra scripts for it, is there any chance it could be configured through a command-line option?
I understand this might be a separate feature request, but if there is an existing workaround, that would be awesome!
Cheers, James
The setting doesn't go into the binding file; it's a configuration change that affects validation globally, so just disable the validation once in the application that uses the bindings. If you have base64Binary values that you still want fully validated, you'll need to set and clear it in the application depending on whether the specific document is likely to be affected. There is no way to limit the validation to specific elements or namespaces.
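For reference, the set-and-clear pattern might look like the following (a sketch: the module name bindings is hypothetical, the -1 argument to disable is taken from the exchange above, and the None restore value follows the earlier proposal, so check both against the released API):

import pyxb.binding.datatypes

# Disable the base64 lexical check before parsing a document carrying large payloads...
pyxb.binding.datatypes.base64Binary.XsdValidateLength(-1)
instance = bindings.CreateFromDocument(xmld)
# ...then restore full validation for documents that should be checked.
pyxb.binding.datatypes.base64Binary.XsdValidateLength(None)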
Thanks Peter, I got it now!
Hello, I have experienced performance issues when trying to upload large files. For example, say we have the following schema:
<xsd:complexType name="uploadFileRequest">
  <xsd:sequence>
    <xsd:element name="file" type="xsd:base64Binary" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
When the file is large, say 7 MB, I notice a significant performance issue. I have traced the problem to the CreateFromDocument() function generated by PyXB, which is used to "Parse the given XML and use the document element to create a Python instance".
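For context, a sketch of how such bindings are typically generated and used (the module name upload and the file name are hypothetical):

# Bindings generated beforehand with, e.g.:  pyxbgen -u upload.xsd -m upload
import upload

with open('request.xml', 'rb') as f:
    req = upload.CreateFromDocument(f.read())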
More specifically, it is the following line in that method that takes the majority of the time to execute: saxer.parse(io.BytesIO(xmld)), where xmld is the XML string passed into the function.
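One way to localize such a hot spot (illustrative, reusing the hypothetical names from the sketch above):

import cProfile

# Profile the parse and sort by cumulative time; assumes upload and xmld
# are defined at the top level of the running script.
cProfile.run('upload.CreateFromDocument(xmld)', sort='cumtime')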
I posted this issue on SourceForge; thanks to @pabigot for pointing out that it is the regex match that costs the majority of the time.
# This is what it costs to try to be a validating processor.
if cls.__Lexical_re.match(xmlt) is None:
    raise SimpleTypeValueError(cls, xmlt)
If we comment this code block out, the issue is fixed. However, since, in Peter's words, "As PyXB is a validating processor it must check whether the incoming encoded data is a valid XML representation", it would be better for a supported workaround to be part of a future release.
Thanks a lot for your help! @pabigot
Cheers, James