mvantellingen / python-zeep

A Python SOAP client
http://docs.python-zeep.org

Zeep raises AttributeError instead of TransportError when parsing invalid xml with strict=False #1337

Open bluemoo opened 2 years ago

Zeep version 4.1.0, installed via pip install zeep. If the response is entirely unparsable (for example, because the server hit an error and did not return XML at all), I expect zeep to raise a TransportError with the message about invalid XML, even in non-strict mode. What actually happens in non-strict mode is that an AttributeError is raised from the internals of defusedxml.lxml.fromstring.

Here is a script that you can use to see the two behaviors in action. First, define this function:

import pretend  # pip install pretend

from zeep import Client, Settings

# Feed a fake, non-XML HTTP response directly into zeep's reply processing.
def run(strict):
    client = Client('http://www.dneonline.com/calculator.asmx?wsdl', settings=Settings(strict=strict))
    response = pretend.stub(
        status_code=200,
        headers={},
        content="""
            Everything exploded! I am not XML at all.
        """)
    operation = client.service._binding._operations['Add']
    result = client.service._binding.process_reply(
        client, operation, response)

You can see the expected behavior, with strict=True, by calling run(True):

>>> run(True)
Traceback (most recent call last):
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/loader.py", line 50, in parse_xml
    elementtree = fromstring(content, parser=parser, base_url=base_url)
  File "src/lxml/etree.pyx", line 3254, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 2, column 13

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/wsdl/bindings/soap.py", line 204, in process_reply
    doc = parse_xml(content, self.transport, settings=client.settings)
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/loader.py", line 67, in parse_xml
    "Invalid XML content received (%s)" % exc.msg, content=content
zeep.exceptions.XMLSyntaxError: Invalid XML content received (Start tag expected, '<' not found, line 2, column 13)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in run
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/wsdl/bindings/soap.py", line 210, in process_reply
    content=response.content,
zeep.exceptions.TransportError: Server returned response (200) with invalid XML: Invalid XML content received (Start tag expected, '<' not found, line 2, column 13).
Content: '\n            Everything exploded! I am not XML at all.\n

And here is the behavior when strict is False:

>>> run(strict=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in run
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/wsdl/bindings/soap.py", line 204, in process_reply
    doc = parse_xml(content, self.transport, settings=client.settings)
  File "/home/noahwork/.local/lib/python3.6/site-packages/zeep/loader.py", line 51, in parse_xml
    docinfo = elementtree.getroottree().docinfo
AttributeError: 'NoneType' object has no attribute 'getroottree'

This happens because zeep.loader.parse_xml calls defusedxml.lxml.fromstring, which in turn has the lines

rootelement = _etree.fromstring(text, parser, base_url=base_url)
elementtree = rootelement.getroottree()

When strict is True, the first line raises an etree.XMLSyntaxError, which is caught in parse_xml and re-raised as a zeep.exceptions.XMLSyntaxError (which is in turn caught further out and turned into a TransportError). When strict is False, the first line raises nothing and returns None, so the following .getroottree() call fails with a generic AttributeError, which is NOT caught in parse_xml.
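
To make the control flow concrete, here is a simplified model that runs without lxml or zeep. It is only a sketch, not the real code: FakeSyntaxError, FakeElement, fake_fromstring, parse_xml_model and outcome are all hypothetical stand-ins, and the only behavior assumed is what is reported above (the strict parser raises on non-XML input, the non-strict one returns None).

```python
class FakeSyntaxError(Exception):
    """Stand-in for lxml.etree.XMLSyntaxError (hypothetical)."""


class FakeElement:
    """Stand-in for a parsed root element (hypothetical)."""

    def getroottree(self):
        return self


def fake_fromstring(text, strict):
    """Model of the parser: raise in strict mode, return None otherwise."""
    if text.lstrip().startswith("<"):
        return FakeElement()
    if strict:
        raise FakeSyntaxError("Start tag expected, '<' not found")
    return None  # non-strict: no exception, but no root element either


def parse_xml_model(text, strict):
    """Model of the try/except in parse_xml: only the syntax error is caught."""
    try:
        root = fake_fromstring(text, strict)
        return root.getroottree()  # AttributeError here when root is None
    except FakeSyntaxError as exc:
        # ValueError plays the role of zeep.exceptions.XMLSyntaxError.
        raise ValueError("Invalid XML content received (%s)" % exc)


def outcome(text, strict):
    """Return the name of the exception type that escapes, or 'ok'."""
    try:
        parse_xml_model(text, strict)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


for strict in (True, False):
    print(strict, outcome("I am not XML at all.", strict))
```

In this model the strict call ends in the translated error, while the non-strict call lets AttributeError escape, because the except clause names only the syntax error.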

I propose either of the following fixes:

  1. Change the try...except block in parse_xml to also catch AttributeError. (This is the simplest fix, though I don't know whether other situations can raise this error.)
  2. Replace the call to defusedxml.lxml.fromstring in parse_xml with a local version of the function that does exactly the same thing, except that it also checks whether rootelement is None and, if so, raises a zeep.exceptions.XMLSyntaxError.

If either of the above is acceptable, I can provide a patch.
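
For reference, option 2 could look roughly like the sketch below. This is not a patch against zeep: XMLSyntaxError is a local stand-in for zeep.exceptions.XMLSyntaxError, checked_fromstring and lenient_stub are hypothetical names, and the parsing function is passed in as an argument so that the example runs without lxml installed.

```python
class XMLSyntaxError(Exception):
    """Local stand-in for zeep.exceptions.XMLSyntaxError (hypothetical)."""


def checked_fromstring(text, fromstring, **kwargs):
    """Parse text with the given callable and reject a missing root element.

    `fromstring` is injected so this sketch has no third-party dependency;
    in a real patch it would be the parsing function parse_xml already uses.
    """
    root = fromstring(text, **kwargs)
    if root is None:
        # A recovering parser found nothing usable. Report it as a syntax
        # error so code that already handles XMLSyntaxError also handles this.
        raise XMLSyntaxError("Invalid XML content received (no root element)")
    return root


def lenient_stub(text, **kwargs):
    """Hypothetical non-strict parser: returns None for anything that is not XML."""
    text = text.strip()
    return text if text.startswith("<") else None


try:
    checked_fromstring("Everything exploded! I am not XML at all.", lenient_stub)
except XMLSyntaxError as exc:
    print("caught:", exc)
```

Compared with option 1, the explicit None check keeps the except clause narrow, so an unrelated AttributeError elsewhere in the parsing code would still surface as a bug instead of being reported as invalid XML.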