Problem with receiving unicode string with umlauts

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Send a string containing non-ascii characters to a soap function

What is the expected output? What do you see instead?

ERROR:pysimplesoap.server:Traceback (most recent call last):
  File "lib/pysimplesoap/server.py", line 167, in dispatch
    args = method.children().unmarshall(args_types)
  File "lib/pysimplesoap/simplexml.py", line 527, in unmarshall
    elif str(node) or (fn == str and str(node) != ''):
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 11: ordinal not in range(128)
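
For reference, the same failure can be reproduced outside pysimplesoap. In Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec. The sketch below makes that encode explicit so it behaves the same on Python 2 and 3. The helper name encode_for_wire and the sample text are made up for illustration; they are not part of pysimplesoap.

```python
# -*- coding: utf-8 -*-
# Illustrative only: `encode_for_wire` is a hypothetical helper, not pysimplesoap API.

def encode_for_wire(text, encoding="utf-8"):
    """Encode unicode text explicitly instead of relying on an implicit ASCII default."""
    return text.encode(encoding)

sample = u"h\xe4tten"  # "hätten"

try:
    # Roughly what Python 2's str(sample) does behind the scenes.
    encode_for_wire(sample, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 succeeds.
utf8_bytes = encode_for_wire(sample)
```

Whether the server should encode to UTF-8 at this point or keep the value as unicode is a design decision for the library; the sketch only shows where the error comes from.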

What version of the product are you using? On what operating system?

Current Development version

Please provide any additional information below.

I made some changes for it to work, see the attached patch.

Original issue reported on code.google.com by heineman...@gmail.com on 8 Aug 2013 at 12:23

Attachments:

unicode.patch

GoogleCodeExporter commented 9 years ago

thanks!
could you provide a test case?

Original comment by reingart@gmail.com on 15 Aug 2013 at 6:44

Changed state: Started

GoogleCodeExporter commented 9 years ago

Sure, check the attachment. I had to do it with sockets because the client 
library has problems with the umlauts too.

Original comment by heineman...@gmail.com on 16 Aug 2013 at 10:53

Attachments:

issue_110.py

GoogleCodeExporter commented 9 years ago

Your fix partially solves the issue, but deeper unicode handling is needed.
This seems to be a regression, as it was working properly before a 
contributor made some problematic changes.

You can see my experimental branch (wsdl_namespaces) where this has been 
reverted:

https://code.google.com/p/pysimplesoap/source/detail?r=c4f989d31f1bfc1602ef0c594c3533a4ec617ecf&name=wsdl_namespaces

BTW, why do you have to use plain sockets? 
The client should fully support unicode requests/responses too; maybe there is 
another regression.

Original comment by reingart@gmail.com on 24 Aug 2013 at 7:42

GoogleCodeExporter commented 9 years ago

I wasn't quite correct about that: it doesn't seem to be the client but the 
underlying libraries that actually struggle with the unicode data. I'm not much 
of an expert on unicode in Python, so I just tried some things and could 
produce different errors with it, but I really don't know what to do about it.

If the socket code is replaced by the following:

{{{
client = SoapClient(
    location = "http://localhost:8008/",
    action = 'http://localhost:8008/', # SOAPAction
    namespace = "http://example.com/sample.wsdl", 
    soap_ns='soap',
    ns = False
)

client.set_comment(comment="Die Füchse hätten gerne schöne Hühner gefangen.")
}}}

the following error is produced:

 Traceback (most recent call last):
   File "./issue_110.py", line 49, in <module>
     client.set_comment(comment="Die Füchse hätten gerne schöne Hühner gefangen.")
   File "/home/sh/adfinis/pysimplesoap_feedback/pysimplesoap/client.py", line 139, in <lambda>
     return lambda self=self, *args, **kwargs: self.call(attr, *args, **kwargs)
   File "/home/sh/adfinis/pysimplesoap_feedback/pysimplesoap/client.py", line 206, in call
     self.xml_request = request.as_xml()
   File "/home/sh/adfinis/pysimplesoap_feedback/pysimplesoap/simplexml.py", line 252, in as_xml
     return self.__document.toxml('UTF-8')
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 46, in toxml
     return self.toprettyxml("", "", encoding)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 58, in toprettyxml
     self.writexml(writer, "", indent, newl, encoding)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 1752, in writexml
     node.writexml(writer, indent, addindent, newl)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 817, in writexml
     node.writexml(writer, indent+addindent, addindent, newl)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 817, in writexml
     node.writexml(writer, indent+addindent, addindent, newl)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 817, in writexml
     node.writexml(writer, indent+addindent, addindent, newl)
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 813, in writexml
     self.childNodes[0].writexml(writer, '', '', '')
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 1041, in writexml
     _write_data(writer, "%s%s%s" % (indent, self.data, newl))
   File "/usr/lib/python2.7/xml/dom/minidom.py", line 298, in _write_data
     writer.write(data)
   File "/usr/lib/python2.7/codecs.py", line 351, in write
     data, consumed = self.encode(object, self.errors)
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
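
This last error can be reproduced in isolation. In Python 2 the plain literal "Die Füchse ..." is a UTF-8 byte string, and the codecs writer first decodes it with the ASCII codec before encoding it again. Byte 0xc3 at position 5 is the first byte of "ü". A minimal sketch of the same steps done explicitly (variable names are made up for illustration):

```python
# -*- coding: utf-8 -*-
# The bytes a Python 2 str literal would hold for "Die Füchse" in a UTF-8 source file.
raw = u"Die F\xfcchse".encode("utf-8")

try:
    # The implicit step that fails inside codecs.py in the traceback.
    raw.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the encoding the bytes are actually in works,
# and the resulting text can then be encoded back to UTF-8.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")
```

This only explains the ASCII decode failure.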

If the string is prefixed with a 'u', an exception is raised from httplib 
instead...

BTW: the simple client example in the wiki has the argument "trace = True", but 
SoapClient does not have such an argument...

Original comment by heineman...@gmail.com on 27 Aug 2013 at 8:03

GoogleCodeExporter commented 9 years ago

This bug doesn't occur on the current development version (1.11)

Original comment by rafaelre...@gmail.com on 3 Oct 2013 at 1:59

GoogleCodeExporter commented 9 years ago

I'm using the latest dev version and I have the same problem. After applying 
the following change, everything works as expected:

diff --git a/pysimplesoap/simplexml.py b/pysimplesoap/simplexml.py
index 9a23497..7ebb851 100644
--- a/pysimplesoap/simplexml.py
+++ b/pysimplesoap/simplexml.py
@@ -113,9 +113,10 @@ class SimpleXMLElement(object):
     def as_xml(self, filename=None, pretty=False):
     """Return the XML representation of the document"""
     if not pretty:
-        return self.__document.toxml('UTF-8')
+        xml = self.__document.toxml()
     else:
-        return self.__document.toprettyxml(encoding='UTF-8')
+        xml = self.__document.toprettyxml()
+    return xml.encode('ascii', errors='xmlcharrefreplace')

     def __repr__(self):
     """Return the XML representation of this tag"""

Original comment by tomasz.w...@gmail.com on 10 Oct 2013 at 1:55

GoogleCodeExporter commented 9 years ago

These changes could cause problems for those who don't use the English language.

Original comment by rafaelre...@gmail.com on 10 Oct 2013 at 11:48

GoogleCodeExporter commented 9 years ago

I use Polish and had a lot of problems with UnicodeDecodeError. After the 
modification I posted above, everything works without throwing exceptions. It 
encodes the string to ASCII and replaces all non-ASCII characters with XML 
character references, so for example the letter "Ł", commonly used in Polish, 
is converted from u'\u0141' to '&#321;'.
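
A minimal sketch of that behaviour using only the standard library. The element name comment and the helper to_ascii_xml are made up for illustration and are not part of pysimplesoap. It also parses the result back, to show that an XML parser turns the character reference into the original character again:

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

def to_ascii_xml(document):
    """Serialize a minidom document to ASCII bytes.

    Non-ASCII characters become numeric character references.
    """
    return document.toxml().encode("ascii", "xmlcharrefreplace")

# Build a tiny document containing the Polish letter "Ł" (U+0141).
doc = minidom.Document()
element = doc.createElement("comment")
element.appendChild(doc.createTextNode(u"\u0141"))
doc.appendChild(element)

ascii_xml = to_ascii_xml(doc)

# Parsing the ASCII-only XML restores the original character.
parsed_text = minidom.parseString(ascii_xml).documentElement.firstChild.data
```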

Original comment by tomasz.w...@gmail.com on 10 Oct 2013 at 12:09

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

The changes in #6 solved my unicode issues for Chinese language support

Original comment by darkc...@gmail.com on 11 Jun 2014 at 5:09

GoogleCodeExporter commented 9 years ago

I'm having this problem using Portuguese characters, like "é", "á", "à"... 
Could anyone explain how to apply this patch?

Original comment by jdicarre...@gmail.com on 26 Nov 2014 at 3:40

mettienne / pysimplesoap

Problem with receiving unicode string with umlauts #110