patrickfrey / strusBindings

Language bindings (Java,Python,PHP,etc.) for strus
http://www.project-strus.net

Do python bindings support Unicode? #38

Open andreasbaumann opened 7 years ago

andreasbaumann commented 7 years ago
curl -X POST --data-binary @content_bildungsgeschichte.xml localhost:80/insert \
  --header "Content-Type:text/xml; charset=UTF-8"
ERROR:tornado.application:Uncaught exception POST /insert (::1)
HTTPRequest(protocol='http', host='localhost', method='POST', uri='/insert', version='HTTP/1.1', remote_ip='::1', headers={'Content-Length': '546784', 'Expect': '100-continue', 'Content-Type': 'text/xml; charset=UTF-8', 'Host': 'localhost', 'Accept': '*/*', 'User-Agent': 'curl/7.35.0'})
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1141, in _when_complete
    callback()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1162, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "src/step8/strusServer.py", line 19, in post
    nofDocuments = backend.insertDocuments( content)
  File "/home/strus/src/step8/strusIR.py", line 124, in insertDocuments
    docqueue.push( content)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 176: ordinal not in range(128)
<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>
ERROR:tornado.access:500 POST /insert (::1) 94.12ms

The Python version is 2.7.6 (the version used in the Docker images of the tutorial).

patrickfrey commented 7 years ago

The following test program fails with the same error message:

#!/usr/bin/python
# coding=UTF-8

print unicode("风雷动")

Traceback (most recent call last):
  File "test_unicode.py", line 4, in <module>
    print unicode("风雷动")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
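
For illustration, here is a minimal sketch of what goes wrong in that test. In Python 2.7 a plain string literal is a byte string, and unicode() without an explicit encoding decodes it with the default ASCII codec. The sketch performs that decode step explicitly, so it also runs on Python 3. The helper name try_decode is made up for this example and is not part of strus or tornado.

```python
# The three characters from the test program, written as escapes so the
# snippet itself stays pure ASCII.
text = u"\u98ce\u96f7\u52a8"

# The bytes that a UTF-8 encoded source file or HTTP body actually carries.
raw = text.encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, str(err)


# Decoding with the ASCII codec fails on the first non-ASCII byte.
ascii_result, ascii_error = try_decode(raw, "ascii")
print(ascii_error)

# Decoding with the codec that matches the data succeeds.
utf8_result, utf8_error = try_decode(raw, "utf-8")
print(utf8_result == text)
```

The ASCII attempt stops at byte 0xe9, the first byte of the UTF-8 sequence for the first character, which matches the position reported in the traceback above.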

patrickfrey commented 7 years ago

The following example works:

#!/usr/bin/python
# coding=UTF-8

print unicode(u'风雷动').encode('utf-8')

patrickfrey commented 7 years ago

My suggestion for using strus with Python 2.7 is to convert Python strings explicitly to UTF-8 as early as possible.
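
A rough sketch of what "convert explicitly, as early as possible" could look like is shown below. It assumes the conversion happens once, at the point where text enters the program, and that the input encoding is known there. The function to_utf8_bytes and its source_encoding parameter are hypothetical names for this example; they are not part of the strus bindings or of tornado.

```python
def to_utf8_bytes(value, source_encoding="utf-8"):
    """Hypothetical helper: return `value` as UTF-8 encoded bytes.

    Works for byte strings and text strings on Python 2.7 and Python 3.
    """
    if isinstance(value, bytes):
        # Byte input: decode with the declared encoding first, so that
        # invalid input fails here and not deep inside later code, then
        # re-encode as UTF-8.
        return value.decode(source_encoding).encode("utf-8")
    # Text input (unicode in Python 2, str in Python 3).
    return value.encode("utf-8")


# Example: a Latin-1 encoded body is normalized to UTF-8 at the boundary.
latin1_body = u"Bildungsgeschichte f\u00fcr alle".encode("latin-1")
utf8_body = to_utf8_bytes(latin1_body, source_encoding="latin-1")
print(repr(utf8_body))
```

The point of doing this in one place is that every later step only ever sees one encoding, so no implicit ASCII conversion is triggered further down the call chain.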

patrickfrey commented 5 years ago

Starting with revision 0.16, the strus bindings are based on Python 3.x; support for Python 2.7 has been dropped. There are still Unicode issues in the Wikipedia demo search. These are caused by the port of the Python part of the code from Python 2.7 to 3.4. The bindings and strus itself support Unicode.