protocol buffer's SerializeToString() makes UnicodeDecodeError in python 2.7.11

museghost commented 8 years ago

  File "/Users/servermanager/venv/XXXXXX/lib/python2.7/site-packages/XXX/XXX/nanoserver.py", line 68, in send
    return self.sock.send(self._send_msg.SerializeToString())
  File "/Users/servermanager/venv/XXXXXX/lib/python2.7/site-packages/nnpy/socket.py", line 53, in send
    data = data.encode() if isinstance(data, str) else data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 3: ordinal not in range(128)

As mentioned above, when sending the string data generated by google protocol buffer, nnpy raises the UnicodeDecodeError in python 2.7

I assume that the protocol buffer serializes data and its type is 'str', however, 'str' type is 'bytes' in python 2.7. When the code meets the above line, data.encode() is called because of the data type 'str'.

However, the protocol buffer's string has many characters as a byte string over 127 so it would be NOT ascii case by case.

To use nnpy with protocol buffer, how can I handle this ? If possible, could you please make a patch ?

djc commented 8 years ago

Hmm, you're right that this is a bit suboptimal. Does the following patch do the job for you:

diff --git a/nnpy/socket.py b/nnpy/socket.py
index 8976041..71404ad 100644
--- a/nnpy/socket.py
+++ b/nnpy/socket.py
@@ -1,7 +1,10 @@
 from . import nanomsg, ffi
+import sys

 NN_MSG = int(ffi.cast("size_t", -1))

+ustr = str if sys.version_info[0] > 2 else unicode
+
 class Socket(object):

     def __init__(self, domain, protocol):
@@ -23,7 +26,7 @@ class Socket(object):
             buf = ffi.new('int*')
             buf[0] = value
             vlen = 4
-        elif isinstance(value, str):
+        elif isinstance(value, ustr):
             buf = ffi.new('char[%s]' % len(value), value.encode())
             vlen = len(value)
         elif isinstance(value, bytes):
@@ -35,13 +38,13 @@ class Socket(object):
         assert rc >= 0, rc

     def bind(self, addr):
-        addr = addr.encode() if isinstance(addr, str) else addr
+        addr = addr.encode() if isinstance(addr, ustr) else addr
         buf = ffi.new('char[]', addr)
         rc = nanomsg.nn_bind(self.sock, buf)
         assert rc > 0, rc

     def connect(self, addr):
-        addr = addr.encode() if isinstance(addr, str) else addr
+        addr = addr.encode() if isinstance(addr, ustr) else addr
         buf = ffi.new('char[]', addr)
         rc = nanomsg.nn_connect(self.sock, buf)
         assert rc > 0, rc
@@ -50,7 +53,7 @@ class Socket(object):
         if isinstance(data, memoryview):
             buf = ffi.from_buffer(data)
         else:
-            data = data.encode() if isinstance(data, str) else data
+            data = data.encode() if isinstance(data, ustr) else data
             buf = ffi.new('char[%i]' % len(data), data)
         rc = nanomsg.nn_send(self.sock, buf, len(data), flags)
         assert rc > 0, rc

museghost commented 8 years ago

@djc Thank you very much. It works now. For all who wants to find the patch file, I also attached it.

socket.patch.zip

djc commented 8 years ago

Thanks for testing!

nanomsg / nnpy

protocol buffer's SerializeToString() makes UnicodeDecodeError in python 2.7.11 #15