Closed copyhold closed 2 years ago
Any success with fixing that? Still have that problem with Tarantool 2.2 and connector 0.6.6 even when I don't select binary data from a space containing it
It seems, we cannot just set encoding
to None
when creating a connection, because at least schema reload does not support this.
However we can temporary set encoding to None
:
diff --git a/unit/suites/test_dml.py b/unit/suites/test_dml.py
index 31e821e..1264180 100644
--- a/unit/suites/test_dml.py
+++ b/unit/suites/test_dml.py
@@ -70,6 +70,12 @@ class TestSuite_Request(unittest.TestCase):
self.con.insert('space_1', [i, i%5, 'tuple_'+str(i)])[0],
[i, i%5, 'tuple_'+str(i)]
)
+ saved_encoding = self.con.encoding
+ self.con.encoding = None
+ t = [1001, 1, b'\xff']
+ self.assertEqual(self.con.insert('space_1', t)[0], t)
+ self.con.encoding = saved_encoding
+
def test_00_03_answer_repr(self):
repr_str = """- [1, 1, 'tuple_1']"""
self.assertEqual(repr(self.con.select('space_1', 1)), repr_str)
@@ -122,6 +128,12 @@ class TestSuite_Request(unittest.TestCase):
[[200, 0, 'tuple_200'], [205, 0, 'tuple_205']]
)
+ saved_encoding = self.con.encoding
+ self.con.encoding = None
+ res = self.con.select('space_1', 1001)
+ self.assertSequenceEqual(res, [[1001, 1, b'\xff']])
+ self.con.encoding = saved_encoding
+
def test_03_delete(self):
# Check that delete works fine
self.assertSequenceEqual(self.con.delete('space_1', 20), [[20, 0, 'tuple_20']])
NB: File another issue for Connection(host, port, encoding=None)
support.
Now I'm getting this:
>> print(result)
- None
- b'Type mismatch: can not convert 'REDACTED BINARY STRING' to varbinary'
And honestly I don't understand what encodings have to do when it comes to raw binary data. Like shouldn't it be just directly converted to bytes
Python type?
Okay, two different problems are discussed here:
bytes
), but a bytes
value is encoded as mp_str. Maybe we can change it on Python 3 (RFC patch).Ah, sorry for confusion, and thank you. Will wait for a complete fix
Any success?
We're at planning of Q3 tasks and I hope this one will be included here. I'll specifically mark it as important.
NB: Propose to msgpack-python to provide helpers like msgpack.as_bin() and msgpack.as_str() to encode a particular structure (dict, list, tuple) member as mp_bin or mp_str disregarding use_bin_type
option. It may be requirement of an msgpack-based protocol to have a string here and a binary value there (exactly our case).
Usage example:
import msgpack
msgpack.dumps(dict(foo='string', bar=msgpack.as_bin('binary')))
I think regarding this once again and it seems that encoding of bytes
as mp_bin
would be good idea in context of Python 3. We should not break compatibility, though. Changing defaults is nightmare. Let's give a meaningful error message instead, that would direct a user to use appropriate option.
NB: Think regarding a patch of this kind:
diff --git a/tarantool/request.py b/tarantool/request.py
index d1a5a82..c54a47b 100644
--- a/tarantool/request.py
+++ b/tarantool/request.py
@@ -79,26 +79,11 @@ class Request(object):
packer_kwargs = dict()
- # use_bin_type=True is default since msgpack-1.0.0.
- #
- # The option controls whether to pack binary (non-unicode)
- # string values as mp_bin or as mp_str.
- #
- # The default behaviour of the connector is to pack both
- # bytes and Unicode strings as mp_str.
- #
- # msgpack-0.5.0 (and only this version) warns when the
- # option is unset:
- #
- # | FutureWarning: use_bin_type option is not specified.
- # | Default value of the option will be changed in future
- # | version.
- #
- # The option is supported since msgpack-0.4.0, so we can
- # just always set it for all msgpack versions to get rid
- # of the warning on msgpack-0.5.0 and to keep our
- # behaviour on msgpack-1.0.0.
- packer_kwargs['use_bin_type'] = False
+ # XXX: Write an explanation here.
+ # XXX: Pass it from a connection constructor args, but keep
+ # use_bin_type=False behaviour for Python 2 (because string
+ # literals are bytes).
+ packer_kwargs['use_bin_type'] = True
self.packer = msgpack.Packer(**packer_kwargs)
There are no need to send extra arguments (as raw = True
) into msgpack.
data = ['a', b'b']
assert msgpack.unpackb(msgpack.packb(data)) == data
Major diff at 1.0.0 version is encoding option is removed. UTF-8 is used always.
That is why it is better not to pass any arguments into Decoder if encoding is 'utf-8' and version of msgpack greater than 1.0.0
Hello, any progress on this issue?
I would suggest to bump msgpack to 1.0.0 and to remove that use_bin_type=False
line. Otherwise we have no chance to store binary data.
Current behavior for tarantool-python 0.7.1:
@>>> c.insert("some", ("foo", pickle.dumps(42)))
Traceback (most recent call last):
…
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
@>>> c.delete("some", "foo")
Traceback (most recent call last):
…
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
@>>> c.delete("some", "foo") # already deleted
@>>>
The fix makes everything working:
@>>> c.insert("some", ("foo", pickle.dumps(42)))
- ['foo', b'\x80\x04K*.']
@>>> c.delete("some", "foo")
- ['foo', b'\x80\x04K*.']
Sorry, no progress yet. I need to dive into the topic again.
After the posting that PR, I realised that it is incomplete. I found several checks for different msgpack version in the tarantool-python
code. I suggest two solutions:
use_bin_type=False
only if msgpack is older than 1.0.0, just only one condition line.I'm ready to send either PR, just tell me what you prefer :)
Yesterday I tried to use Tarantool in my company's projects and right now I have to use a private fork of python-tarantool
with my fix applied. Our projects need to store pickled python objects in Tarantool.
I hope that solution will emerge very soon.
Hi,
My data in tarantool stored as binary strings, they are not UTF in any manner. It's just a sequence of bytes. When trying to select data with this module , unpacker fails with error:
Can this be resolved somehow?