Closed tapdo closed 4 years ago
Hello @tapdo, what's a motivation for this change? Also, need to add a test for this case.
Needs a clear motivating message with a description of the problem and solution.
Please add a unit test that fails before this change and succeeds after this change.
The problem is "UnicodeDecodeError" occurred on python3 when reading database values include multi-byte character which are stored on python 2 environment. Because of the difference between python 3 & python 2, load function in pickle failed in this situation. The solution is additional argument of load function "encoding='bytes'" to avoid decoding value as string. Sorry I don't know how to create a test for python version issue. Only I confirmed good working in my environment both python 2 and 3.
@tapdo can you provide at least small example (how to reproduce this error)?
unicode_fail_sample.zip My small sample. Please run on python 3 to reproduce error. Once you remove "my_db.sqlite", after that you can run a script on python 2 to create "my_db.sqlite" include this issue. I confirmed on python 2.7.13 and 3.6.1 , just for the record.
I checked it on 3.5
and 3.6
- all works correctly (with provided DB), for 2.7
this doesn't work (because in 2.7
max pickle protocol is 2 (but used 4))
If I re-create DB with 2.7
and after try to use it in 3.*
- I see error like
Traceback (most recent call last):
File "unicode_fail_sample.py", line 11, in <module>
print("Load:" + mydict['tapdo'].decode('utf-8'))
File "/home/ivan/.virtualenvs/abc3.5/lib/python3.5/site-packages/sqlitedict.py", line 246, in __getitem__
return self.decode(item[0])
File "/home/ivan/.virtualenvs/abc3.5/lib/python3.5/site-packages/sqlitedict.py", line 105, in decode
return loads(bytes(obj))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
@menshikh-iv Sorry for my mistake. The last part you show is my pointing it out. I attached the correct sample. unicode_fail_sample2.zip
Code:
from sqlitedict import SqliteDict
mydict = SqliteDict('./my_db.sqlite', autocommit=True)
if 'tapdo' not in mydict:
mydict['tapdo'] = u"○".encode('utf-8')
else:
print("Load:" + mydict['tapdo'].decode('utf-8'))
mydict.close()
Problems:
2.7
and trying to load it with 3.*
, we have the encoding problem
Traceback (most recent call last):
File "unicode_fail_sample.py", line 11, in <module>
print("Load:" + mydict['tapdo'].decode('utf-8'))
File "/home/ivan/.virtualenvs/abc3/lib/python3.6/site-packages/sqlitedict.py", line 246, in __getitem__
return self.decode(item[0])
File "/home/ivan/.virtualenvs/abc3/lib/python3.6/site-packages/sqlitedict.py", line 105, in decode
return loads(bytes(obj))
3.*
and trying to load with 2.7
, we have problems with pickle protocol (expected behavior)
Traceback (most recent call last):
File "unicode_fail_sample.py", line 11, in <module>
print("Load:" + mydict['tapdo'].decode('utf-8'))
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/sqlitedict.py", line 246, in __getitem__
return self.decode(item[0])
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/sqlitedict.py", line 105, in decode
return loads(bytes(obj))
ValueError: unsupported pickle protocol: 4
@tapdo first mentioned problem is your case, am I right?
@menshikh-iv Yes, you right.
@tapdo OK, thanks. @piskvorky what's should be expected behavior for this case (this is a bug or not)?
I don't remember if sqlitedict
committed to cross-Python compatibility officially, so I don't know if it's a bug.
But to me compatibility in both directions sounds like a good idea (we don't need any new pickle protocols). Any reason why that shouldn't work?
@piskvorky About pickle: we can pin 2 pickle version, but we should be very careful with it (pin for all new DB, for old - check the version and use it instead of 2).
About current encoding - I'm worried about testing this change (looks like we can easily break something) + probably, current PR will decrease performance significantly (because of loads
-> rase/catch -> loads
), need to benchmark it too.
I think pickle can recognize its version automatically during load, no action needed. Or what do you mean?
I agree that try/catch is not a good solution for checking API versions. We want to make the compatibility uniform (no exceptions) and explicit.
@piskvorky pickle can recognize automatically, but we shouldn't pin version for old DB where version != 2.
I still don't understand -- pin what, where, why?
@piskvorky pin pickle protocol version to 2 for cross-compatibility between py2
and py3
AFAIK pickle recognizes its protocol automatically on load. What actions on existing DBs do you think are needed?
@piskvorky I'm talking about saving now (not loading). For existing DB - at least extract pickle version and use same (for saving).
Looks like I'm late to the party here. How should we proceed with this PR?
If we want to pursue it, then we need to:
This requires additional effort. Personally, my recommendation is to abandon the PR, and explicitly state that we do not support cross-compatibility. My motivation is that Py2 will be obsolete soon, and we can direct our efforts in more productive directions elsewhere.
Closing due to inactivity.
… created by Python 2