sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Fix backwards incompatibility of unpickling in Python 3 #28444

Closed simon-king-jena closed 5 years ago

simon-king-jena commented 5 years ago

EDIT: In the original ticket description, I stated: "I believe that a backwards incompatible change of pickling is a blocker for Python-3 support." In that (and ONLY in that) sense, I believe this ticket is a blocker. I have replaced the original ticket description with something I wrote in a comment, because I now have a much smaller example and, moreover, pickles of the same object created with Python-3 and with Python-2, so that one can compare them.

The following examples require the optional meataxe package, but I am not yet sure whether meataxe or Python-3 is to blame (I hope it is the former, because I guess that would be easier to fix).

The attachments Py2.sobj and Py3.sobj result in the following behaviour in Python-3:

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-5705b555470a> in <module>()
----> 1 load('/home/king/Projekte/coho/tests/Py2.sobj')

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2824)()
    149 
    150     ## Load file by absolute filename
--> 151     with open(filename, 'rb') as fobj:
    152         X = loads(fobj.read(), compress=compress)
    153     try:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2774)()
    150     ## Load file by absolute filename
    151     with open(filename, 'rb') as fobj:
--> 152         X = loads(fobj.read(), compress=compress)
    153     try:
    154         X._default_filename = os.path.abspath(filename)

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7270)()
    967 
    968     unpickler = SageUnpickler(io.BytesIO(s))
--> 969     return unpickler.load()
    970 
    971 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]

and in Python-2

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: __ == _
True

So, the Python-3 pickle can be unpickled in Python-2, but not the other way around. What is the problem?
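The traceback is consistent with how Python 3's `pickle` treats Python 2 `str` objects: `pickle.load`/`pickle.loads` decode them using the `encoding` argument, which defaults to `'ASCII'`. The Python 3 sketch below assumes that this is what happens here (it does not use the meataxe pickles themselves). It unpickles a hand-assembled protocol-2 pickle of a one-byte Python 2 `str` with three different `encoding` values; `PY2_PICKLE` and `try_load` are illustrative names only.

```python
import pickle

# Hand-assembled protocol-2 pickle of a Python 2 str holding one raw byte:
#   \x80\x02   PROTO 2
#   U\x01\x80  SHORT_BINSTRING of length 1 containing the byte 0x80
#   q\x00      BINPUT 0 (memoize the object)
#   .          STOP
PY2_PICKLE = b"\x80\x02U\x01\x80q\x00."


def try_load(data, **kwargs):
    """Return the unpickled object, or the UnicodeDecodeError raised."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


# Default encoding='ASCII': the Python 2 str cannot be decoded.
default_result = try_load(PY2_PICKLE)

# encoding='latin1' maps each byte to the code point with the same value,
# so the raw data survives as a Python 3 str.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")

# encoding='bytes' keeps Python 2 str objects as Python 3 bytes.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

if __name__ == "__main__":
    print(repr(default_result))
    print(repr(latin1_result))  # '\x80'
    print(repr(bytes_result))   # b'\x80'
```

With the default, the byte `0x80` triggers the same kind of `UnicodeDecodeError` as in the traceback. `'latin1'` is lossless for byte data and, unlike `'bytes'`, still yields `str` for ordinary text such as attribute names; the price is that binary payloads arrive as `str` rather than `bytes`.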

Component: python3

Keywords: unpickling UnicodeError backwards compatibility

Author: Simon King

Branch: d7f170f

Reviewer: Nils Bruin

Issue created by migration from https://trac.sagemath.org/ticket/28444

jhpalmieri commented 5 years ago

Changed commit from ba41ebe to d7f170f

jhpalmieri commented 5 years ago
comment:83

Here are two more minor fixes.


New commits:

a293427 Pass unpickling options to pickle.load, default encoding 'latin1'. Accept both str and bytes in mtx_unpickle
5973292 Make str_to_bytes/bytes_to_str accept both str and bytes input.
9646948 Add tests for #28444
bfd64bc Fix keyword for py3-only test
7e1de27 Fix doc strings in sage.misc.persist
f4581b6 Add a comment regarding the expected data type for an unpickle helper
c4db899 Fix two typos in a comment
d7f170f trac 28444: fix a few typos.
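The first two commits describe the shape of the fix: pass unpickling options through to `pickle.load` with `'latin1'` as the default encoding, and let the helpers that convert between `str` and `bytes` accept either type. The plain Python 3 sketch below illustrates that idea under stated assumptions; `as_bytes`, `as_str`, `compat_loads`, `unpickle_rows` and the one-byte-per-entry format are hypothetical stand-ins, not the actual code in `sage.misc.persist`, `str_to_bytes`/`bytes_to_str` or `mtx_unpickle`.

```python
import io
import pickle


def as_bytes(data, encoding="latin1"):
    """Hypothetical helper: return ``data`` as bytes, accepting str or bytes.

    A Python 2 str unpickled with encoding='latin1' becomes a Python 3 str
    whose code points equal the original byte values, so encoding it with
    latin1 recovers exactly the original bytes.
    """
    if isinstance(data, bytes):
        return data
    if isinstance(data, str):
        return data.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(data).__name__)


def as_str(data, encoding="latin1"):
    """Hypothetical inverse helper: return ``data`` as str, accepting either type."""
    if isinstance(data, str):
        return data
    if isinstance(data, bytes):
        return data.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(data).__name__)


def compat_loads(data, **unpickle_options):
    """Hypothetical loader: forward options to pickle.load, defaulting the
    encoding used for Python 2 str objects to latin1 instead of ASCII."""
    unpickle_options.setdefault("encoding", "latin1")
    return pickle.load(io.BytesIO(data), **unpickle_options)


def unpickle_rows(nrows, ncols, raw):
    """Hypothetical unpickle helper for a toy format with one byte per entry.

    ``raw`` is bytes when the pickle came from Python 3, and str when a
    Python 2 pickle was loaded with encoding='latin1'; both are accepted.
    """
    raw = as_bytes(raw)
    return [list(raw[i * ncols:(i + 1) * ncols]) for i in range(nrows)]


# Hand-assembled protocol-2 pickle of the Python 2 str '\x01\x00'.
PY2_DATA = b"\x80\x02U\x02\x01\x00q\x00."
# Python 3 pickle of the same payload as bytes.
PY3_DATA = pickle.dumps(b"\x01\x00", protocol=2)

from_py2 = unpickle_rows(1, 2, compat_loads(PY2_DATA))
from_py3 = unpickle_rows(1, 2, compat_loads(PY3_DATA))

if __name__ == "__main__":
    print(from_py2, from_py3)  # [[1, 0]] [[1, 0]]
```

Both pickles yield the same rows, which is the property this ticket is about: data written by Python 2 and by Python 3 should unpickle to equal objects. The cost of the latin1 default is that every consumer of formerly-`str` binary data has to tolerate `str` input.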
vbraun commented 5 years ago
comment:84

looks good to me...

nbruin commented 5 years ago

Reviewer: Nils Bruin

nbruin commented 5 years ago
comment:85

Let's get this out the door, then.

vbraun commented 5 years ago

Changed branch from u/jhpalmieri/fix_backwards_incompatibility_of_unpickling_in_python_3 to d7f170f

soehms commented 5 years ago

Changed commit from d7f170f to none

soehms commented 5 years ago
comment:87

Replying to @simon-king-jena:

Simon, maybe it's the right time to change the format of the saved data?

As I have demonstrated above, a pickle created with Python-3 can be read both with Python-2 and Python-3. So, that side of the problem isn't really urgent for the p_group_cohomology package, I think.

Hi Simon,

Recently I experimented with data storage using YAML. I also used your examples for that and documented this in a Jupyter notebook attached to #28302. I'd be interested to hear your comments on it.

I tested with Python 2 and 3 on 8.9.rc0 and 8.9.rc1. As stated in the ticket description, I could read Py3.sobj with Python 2 on 8.9.rc0. This seems to be broken with 8.9.rc1 (it also happens with a newly saved sobj file from Python 3):

sage: load('Py3.sobj')
Traceback (most recent call last):
...
TypeError: expected bytes, unicode found

Is this acceptable for now?

simon-king-jena commented 5 years ago
comment:88

Replying to @soehms:

I tested with Python 2 and 3 on 8.9.rc0 and 8.9.rc1. As stated in the ticket description, I could read Py3.sobj with Python 2 on 8.9.rc0. This seems to be broken with 8.9.rc1 (it also happens with a newly saved sobj file from Python 3):

sage: load('Py3.sobj')
Traceback (most recent call last):
...
TypeError: expected bytes, unicode found

How did this suddenly pop up?? Sigh.

Is this acceptable for now?

Personally, I don't think so. So I'd appreciate it if you opened a new ticket.