Closed simon-king-jena closed 5 years ago
PS: Another approach was mentioned, namely to patch the pickle
module, so that it is guessed whether a pickled Py-2 str is unpickled as Py-3 str or bytes. But each heuristics to do so can fail on specific data, and in such cases we would still need to add appropriate conversion from str to bytes and from bytes to str to Sage library functions, to catch the cases in which the heuristics fails.
Here is another aspect that we could consider:
sage: all_bytes = bytes(range(0x100))
sage: len(dumps(all_bytes))
410
sage: len(dumps(all_bytes.decode(encoding='latin')))
361
So, it might actually be a good idea to NOT use bytes for pickles, but use str instead, when Python 3 is involved (I am slightly surprised, because I thought bytes objects are simpler than str objects and thus their pickle should be smaller, not larger).
When we change pickle functions to switch from using bytes to using str, it would be the corresponding unpickling functions that should be changed to accept str instead of bytes to make Py-2 pickles useable in Py-3 with encoding='latin1'
. That's another reason to choose the default encoding='latin1'
.
Replying to @simon-king-jena:
encoding='latin1'
will make it so that always the unpickler will be in a position to look up globals. However, each Sage function expecting abytes
needs to be changed so that it also acceptsstr
and transforms it tobytes
with encoding "latin1".
We only need to do so at the time of unpickling, for backwards compatibility reasons (i.e., if we move to a python-3 only protocol in the future, the workaround can be turned off when unpickling at that protocol level). So instead of actually changing the (hopefully few) objects that have a "bytes" object in them to accept strings and interpret them with latin1 encoding, can we just register little wrappers in "copyreg" to do that decoding for us?
However, we also need to worry about non-Sage library objects that might occur as attributes of Sage library objects. Such as numpy arrays/datetime. Is there a hack to make unpickling of numpy arrays work even when latin1 is used? Or is bytes the only way out for them?
Backport Issue 22005 from Python 3.9?
If the copyreg option is available, perhaps the following would be a workable, extendable solutions:
encoding
is available for sage's unpickler accessI think that gets us at least to a state where the "blocker" status of this ticket can be lifted, because there's a clear path to resolving this pickle incompatibility.
Truth is: any data type that is not in a pickle jar that gets tested regularly will almost certainly not be supported for unpickling in the future. Backwards compatibility of pickle requires real and constant work as sage develops. Major overhauls particularly will be a huge problem, and in some cases (if we don't have a version attribute) there might already be pickles that are genuinely ambiguous in how they should be unpickled.
Can we do something like the following? Suppose we have an object as below. It's written without particular considerations for pickling and it uses both bytes and str, so on the Py3 side, we're hooped:
class MixedObject(object):
def __init__(self,B,S):
if not(isinstance(B,bytes)):
raise TypeError("Parameter B must be of type bytes")
if not(isinstance(S,str)):
raise TypeError("Parameter S must be ot type str")
self.B=B
self.S=S
def validate(self):
if not(isinstance(self.B,bytes)):
raise TypeError("Parameter B must be of type bytes")
if not(isinstance(self.S,str)):
raise TypeError("Parameter S must be ot type str")
return True
X=MixedObject(b"hi","there")
The pickle we get with pickle.dumps(X)
from Py2 unpacks into Py3 into an object that doesn't validate. If we use pickle.loads(X,encoding="bytes")
we end up with an object that doesn't even have the required attributes (because the dictionary gets reconstructed with bytes
keys).
However, if we equip the object on the Py3 side with a __setstate__
:
def __setstate__(self,D):
if isinstance(D['B'],str):
self.B=D['B'].encode('latin1')
else:
self.B=D['B']
self.S=D['S']
we can use loads(...,encoding="latin1")
and get an object that validates. So with a bit of luck, I think we can deal with most cases where we do need a bytes
attribute with either use copyreg
to register a custom reconstructor (when an object uses a __reduce__
) that can preprocess construction parameters that are to be turned into bytes
or by amending/adding __setstate__
to use encode('latin1')
on values that are supposed to be bytes
.
It's a fairly painful process, but it should be doable entirely on the py3 side.
Replying to @nbruin:
Replying to @simon-king-jena:
encoding='latin1'
will make it so that always the unpickler will be in a position to look up globals. However, each Sage function expecting abytes
needs to be changed so that it also acceptsstr
and transforms it tobytes
with encoding "latin1".We only need to do so at the time of unpickling, for backwards compatibility reasons (i.e., if we move to a python-3 only protocol in the future, the workaround can be turned off when unpickling at that protocol level). So instead of actually changing the (hopefully few) objects that have a "bytes" object in them to accept strings and interpret them with latin1 encoding, can we just register little wrappers in "copyreg" to do that decoding for us?
I don't know about copyreg. What and how can it do?
That said: We currently know only one type (namely Matrix_gfpn_dense
) in the Sage library whose unpickling would fail if stored Python-2 str are interpreted as Python-3 str. And since a pickle with bytes is larger than a pickle with str, I'm inclined to change the pickling of Matrix_gfpn_dense
anyway.
Suggestion: In a new ticket that will be a dependency for here, I'll change Matrix_gfpn_dense
so that it will in future use str for pickling and will be able to use both bytes and str for unpickling- Does that make sense to you?
Afterwards, we can focus here on encoding defaults, copyreg...
Backport Issue 22005 from Python 3.9?
... and the backport.
Sorry, I just notice that my above suggestions largely coincide with your suggestion in the rest of your post. So:
If the copyreg option is available, perhaps the following would be a workable, extendable solutions:
- make sure
encoding
is available for sage's unpickler access
That's for the ticket here, and easy.
- aim for "latin1" being the default, but let 'bytes' be an option for the (rare?) pickles that need it
We should give the user the option to use options, regardless whether such options are needed for pickles of Sage library objects or not. That's easy to do.
- solve a few cases needing 'bytes', using copyreg registered constructors, so that they also work with 'latin1'
I would like to solve one case in a new ticket (namely Matrix_gfpn_dense
) without copyreg. If further cases pop up (perhaps in the pickle jar?), we may tackle them here with copyreg. I don't know how difficult it is.
- document the process of writing these copyreg constructors so that it's relatively straightforward to do it when we discover other data types that need it.
Good idea.
I think that gets us at least to a state where the "blocker" status of this ticket can be lifted, because there's a clear path to resolving this pickle incompatibility.
I think the example from the ticket description could be easily fixed.
Truth is: any data type that is not in a pickle jar that gets tested regularly will almost certainly not be supported for unpickling in the future.
If meataxe was a standard package (there are reasons to make it standard, btw.), Matrix_gfpn_dense
should certainly become part of the pickle jar.
Replying to @simon-king-jena:
I don't know about copyreg. What and how can it do?
From what I understand, it manages the dictionary in which "constructors" are looked up. A pickle of an object with a __reduce__
would usually amount to instructions to call a certain named constructor with arguments (encoded in the pickle as well). The name of that constructor needs to be looked up somewhere. If I'm not mistaken, copyreg
manages the dictionary for that. One can also instantiate Unpickler
objects (or subclass them) to customize the constructor lookup. In cases where the arguments as pickled would need preprocessing before calling the normal constructor (just class names are generally used as constructors), then it makes sense to register a factory function in place of that class name to do the preprocessing (and then call the relevant class).
EDIT: I see now that register_unpickle_override
seems to hook into the same place where copyreg
for the normal python unpickler goes (sage subclasses the normal unpickler and exposes that via load and loads)
Suggestion: In a new ticket that will be a dependency for here, I'll change
Matrix_gfpn_dense
so that it will in future use str for pickling and will be able to use both bytes and str for unpickling- Does that make sense to you?
I'm rather strongly against misusing str
for binary data. That's not what it's for! And what you can see in this ticket already, is that signalling intent properly in pickle files is of utmost importance for forward/backward compatibility option. The observation that it ends up shorter as a str
could be something stupid as the sequence having a UTF-8 representation that happens to be surprisingly short. If you think the pickle format for binary can be improved, then patch pickle. I think Matrix_gfpn_dense
should just have the interface that makes most sense for it. It can also have a custom __reduce__
that encodes its defining data in a sensible way (it probably already has); possible zipping or otherwise compressing the data if that is appropriate (I don't think it is -- compression can always be done in a separate round)
Matrix_gfpn_dense
is very easy to fix: they are ALREADY unpickled using a helper function mtx_unpickle
. If that would accept as Data both bytes
and str
(which it can then transform in bytes
using the latin1
encoder), you'd be done. That doesn't affect the normal API of the class at all.
If meataxe was a standard package (there are reasons to make it standard, btw.),
Matrix_gfpn_dense
should certainly become part of the pickle jar.
Isn't the problem that the pickle jar isn't really maintained/tested anymore? With the corollary that pickling in sage is doomed to suffer from compatibility problems.
With the new commit, one can unpickle a previously un-unpicklable meataxe matrix:
sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj')
sage: X
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
However, we need doctests to demonstrate that this ticket fixes what it was supposed to fix.
Also, I wonder if we should strive to make it not only "legal" to use encoding='bytes'
but to actually make it possible: Currently, one gets
sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-1bf7b48d4b86> in <module>()
----> 1 X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2863)()
154
155 ## Load file by absolute filename
--> 156 with open(filename, 'rb') as fobj:
157 X = loads(fobj.read(), compress=compress, **kwargs)
158 try:
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2813)()
155 ## Load file by absolute filename
156 with open(filename, 'rb') as fobj:
--> 157 X = loads(fobj.read(), compress=compress, **kwargs)
158 try:
159 X._default_filename = os.path.abspath(filename)
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7318)()
974
975 unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
--> 976 return unpickler.load()
977
978
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/structure/factory.pyx in sage.structure.factory.lookup_global (build/cythonized/sage/structure/factory.c:4602)()
752 pass
753
--> 754 if '.' in name:
755 module, name = name.rsplit('.', 1)
756 all = __import__(module, fromlist=[name])
TypeError: a bytes-like object is required, not 'str'
We could of course make lookup_global
to accept bytes (with encoding ascii, I guess) to look up functions given by names. It will, however, be very likely that that will not be the only change needed to make encoding='bytes'
work.
New commits:
2e0d11e | Pass unpickling options to pickle.load, default encoding 'latin1'. Accept both str and bytes in mtx_unpickle |
Commit: 2e0d11e
Warning: Trying to make bytes encoding work is likely to open a can of worms. Such as:
sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-1bf7b48d4b86> in <module>()
----> 1 X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2863)()
157
158 ## Load file by absolute filename
--> 159 with open(filename, 'rb') as fobj:
160 X = loads(fobj.read(), compress=compress, **kwargs)
161 try:
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2813)()
158 ## Load file by absolute filename
159 with open(filename, 'rb') as fobj:
--> 160 X = loads(fobj.read(), compress=compress, **kwargs)
161 try:
162 X._default_filename = os.path.abspath(filename)
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7318)()
977
978 unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
--> 979 return unpickler.load()
980
981
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.make_integer (build/cythonized/sage/rings/integer.c:43277)()
7220 """
7221 cdef Integer r = PY_NEW(Integer)
-> 7222 mpz_set_str(r.value, str_to_bytes(s), 32)
7223 return r
7224
/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/cpython/string.pxd in sage.cpython.string.str_to_bytes (build/cythonized/sage/rings/integer.c:47648)()
90 # compile-time variable. We keep the Cython wrapper to deal with
91 # the default arguments.
---> 92 return _str_to_bytes(s, encoding, errors)
TypeError: expected str, bytes found
So, it would mean to change further constructors to be flexible with regards to str and bytes.
Thus, question: Shall we try to do this? Shall we just keep the new commit, and add doctests? Further suggestion for the way to proceed?
I just notice that the function str_to_bytes
does not require that the input is a str
. Hence, it may be reasonable to change it so that, if the input is a bytes, it is returned without change.
Branch pushed to git repo; I updated commit sha1. New commits:
1984bc5 | Make str_to_bytes accept str input, and bytes_to_str accept bytes input. Use it in finite field and integer constructor |
As a proof of concept, I provide a commit which makes it possible to unpickle my example with encoding='bytes'
. It works by making bytes_to_str
accept str input (returning it unchanged) and str_to_bytes
accept bytes input (returning it unchanged. Also, I used these conversion functions in some integer constructor, finite field constructor, and in mtx_unpickle
.
However, I guess it would be needed to use bytes_to_str
and str_to_bytes
in a lot more places, therefore it is not more than a proof of concept.
Anyway, with the new commit, the following things work in Python 3:
sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py3.sobj', encoding='bytes')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: from sage.matrix.matrix_gfpn_dense import mtx_unpickle
sage: class unpickle_old_mtx:
....: def __call__(self, *args, **kwds):
....: return mtx_unpickle(*args, **kwds)
....:
sage: register_unpickle_override('pGroupCohomology.mtx', 'MTX_unpickle_class', unpickle_old_mtx)
sage: X = load('/home/king/Projekte/coho/tests/State.sobj')
Of course, all these examples are using the optional meataxe package.
Why try to make encoding='bytes'
work in so many places? The problem is that pickles travelling over the Py2/Py3 boundary have an issue that the str/bytes distinction gets squashed (because str in py2 wasn't proper unicode yet). It's pretty clear that str objects are WAY more common in python than bytes objects, so encoding with bytes
is going to need a LOT of post-processing. It's much more reasonable to assume these objects are in fact (ascii) strings. However, that breaks hard if it obviously isn't. Hence the latin1
workaround. But then getting bytes objects where they should be of course requires domain-specific knowledge. So the unpickling routines must be made more tolerant and accept a str where it expects a bytes object. But preferably only the unpickling routines! Interpreting a str object as a bytes object via latin1 encoding remains a hack that could mask errors and typos, and make them harder to detect.
If you want to make things work with 'bytes' encoding as well as 'latin1' encoding, I think you're just making extra work for yourself, and if you're making the infrastructure more lax about input types, I think you can end up with masking more errors in the end. Note that the problem you're solving here is very limited: it's JUST to help pickles cross the Py2-to-Py3 boundary (the other direction is not supported at all).
With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...
Replying to @nbruin:
Why try to make
encoding='bytes'
work in so many places?
Because an important datatype, namely numpy arrays with datetime objects, apparently can only be unpickled using encoding='bytes'
. And of course, a numpy array can be part of a larger structure that also comprises further Sage objects. Therefore it makes sense that these can be unpickled with encoding='bytes'
, too.
So the unpickling routines must be made more tolerant and accept a str where it expects a bytes object. But preferably only the unpickling routines!
Sure. That's what I do in the first commit (in the only place in the Sage library that really uses bytes that I am aware of).
Interpreting a str object as a bytes object via latin1 encoding remains a hack that could mask errors and typos, and make them harder to detect.
You mean the other way around.
If you want to make things work with 'bytes' encoding as well as 'latin1' encoding, I think you're just making extra work for yourself, and if you're making the infrastructure more lax about input types, I think you can end up with masking more errors in the end.
Sure. That's why I said it is a proof of concept. I show that making bytes_to_str
and str_to_bytes
more tolerant about input and using it more commonly in fact is a way to unpickle with encoding='bytes'
, showing that with some work there is a chance to unpickle complex data structures involving numpy arrays, without patching numpy.
But probably, it would be easier to just drop the new commit, and then provide some unpickle override for numpy arrays only.
Note that the problem you're solving here is very limited: it's JUST to help pickles cross the Py2-to-Py3 boundary (the other direction is not supported at all).
The other direction apparently is less critical, as I have shown that the Py-3 pickle corresponding to the ticket description can be unpickled both in Py-2 and Py-3.
With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...
No. I am pretty sure that with latin1 it would be few pickle overrides. With bytes it would be lots of pickle overrides.
Replying to @simon-king-jena:
With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...
No. I am pretty sure that with latin1 it would be few pickle overrides. With bytes it would be lots of pickle overrides.
Perhaps I was confusing "pickle overrides" with "changes in unpickling functions".
Restating what I mean:
numpy.array
. If 'bytes' is default encoding, I simply don't know how many pickle overrides we'd need to register for non-Sage data types. If I recall correctly, the links discussed above only mention numpy datetime, with the corollary of numpy arrays. But is there more?From the above, I think we can conclude that 'latin1' is the most reasonable default encoding for unpickling. But we should decide to what extent we want to support the 'bytes' encoding for unpickling. From my perspective, it wouldn't really be needed, thus, I'm totally fine with discarding the proof-of-concept commit.
Even if we decide to not support 'bytes' encoding, we might want to change str_to_bytes
and bytes_to_str
so that the former accepts bytes input and the latter accepts str input, returning the input unaltered in that case. I think it would be useful.
Replying to @simon-king-jena:
Even if we decide to not support 'bytes' encoding, we might want to change
str_to_bytes
andbytes_to_str
so that the former accepts bytes input and the latter accepts str input, returning the input unaltered in that case. I think it would be useful.
Yes, that doesn't look too bad. Do we know for certain that datetime is still a problem with latin1? I don't have a problem with it when I do
pickle.dumps(datetime(1977,10,10))
and then load it in Py3 with pickle.loads(...,encoding="latin1")
. With encoding="bytes"
it also seems to work. When I look in Python3 Lib/datetime.py, I see reduce and __new__
methods that very explicitly have comments about "pickle support" and calls to latin1
encoding to deal with string input parameters. So I'm pretty sure this has been fixed in recent python version; not just 3.9.
So I don't think we have a convincing case where encoding=bytes is required, and we do have convincing evidence that encoding=bytes is very painful. Even in a py3 where latin1 does not work out of the box, one could provide a pickle override to do the preprocessing. Therefore, I don't think you should put extra effort in supporting encoding=bytes at the moment.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8c105cd | Make str_to_bytes/bytes_to_str accept both str and bytes input. |
I replaced the old second commit by a commit that is less invasive: It changes the conversion functions so that they return the input unaltered if its type fits, it uses the conversion functions in matrix_gfpn_dense
for unpickling (I think that's cleaner than doing a type check there), and it also uses them in lookup_global
, because if we want to support encoding='bytes'
at all, we have to make lookup work with that option.
So, from my perspective, the code itself is ready to be used. But we should add some tests. Ideas how they could look? I could imagine to use a string that was created by dumps
in Python-2 and demonstrate that the string can be interpreted by loads
, regardless whether it is Py-2 or Py-3.
Currently I try to create a doctest as follows: Have a Cython class Foo such that its constructor expects a str. Create a Foo instance f in Python 2 with a non-ascii string, and create a pickle string s with dumps(f)
. Then, without this ticket, loads(s)
would fail in Python 3, because the non-ascii string in the pickle cannot be unpickled as a string with the default encoding chosen by Python 3. But WITH this ticket, it can be unpickled, because our default encoding is latin1.
Problem: When I do
sage: cython("""
....: from sage.structure.richcmp import richcmp
....: cdef class Foo:
....: cdef str bar
....: def __init__(self, data):
....: self.bar = data
....: def __richcmp__(self, other, op):
....: cdef Foo Other = other
....: return richcmp(self.bar, Other.bar, op)
....: def __reduce__(self):
....: return Foo, (self.bar,)
....: """)
then the module name of Foo
is a temporary name that is essentially random. How can I prescribe that the module of Foo
is __main__
?
Branch pushed to git repo; I updated commit sha1. New commits:
712e8dd | Add tests for #28444 |
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
c8a0748 | Add tests for #28444 |
The new commit adds tests. There is also one test that should only be executed in Python-3, and therefore I marked it # optional: python3
. The tests in sage.misc.persist
do pass in Sage-with-py3 and Sage-with-py2. However, oddly enough, the Py3 version of Sage adds python2
, not python3
, to the doctester. Is that a bug?
Anyway, I now think the ticket can be reviewed.
Author: Simon King
Replying to @simon-king-jena:
However, oddly enough, the Py3 version of Sage adds
python2
, notpython3
, to the doctester.
I just tested: With sage -t --optional=build,ccache,dochtml,frobby,gap_packages,gdb,gfortran,libsemigroups,meataxe,memlimit,mpir,p_group_cohomology,python2,python3,sage src/sage/misc/persist.pyx
, the optional py3-test passes in Sage-with-py3, but fails in Sage-with-py2, which is of course expected, as the encoding
keyword doesn't exist in Python-2.
The python3-only test tag is #py3
Branch pushed to git repo; I updated commit sha1. New commits:
89ac77a | Fix keyword for py3-only test |
Replying to @fchapoton:
The python3-only test tag is
#py3
Thank you! The latest commit fixes it. I verified that the tests of persist.pyx
pass both with python-2 and python-3.
Oops. The doc-builder complains:
[dochtml] [misc ] docstring of sage.misc.persist.SagePickler:21: WARNING: Inline literal start-string without end-string.
[dochtml] [misc ] docstring of sage.misc.persist.loads:52: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [misc ] docstring of sage.misc.persist.loads:52: WARNING: Inline emphasis start-string without end-string.
Work Issues: Fix doc strings
Branch pushed to git repo; I updated commit sha1. New commits:
3693024 | Fix doc strings in sage.misc.persist |
Changed work issues from Fix doc strings to none
Fixed, I hope.
On OS X, all tests pass with both Python 2 and Python 3. I don't have any comments on the actual code.
Perhaps to improve maintainability for the future, document the uses in unpickling routines by explicitly mentioning something along the lines of
# Legacy pickles from Python 2 do not distinguish between str and bytes type objects.
# While we expect a "bytes" type object here, it could arrive here as a "str" object, for instance
# if unpickle was passed the argument "encoding='latin1'". In that case we convert the "str" to
# a "bytes" object using "encoding='latin1'". If the object is already a "bytes" object, then
# str_to_bytes will return it unchanged.
because the use of "str_to_bytes" would indicate for the uninitiated reader that a "str" object is expected, whereas the main code path is actually through "bytes". The "str" case only arises in legacy cases.
Branch pushed to git repo; I updated commit sha1. New commits:
f0828ee | Add a comment regarding the expected data type for an unpickle helper |
Replying to @nbruin:
Perhaps to improve maintainability for the future, document the uses in unpickling routines...
Is the new commit ok?
Replying to @simon-king-jena:
Replying to @nbruin:
Perhaps to improve maintainability for the future, document the uses in unpickling routines...
Is the new commit ok?
There are some language errors in the text, but I think the gist of it is clear.
Branch pushed to git repo; I updated commit sha1. New commits:
ba41ebe | Fix two typos in a comment |
Replying to @nbruin:
There are some language errors in the text, but I think the gist of it is clear.
Oops, you are right. I tried to fix it. Is it good now?
EDIT: In the original ticket description, I stated: "I believe that a backwards incompatible change of pickling is a blocker for Python-3 support." In that (and ONLY in that) sense I believe this ticket is a blocker. I replaced the original ticket description by something that I wrote in a comment, because now I have a much smaller example, and moreover pickles of the same object created with Python-3 and with Python-2, so that one can compare.
The following examples require the optional meataxe package, but I am not sure yet if meataxe is to blame or Python-3 (I hope it is the former, because I guess it would be more easy to fix).
attachment: Py2.sobj and attachment: Py3.sobj result in the following behaviour in Python-3
and in Python-2
So, the Python-3 pickle can be unpickled in Python-2, but not the other way around. What is the problem?
Component: python3
Keywords: unpickling UnicodeError backwards compatibility
Author: Simon King
Branch:
d7f170f
Reviewer: Nils Bruin
Issue created by migration from https://trac.sagemath.org/ticket/28444