Closed: zentol closed this issue 7 years ago
Yes, looks like a bug. Thanks for reporting.
This seems to happen for python3.x, not just 3.4.
… and apparently happens due to the use of super (I'll need to investigate a bit further). If you replace the super line with a pass, then obj2() pickles fine with dill.
Since pickle serializes by reference, you can still access this behavior with dill. So the workaround (until the bug is fixed) would be to have dill serialize by reference.
>>> dill.dumps(obj2(), byref=True)
b'\x80\x03c__main__\nobj2\nq\x00)\x81q\x01.'
Here's a slightly pared-down version that still fails:
import dill

class obj:
    def __init__(self):
        super()

dill.dumps(obj())
Seems to be getting into a cycle of:
T2: <class '__main__.obj'>
D2: <dict object at 0x7f2862e14688>
F1: <function obj.__init__ at 0x7f2862e09bf8>
D1: <dict object at 0x7f2867d85388>
Ce: <cell at 0x7f2867d7c528: type object at 0x7f286a69eb78>
>>> dill.detect.trace(True)
>>>
>>> dill.pickles(obj2())
T2: <class '__main__.obj2'>
F2: <function _create_type at 0x10e471560>
T1: <class 'type'>
F2: <function _load_type at 0x10e471050>
T2: <class '__main__.obj'>
T1: <class 'object'>
D2: <dict object at 0x10e5a3758>
F1: <function obj.__init__ at 0x10da08680>
F2: <function _create_function at 0x10d68ec20>
Co: <code object __init__ at 0x10d626150, file "<stdin>", line 2>
F2: <function _unmarshal at 0x10e4715f0>
D1: <dict object at 0x10d6546c8>
D2: <dict object at 0x10da123f8>
D2: <dict object at 0x10e5a3098>
F1: <function obj2.__init__ at 0x10da087a0>
Co: <code object __init__ at 0x10d902660, file "<stdin>", line 2>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
F2: <function _create_cell at 0x10e4817a0>
T2: <class '__main__.obj2'>
D2: <dict object at 0x10e5a2560>
F1: <function obj2.__init__ at 0x10da087a0>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
T2: <class '__main__.obj2'>
D2: <dict object at 0x10e5a37e8>
F1: <function obj2.__init__ at 0x10da087a0>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
T2: <class '__main__.obj2'>
With the last block repeating until it hits the recursion error for python3.x. In python2.x, we have:
>>> dill.pickles(obj2)
T2: <class '__main__.obj2'>
F2: <function _create_type at 0x102618398>
T1: <type 'type'>
F2: <function _load_type at 0x102618320>
T2: <class '__main__.obj'>
T1: <type 'object'>
D2: <dict object at 0x10264c398>
F1: <function __init__ at 0x102677a28>
F2: <function _create_function at 0x102618410>
Co: <code object __init__ at 0x101a2e830, file "<stdin>", line 2>
F2: <function _unmarshal at 0x1026182a8>
D1: <dict object at 0x101774168>
D2: <dict object at 0x102645a28>
D2: <dict object at 0x10264c050>
F1: <function __init__ at 0x102677aa0>
Co: <code object __init__ at 0x1018520b0, file "<stdin>", line 2>
D1: <dict object at 0x101774168>
D2: <dict object at 0x1018115c8>
True
So looks like (in python3) we have:
Co: <code object __init__ at 0x10d902660, file "<stdin>", line 2>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
instead of:
Co: <code object __init__ at 0x1018520b0, file "<stdin>", line 2>
D1: <dict object at 0x101774168>
D2: <dict object at 0x1018115c8>
Looks like this is it:
log.info("D1: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
if PY3:
    pickler.write(bytes('c__builtin__\n__main__\n', 'UTF-8'))
Not __builtin__ in PY3.
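For reference, the PY2 module name simply does not exist under PY3 (a quick stdlib-only check):

```python
import builtins  # the Python 3 name of the builtins module

try:
    import __builtin__  # the Python 2 name
    py2_name_exists = True
except ImportError:
    py2_name_exists = False

# On Python 3 only the new name is importable, and super lives there.
assert hasattr(builtins, 'super')
assert not py2_name_exists
```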
Wouldn't that just cause unpickling to fail?
No, it's editing the pickled string there, if I remember correctly. This could cause issues elsewhere too, where only the old '__builtin__' is looked for --
def find_class(self, module, name):
    if (module, name) == ('__builtin__', '__main__'):
Of course, this needs investigation.
Python2/3 compatible minimal test.
>>> class obj(object):
... def __init__(self):
... super(obj, self).__init__()
...
>>> dill.dumps(obj())
>>> class obj(object):
... def __init__(self):
... object.__init__(self)
...
>>> dill.dumps(obj())
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00objq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\r\x00\x00\x00__slotnames__q\x0b]q\x0cX\x08\x00\x00\x00__init__q\rcdill.dill\n_create_function\nq\x0e(cdill.dill\n_unmarshal\nq\x0fC\x8ac\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x11\x00\x00\x00t\x00\x00j\x01\x00|\x00\x00\x83\x01\x00\x01d\x00\x00S(\x01\x00\x00\x00N(\x02\x00\x00\x00u\x06\x00\x00\x00objectu\x08\x00\x00\x00__init__(\x01\x00\x00\x00u\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00u\x07\x00\x00\x00<stdin>u\x08\x00\x00\x00__init__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x10\x85q\x11Rq\x12c__builtin__\n__main__\nh\rNN}q\x13tq\x14Rq\x15X\x07\x00\x00\x00__doc__q\x16NX\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18utq\x19Rq\x1a)\x81q\x1b.'
The trick I'm using there is declaring __builtin__ as the module in the pickle, and then this:
def find_class(self, module, name):
    if (module, name) == ('__builtin__', '__main__'):
        return self._main_module.__dict__
Gives us D2. So that's not getting triggered in python3.
My comment further above may not be correct… (e.g. not __builtin__).
'__builtin__.__main__' is used as a class name in find_class; it should indicate we are pickling from the interpreter. However, it might require 'builtins' to be used so the module can be imported, even though it's not really used.
The cycle is obj.__init__.__closure__[0].cell_contents == obj, but I'm not sure how it's supposed to be broken.
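That cycle can be confirmed directly (Python 3 only; a minimal sketch using the thread's obj example):

```python
# Python 3: a method that uses super() gets an implicit __class__ free
# variable, whose closure cell points back at the class itself.
class obj:
    def __init__(self):
        super()

f = obj.__init__
assert f.__code__.co_freevars == ('__class__',)
assert f.__closure__[0].cell_contents is obj  # the cycle
```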
Ack. Looks like it doesn't make a difference to use builtins. Whiff.
Shouldn't need to "break" the cycle. Should avoid it ever getting to a CellType, I think. That means finding why it doesn't go: Co, D1, D2
If you just have:
class obj:
    def __init__(self): pass
obj.__init__.__closure__ is None
And the D1 and D2 bits are there to work with globals() in __main__, so maybe super is a weird object in the lookup… and we need a special case for it… basically, don't try to look it up in globals?
>>> super.mro()
[<class 'super'>, <class 'object'>]
>>> super(object)
<super: <class 'object'>, NULL>
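One detail relevant to the "don't look it up in globals" idea: super lives in the builtins module, not in any module's globals() (a quick stdlib check):

```python
import builtins

# super resolves through builtins, so a lookup restricted to the
# module's global dict will come up empty.
assert builtins.super is super
assert 'super' not in globals()
```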
This also fails with the same error:
>>> class obj:
... def __init__(self):
... id(super), id(self), id(obj)
...
>>> dill.dumps(obj())
While this succeeds:
>>> class obj:
... def __init__(self):
... id(self), id(obj)
...
>>> dill.dumps(obj())
And this fails:
>>> class obj:
... def __init__(self):
... super
...
>>> dill.dumps(obj())
So, apparently it's the lookup of super in the global dict that needs to be fixed.
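The bare-name case above can be confirmed without dill at all: merely naming super in a method makes the compiler create the implicit cell (a Python 3 sketch; class names A and B are illustrative):

```python
class A:
    def m(self):
        super  # bare name reference, never called

class B:
    def m(self):
        pass

# Referencing the name super is enough to trigger the implicit
# __class__ cell; an otherwise identical method has no closure at all.
assert '__class__' in A.m.__code__.co_freevars
assert B.m.__code__.co_freevars == ()
assert B.m.__closure__ is None
```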
It must have something to do with the new-style super, which, according to https://www.python.org/dev/peps/pep-0367/#reference-implementation, uses bytecode hacking.
This works:
>>> _super = super
>>> class obj(object):
... def __init__(self):
... _super(obj, self).__init__()
...
>>> dill.dumps(obj())
However, this fails with an interesting error:
>>> _super = super
>>> class obj(object):
... def __init__(self):
... _super()
...
>>> o = obj()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in __init__
RuntimeError: super(): __class__ cell not found
This too:
>>> class obj:
... _super = super
... def __init__(self):
... self._super(object, self).__init__()
...
>>> o = obj()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
RuntimeError: super(): __class__ cell not found
This too:
>>> class obj:
... _super = super
... def __init__(self):
... obj._super(object, self).__init__()
...
>>> o = obj()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
RuntimeError: super(): __class__ cell not found
To me this is smelling like a python bug.
So a workaround would be to "find" super instead of looking it up… otherwise, we'll have to understand what was done in the hack.
Has a bug for this been opened with python?
@frnsys: No. Not that I know of, but I also didn't check too extensively either.
@mmckerns do you know if it's possible to hack this? This is not working for me:
>>> _super = super
>>> class obj(object):
... def __init__(self):
... _super(obj, self).__init__()
>>> dill.dumps(obj())
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7f90089d0320>
# F2
T1: <type 'type'>
F2: <function _load_type at 0x7f90089d02a8>
# F2
# T1
T1: <type 'object'>
# T1
D2: <dict object at 0x7f9008aca910>
F1: <function __init__ at 0x7f9008acd938>
F2: <function _create_function at 0x7f90089d0398>
# F2
Co: <code object __init__ at 0x7f9008aa6d30, file "<ipython-input-5-59c28b57355b>", line 3>
T1: <type 'code'>
# T1
# Co
D2: <dict object at 0x7f900896da28>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a61c58>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008973398>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a61398>
...
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008a14910>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a07a28>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f90089184b0>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a14168>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008919050>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a0c280>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008919b40>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008918398>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f900891f6e0>
T2: <class '__main__.obj'>
Traceback (most recent call last):
File "/opt/conda/lib/python2.7/logging/__init__.py", line 885, in emit
self.flush()
File "/opt/conda/lib/python2.7/logging/__init__.py", line 845, in flush
self.stream.flush()
File "/opt/conda/lib/python2.7/site-packages/ipykernel/iostream.py", line 266, in flush
evt = threading.Event()
File "/opt/conda/lib/python2.7/threading.py", line 550, in Event
return _Event(*args, **kwargs)
File "/opt/conda/lib/python2.7/threading.py", line 563, in __init__
self.__cond = Condition(Lock())
File "/opt/conda/lib/python2.7/threading.py", line 253, in Condition
return _Condition(*args, **kwargs)
File "/opt/conda/lib/python2.7/threading.py", line 261, in __init__
_Verbose.__init__(self, verbose)
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file dill.py, line 1198
D2: <dict object at 0x7f9008918168>
Traceback (most recent call last):
File "/opt/conda/lib/python2.7/logging/__init__.py", line 885, in emit
self.flush()
File "/opt/conda/lib/python2.7/logging/__init__.py", line 845, in flush
self.stream.flush()
File "/opt/conda/lib/python2.7/site-packages/ipykernel/iostream.py", line 266, in flush
evt = threading.Event()
File "/opt/conda/lib/python2.7/threading.py", line 550, in Event
return _Event(*args, **kwargs)
File "/opt/conda/lib/python2.7/threading.py", line 562, in __init__
_Verbose.__init__(self, verbose)
RuntimeError: maximum recursion depth exceeded while calling a Python object
I'm using:
2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
When I try it, my output is the same as yours, up to: # T1 # Co. Next, where yours has a D2, mine is a D1. I am testing with 2.7.11 (not 2.7.12). The D1 means both that the __module__ of the pickler is dill and that __main__.__dict__ is being pickled; while D2 means that no special conditions are met, thus python's pickler is being used. I don't see why, if you are doing a dill.dumps, you should get anything different than I am.
D1: <dict object at 0x1046e1168>
# D1
D2: <dict object at 0x104cc3e88>
# D2
# F1
# D2
# T2
D2: <dict object at 0x104caf398>
# D2
Then it completes the dumps.
However, in your traceback, I noticed you have the threading module, and multiple copies of the error. Are you actually calling the dumps with multiprocessing, multiprocessing.dummy, or the threading library? If so, you might want to try the dill-aware multiprocess library.
@mmckerns, for the threads, I was using a jupyter notebook.
That's weird, I tested the same script with several clean virtualenvs and I still get the error.
I'm not really familiar with the internals of super() and pickle, so maybe I'm missing something obvious.
Is it because I use the recurse option?
I need it because the class that I want to serialize needs several other modules.
Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
errors:
>>> dill.dumps(obj, protocol=2, recurse=True)
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7fb9730fc320>
# F2
T1: <type 'type'>
F2: <function _load_type at 0x7fb9730fc2a8>
# F2
# T1
T1: <type 'object'>
# T1
D2: <dict object at 0x7fb972ef2d70>
F1: <function __init__ at 0x7fb972ebfe60>
F2: <function _create_function at 0x7fb9730fc398>
# F2
Co: <code object __init__ at 0x7fb976eb1e30, file "<stdin>", line 2>
T1: <type 'code'>
# T1
# Co
D2: <dict object at 0x7fb97312bd70>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972e86280>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee17f8>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972ef2e88>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee1168>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb976e98a28>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee14b0>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb97311d280>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ef2910>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972ee15c8>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ef2a28>
T2: <class '__main__.obj'>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 243, in dumps
dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 236, in dump
pik.dump(obj)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 1216, in save_type
obj.__bases__, _dict), obj=obj)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 401, in save_reduce
save(args)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 568, in save_tuple
save(element)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 835, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 687, in _batch_setitems
save(v)
...
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 1216, in save_type
obj.__bases__, _dict), obj=obj)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 401, in save_reduce
save(args)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 568, in save_tuple
save(element)
File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 831, in save_module_dict
log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1159, in info
self._log(INFO, msg, args, **kwargs)
File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1277, in _log
record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, func, extra)
File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1251, in makeRecord
rv = LogRecord(name, level, fn, lno, msg, args, exc_info, func)
RuntimeError: maximum recursion depth exceeded
Python 3.4.5 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:47:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
errors:
>>> dill.dumps(obj, protocol=2, recurse=True)
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7f23ff99c1e0>
# F2
T1: <class 'type'>
F2: <function _load_type at 0x7f23ff99c158>
# F2
# T1
T1: <class 'object'>
# T1
D2: <dict object at 0x7f2401da9408>
F1: <function obj.__init__ at 0x7f23feb36d90>
F2: <function _create_function at 0x7f23ff99c268>
# F2
Co: <code object __init__ at 0x7f2401d7fdb0, file "<stdin>", line 2>
T1: <class 'code'>
# T1
B3: <built-in function encode>
F2: <function _get_attr at 0x7f23ff99cae8>
# F2
M2: <module '_codecs' (built-in)>
F2: <function _import_module at 0x7f23ff99cbf8>
# F2
# M2
# B3
# Co
D2: <dict object at 0x7f23fecc50c8>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f2401dd5a88>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23fecc5088>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23ff999b08>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23fecc5288>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23fecc5188>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23feb28748>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23feb28708>
F1: <function obj.__init__ at 0x7f23feb36d90>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 243, in dumps
dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 236, in dump
pik.dump(obj)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 412, in dump
self.save(obj)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 1216, in save_type
obj.__bases__, _dict), obj=obj)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 603, in save_reduce
save(args)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 744, in save_tuple
save(element)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 835, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 814, in save_dict
self._batch_setitems(obj.items())
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 840, in _batch_setitems
save(v)
...
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 793, in save_function
obj.__dict__), obj=obj)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 603, in save_reduce
save(args)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 744, in save_tuple
save(element)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 831, in save_module_dict
log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1279, in info
self._log(INFO, msg, args, **kwargs)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1413, in _log
exc_info, func, extra, sinfo)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1385, in makeRecord
sinfo)
File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 273, in __init__
self.levelname = getLevelName(level)
RuntimeError: maximum recursion depth exceeded
Ok, so it's working without the recurse option. I guess I'm stuck in a dead end :disappointed: .
@tboquet: I get the same error when I use recurse=True, and yes, it's due to using this option (as recurse does not treat __main__.__dict__ as a special dict). It failing for recurse=True should probably be considered a dill bug.
@mmckerns I'm trying to bypass the serialization of all of the dependencies. I reload all the modules and functions in globals() with import_module, but they are not found by the methods where they are involved in a new process. I guess it's because they are supposed to be defined in the same file where the object comes from. Is this the right way to do it? Is it at least possible, or do you see something wrong with this idea?
@tboquet: I'm not sure I understand you, but if you don't want the dependencies serialized, then you don't want recurse=True -- the whole point there is to serialize the dependencies. If you have the dependencies in another file versus the same file, they also will serialize differently. It's actually easiest if the dependencies are in another installed module -- if that's the case, they will be pickled and unpickled by reference. If the dependency modules are not installed (i.e. just other scripts in the same directory), then that doesn't work well (see issues #176, #123, etc)… and it's better to put the dependencies in the same file.
@mmckerns sorry I wasn't clear in my last comment. I tried to serialize the class with recurse=False. Then I tried to reinstantiate it on another virtual machine having the same packages. Because I need the dependencies, I tried to load every module by hand using import_module directly into globals() so they are accessible. Unfortunately these modules and functions are not found in this new process, even though they are loaded in globals(). I was wondering if I had to change the file location of the serialized method, which is by default the cell of my interactive jupyter session.
@tboquet: This feels like it should be its own ticket. So, I'm going to consider up to about three posts back to be in the thread for this issue. However, with "I'm trying to bypass the serialization of all the dependencies…", it sounds like you are starting to get into a different issue, and I need to see the code for what you are doing. Can you open a new ticket and reference this one? Also, please see some of the existing issues about serializing functions and their dependencies. Again, that is the primary reason the recurse variant exists.
@mmckerns sure, I'll open a clean issue with a snippet of code to reproduce what I'm trying to do.
Any progress? This bug is really preventing us from using dill.
We established that the source of the issue was some new behavior from pickling super itself. I've pinpointed it. The issue is that pickling a function that contains super now produces a __closure__ that has a cell object which has a pointer to an instance of the class. It's recursive because the instance of the class that is produced is a new instance, unfortunately... and I think python made that decision because pickle serializes classes by reference, and that should break the recursion.
>>> o = obj()
>>> o.__init__.__func__.__code__.co_names
('super', 'obj', '__init__')
>>> o
<__main__.obj object at 0x10558bd68>
>>> # 3.6
>>> # dill.dumps(o.__init__.__func__, byref=True) # WORKS
>>> # dill.dumps(o.__init__.__func__) # RecursionError
>>> o.__init__.__func__.__closure__[0].cell_contents()
<__main__.obj object at 0x1055bd400>
>>> # 2.7
>>> o.__init__.__func__.__closure__
>>>
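The by-reference point can be seen with stock pickle alone: it stores the class as module plus name, so the __class__ cell never enters the stream (a sketch; the class name Obj is illustrative):

```python
import pickle

class Obj(object):
    def __init__(self):
        super(Obj, self).__init__()

# Stock pickle serializes the class by reference, so no cycle arises
# and the round trip succeeds.
data = pickle.dumps(Obj())
restored = pickle.loads(data)
assert isinstance(restored, Obj)
```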
Therefore, I think a reasonable, but not perfect, solution within dill is to detect when co_names includes super, and then temporarily change the _byref flag on the pickler to _byref = True.
With this edit to dill.dill.save_function:
if 'super' in obj.__code__.co_names:
    _byref = getattr(pickler, '_byref', None)
    if _byref is not None:
        pickler._byref = True
pickler.save_reduce(_create_function, (obj.__code__,
                    globs, obj.__name__,
                    obj.__defaults__, obj.__closure__,
                    obj.__dict__), obj=obj)
if 'super' in obj.__code__.co_names and _byref is not None:
    pickler._byref = _byref
Then this succeeds:
import dill

class obj(object):
    def __init__(self):
        super(obj, self).__init__()

repr(dill.dumps(obj(), byref=True))
repr(dill.dumps(obj(), recurse=True))
repr(dill.dumps(obj()))
Note that this particularly unusual case will also work, except when recurse=True.
import dill

class obj(object):
    _super = super
    def __init__(self):
        obj._super(obj, self).__init__()
This kind of unusual case fails for both 2.x and 3.x, so it needs a bit more investigation before I can say if it's the same issue or not. The workaround for super I tried above is obviously defeated by the above. There's probably a better fix. Ultimately, what's needed is: once super is detected in the function, the "appropriate" objects that are produced by the resulting closure should be serialized with byref=True. There's a possibility that it will cause some new failures... but I think they are really unlikely. The "new" niche behavior would only be triggered for pickling a function that contains super (so it should be inside a class)... and unless another closure is being used in the method (aside from the one that super oddly creates), then flipping on byref while pickling the function should be fine. It'd probably be better to turn it on for the objects produced from the cell... but maybe that's harder to do, as it'd need some handshake, I think. I haven't tried it yet.
Note that pickle does the following, to avoid recursion:
if id(obj) in self.memo:
    # If the object is already in the memo, this means it is
    # recursive. In this case, throw away everything we put on the
    # stack, and fetch the object back from the memo.
    write(POP_MARK + self.get(self.memo[id(obj)][0]))
    return
Maybe something like this is a better solution... as it's pretty hard to get anything from a cell except what the cell_contents are.
In dill, there is already the dill.dill.stack, where: stack = set() # record of 'recursion-sensitive' pickled objects. It's not really used.
Addressed this issue as noted above in 2f1395d07c8378cb77f374098504684ae77189ce. I'm closing this issue. Please add any comments here, or reopen if there are any issues.
Arg. Oddly, this breaks 2.6, 3.3, and pypy when run from tox. Stranger still, pypy fails on a missing attribute that is present, and works when not run with tox. So, weird.
Patched in 13f82f36e6bd6575af41223c8919901ecb260aeb by clearing the memo.
@matsjoyce: maybe you can take a look at this? I don't think it breaks anything, but it feels kind of hacky. It doesn't seem too terrible, though I imagine I could find corner cases that it mishandles. Maybe you can see some improvements. The code patch addresses super and also blocks a good many of the RecursionErrors.
I don't like this solution, but I think it's the only way of doing it short of rewriting pickle. The real problem is that pickle serializes the object before memoizing it, so recursive objects always require hacks. If it checked the memo, then added the object to the memo, then serialized it, this problem would not happen. So yeah, I can't think of anything better.
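A toy illustration of that ordering (hypothetical code, not pickle's actual implementation): reserving a memo entry before descending into an object turns a recursive reference into a back-reference instead of infinite recursion.

```python
def serialize(obj, memo=None):
    # Toy serializer: memoize a placeholder back-reference *before*
    # recursing, so cycles resolve via the memo.
    if memo is None:
        memo = {}
    if id(obj) in memo:
        return memo[id(obj)]               # cycle: emit a back-reference
    if isinstance(obj, list):
        memo[id(obj)] = ('ref', id(obj))   # memoize first...
        return ('list', [serialize(x, memo) for x in obj])  # ...then recurse
    return ('atom', obj)

cycle = [1]
cycle.append(cycle)                        # self-referential list
result = serialize(cycle)
assert result == ('list', [('atom', 1), ('ref', id(cycle))])
```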
@matsjoyce: I don't know if you could tell from my note, but yeah I don't like the solution either. If you do think of something better, please feel free to comment, PR, or otherwise share. Thanks.
Has there been any progress with this, or is there a workaround available?
Edit:
@mmckerns Not sure if this will help you, but I have tried in various ways to modify a model which used subclassing, and the same error occurred. The same class instantiated and used inside a function worked.
@mirceamironenco: Nothing new has been done here. The hack I put in place is still in place, and should work for most cases with regard to this ticket. If you are seeing something that's causing errors, please submit an issue describing what you see.
If you are seeing something that's causing errors, please submit an issue describing what you see.
https://github.com/uqfoundation/dill/issues/300 is one of the use-cases that the current fix does not cover.
BTW, here's how Cloudpickle addresses recursion problem:
The following code fails for me using Python 3.4 (it works under 2.7, and also with the standard pickle module under 3.4).