uqfoundation / dill

serialize all of Python
http://dill.rtfd.io

dump fails with RuntimeError (recursion depth) for class with super #75

Closed zentol closed 7 years ago

zentol commented 9 years ago

The following code fails for me using Python 3.4. (It works under 2.7, and also with the standard pickle module under 3.4.)

import dill

class obj(object):
    def __init__(self):
        pass

class obj2(obj):
    def __init__(self):
        super(obj2, self).__init__()

dill.dumps(obj2())
mmckerns commented 9 years ago

Yes, looks like a bug. Thanks for reporting.

This seems to happen across python 3.x, not just 3.4… and apparently it's due to the use of super (I'll need to investigate a bit further). If you replace the super line with pass, then obj2() pickles fine with dill.

Since pickle serializes classes by reference, you can still access this behavior with dill. So the workaround (until the bug is fixed) would be to have dill serialize by reference.

>>> dill.dumps(obj2(), byref=True)
b'\x80\x03c__main__\nobj2\nq\x00)\x81q\x01.'
matsjoyce commented 9 years ago

Here's a slightly pruned down version that still fails:

import dill
class obj:
    def __init__(self):
        super()

dill.dumps(obj())

Seems to be getting into a cycle of:

T2: <class '__main__.obj'>
D2: <dict object at 0x7f2862e14688>
F1: <function obj.__init__ at 0x7f2862e09bf8>
D1: <dict object at 0x7f2867d85388>
Ce: <cell at 0x7f2867d7c528: type object at 0x7f286a69eb78>
mmckerns commented 9 years ago
>>> dill.detect.trace(True)
>>> 
>>> dill.pickles(obj2())
T2: <class '__main__.obj2'>
F2: <function _create_type at 0x10e471560>
T1: <class 'type'>
F2: <function _load_type at 0x10e471050>
T2: <class '__main__.obj'>
T1: <class 'object'>
D2: <dict object at 0x10e5a3758>
F1: <function obj.__init__ at 0x10da08680>
F2: <function _create_function at 0x10d68ec20>
Co: <code object __init__ at 0x10d626150, file "<stdin>", line 2>
F2: <function _unmarshal at 0x10e4715f0>
D1: <dict object at 0x10d6546c8>
D2: <dict object at 0x10da123f8>
D2: <dict object at 0x10e5a3098>
F1: <function obj2.__init__ at 0x10da087a0>
Co: <code object __init__ at 0x10d902660, file "<stdin>", line 2>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
F2: <function _create_cell at 0x10e4817a0>
T2: <class '__main__.obj2'>
D2: <dict object at 0x10e5a2560>
F1: <function obj2.__init__ at 0x10da087a0>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
T2: <class '__main__.obj2'>
D2: <dict object at 0x10e5a37e8>
F1: <function obj2.__init__ at 0x10da087a0>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>
T2: <class '__main__.obj2'>

The last block repeats until it hits the recursion error in python3.x. In python2.x, we have:

>>> dill.pickles(obj2)
T2: <class '__main__.obj2'>
F2: <function _create_type at 0x102618398>
T1: <type 'type'>
F2: <function _load_type at 0x102618320>
T2: <class '__main__.obj'>
T1: <type 'object'>
D2: <dict object at 0x10264c398>
F1: <function __init__ at 0x102677a28>
F2: <function _create_function at 0x102618410>
Co: <code object __init__ at 0x101a2e830, file "<stdin>", line 2>
F2: <function _unmarshal at 0x1026182a8>
D1: <dict object at 0x101774168>
D2: <dict object at 0x102645a28>
D2: <dict object at 0x10264c050>
F1: <function __init__ at 0x102677aa0>
Co: <code object __init__ at 0x1018520b0, file "<stdin>", line 2>
D1: <dict object at 0x101774168>
D2: <dict object at 0x1018115c8>
True

So looks like (in python3) we have:

Co: <code object __init__ at 0x10d902660, file "<stdin>", line 2>
D1: <dict object at 0x10d6546c8>
Ce: <cell at 0x10d8c3b08: type object at 0x7f9949c33dd0>

instead of:

Co: <code object __init__ at 0x1018520b0, file "<stdin>", line 2>
D1: <dict object at 0x101774168>
D2: <dict object at 0x1018115c8>
mmckerns commented 9 years ago

Looks like this is it:

        log.info("D1: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PY3:
            pickler.write(bytes('c__builtin__\n__main__\n', 'UTF-8'))

There is no __builtin__ in PY3.
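
(For context, a quick illustrative check of the rename: the module is named builtins in Python 3, and __builtin__ only exists in Python 2.)

import sys

if sys.version_info[0] >= 3:
    import builtins as builtin_module      # the Python 3 name
else:
    import __builtin__ as builtin_module   # the Python 2 name

print('super' in vars(builtin_module))     # True either way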

matsjoyce commented 9 years ago

Wouldn't that just cause unpickling to fail?

mmckerns commented 9 years ago

No, it's editing the pickled string there, if I remember correctly. This could cause issues elsewhere too, where only the old '__builtin__' is looked for --

def find_class(self, module, name):
    if (module, name) == ('__builtin__', '__main__'):

Of course, this needs investigation.

mmckerns commented 9 years ago

Python2/3 compatible minimal test.

>>> class obj(object):
...     def __init__(self):
...         super(obj, self).__init__()
... 
>>> dill.dumps(obj())
mmckerns commented 9 years ago
>>> class obj(object):
...   def __init__(self):
...     object.__init__(self)
... 
>>> dill.dumps(obj())
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00objq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\r\x00\x00\x00__slotnames__q\x0b]q\x0cX\x08\x00\x00\x00__init__q\rcdill.dill\n_create_function\nq\x0e(cdill.dill\n_unmarshal\nq\x0fC\x8ac\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x11\x00\x00\x00t\x00\x00j\x01\x00|\x00\x00\x83\x01\x00\x01d\x00\x00S(\x01\x00\x00\x00N(\x02\x00\x00\x00u\x06\x00\x00\x00objectu\x08\x00\x00\x00__init__(\x01\x00\x00\x00u\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00u\x07\x00\x00\x00<stdin>u\x08\x00\x00\x00__init__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x10\x85q\x11Rq\x12c__builtin__\n__main__\nh\rNN}q\x13tq\x14Rq\x15X\x07\x00\x00\x00__doc__q\x16NX\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18utq\x19Rq\x1a)\x81q\x1b.'
mmckerns commented 9 years ago

The trick I'm using there is declaring __builtin__ as the module in the pickle, and then this:

def find_class(self, module, name):
    if (module, name) == ('__builtin__', '__main__'):
        return self._main_module.__dict__

Gives us D2. So that's not getting triggered in python3.

mmckerns commented 9 years ago

My comment further above may not be correct… (e.g. not __builtin__).

'__builtin__.__main__' is used as a class name in find_class to indicate that we are pickling from the interpreter.

However, it might require 'builtins' to be used so the module can be imported, even though it's not really used.

matsjoyce commented 9 years ago

The cycle is obj.__init__.__closure__[0].cell_contents == obj, but I'm not sure how it's supposed to be broken.
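
Here's a minimal, self-contained way to see that cycle (a sketch, assuming CPython 3.x):

class obj:
    def __init__(self):
        super()

# super() makes the compiler attach an implicit __class__ cell to __init__,
# and that cell holds obj itself: class -> __init__ -> __closure__ -> class.
cell = obj.__init__.__closure__[0]
print(cell.cell_contents is obj)   # True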

mmckerns commented 9 years ago

Ack. Looks like it doesn't make a difference to use builtins. Whiff.

mmckerns commented 9 years ago

Shouldn't need to "break" the cycle. Should avoid it ever getting to a CellType, I think. That means finding why it doesn't go: Co, D1, D2.

matsjoyce commented 9 years ago

If you just have:

class obj:
    def __init__(self): pass

then obj.__init__.__closure__ is None.

mmckerns commented 9 years ago

And the D1 and D2 bits are there to work with globals() in __main__, so maybe super is a weird object in the lookup… and we need a special case for it… basically, don't try to look it up in globals?

>>> super.mro()
[<class 'super'>, <class 'object'>]
>>> super(object)
<super: <class 'object'>, NULL>
mmckerns commented 9 years ago

This also fails with the same error:

>>> class obj:
...     def __init__(self):
...         id(super), id(self), id(obj)
... 
>>> dill.dumps(obj())

While this succeeds:

>>> class obj:
...     def __init__(self):
...         id(self), id(obj)
...
>>> dill.dumps(obj())

And this fails:

>>> class obj:
...     def __init__(self):
...         super
... 
>>> dill.dumps(obj())

So, apparently it's the lookup of super in the globals dict that should be fixed.

matsjoyce commented 9 years ago

It must have something to do with the new-style super, which, according to https://www.python.org/dev/peps/pep-0367/#reference-implementation, uses bytecode hacking.
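
A quick sketch of that compiler magic (assuming CPython 3.x): merely naming super in a method body makes the compiler add an implicit __class__ free variable, while a plain method gets no closure at all.

class with_super:
    def __init__(self):
        super  # never called; naming it is enough

class plain:
    def __init__(self):
        pass

print(with_super.__init__.__code__.co_freevars)                         # ('__class__',)
print(with_super.__init__.__closure__[0].cell_contents is with_super)   # True
print(plain.__init__.__code__.co_freevars)                              # ()
print(plain.__init__.__closure__)                                       # None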

mmckerns commented 9 years ago

This works:

>>> _super = super
>>> class obj(object):
...   def __init__(self):
...     _super(obj, self).__init__()
... 
>>> dill.dumps(obj())

However, this fails with an interesting error:

>>> _super = super
>>> class obj(object):
...   def __init__(self):
...     _super()
... 
>>> o = obj()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
RuntimeError: super(): __class__ cell not found

This too:

>>> class obj:
...   _super = super
...   def __init__(self):
...     self._super(object, self).__init__()
... 
>>> o = obj()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
RuntimeError: super(): __class__ cell not found

This too:

>>> class obj:
...   _super = super
...   def __init__(self):
...     obj._super(object, self).__init__()
... 
>>> o = obj()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
RuntimeError: super(): __class__ cell not found

To me this is smelling like a python bug.

So a workaround would be to "find" super instead of looking it up… otherwise, we'll have to understand what was done in the hack.

frnsys commented 8 years ago

Has a bug for this been opened with Python?

mmckerns commented 8 years ago

@frnsys: No. Not that I know of, but I also didn't check too extensively either.

tboquet commented 8 years ago

@mmckerns do you know if it's possible to hack around this? This is not working for me:

>>> _super = super
>>> class obj(object):
...   def __init__(self):
...     _super(obj, self).__init__()
>>> dill.dumps(obj())
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7f90089d0320>
# F2
T1: <type 'type'>
F2: <function _load_type at 0x7f90089d02a8>
# F2
# T1
T1: <type 'object'>
# T1
D2: <dict object at 0x7f9008aca910>
F1: <function __init__ at 0x7f9008acd938>
F2: <function _create_function at 0x7f90089d0398>
# F2
Co: <code object __init__ at 0x7f9008aa6d30, file "<ipython-input-5-59c28b57355b>", line 3>
T1: <type 'code'>
# T1
# Co
D2: <dict object at 0x7f900896da28>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a61c58>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008973398>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a61398>
...
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008a14910>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a07a28>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f90089184b0>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a14168>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008919050>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008a0c280>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f9008919b40>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f9008918398>
F1: <function __init__ at 0x7f9008acd938>
D2: <dict object at 0x7f900891f6e0>
T2: <class '__main__.obj'>
Traceback (most recent call last):
  File "/opt/conda/lib/python2.7/logging/__init__.py", line 885, in emit
    self.flush()
  File "/opt/conda/lib/python2.7/logging/__init__.py", line 845, in flush
    self.stream.flush()
  File "/opt/conda/lib/python2.7/site-packages/ipykernel/iostream.py", line 266, in flush
    evt = threading.Event()
  File "/opt/conda/lib/python2.7/threading.py", line 550, in Event
    return _Event(*args, **kwargs)
  File "/opt/conda/lib/python2.7/threading.py", line 563, in __init__
    self.__cond = Condition(Lock())
  File "/opt/conda/lib/python2.7/threading.py", line 253, in Condition
    return _Condition(*args, **kwargs)
  File "/opt/conda/lib/python2.7/threading.py", line 261, in __init__
    _Verbose.__init__(self, verbose)
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file dill.py, line 1198
D2: <dict object at 0x7f9008918168>
Traceback (most recent call last):
  File "/opt/conda/lib/python2.7/logging/__init__.py", line 885, in emit
    self.flush()
  File "/opt/conda/lib/python2.7/logging/__init__.py", line 845, in flush
    self.stream.flush()
  File "/opt/conda/lib/python2.7/site-packages/ipykernel/iostream.py", line 266, in flush
    evt = threading.Event()
  File "/opt/conda/lib/python2.7/threading.py", line 550, in Event
    return _Event(*args, **kwargs)
  File "/opt/conda/lib/python2.7/threading.py", line 562, in __init__
    _Verbose.__init__(self, verbose)
RuntimeError: maximum recursion depth exceeded while calling a Python object

I'm using:

2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
mmckerns commented 8 years ago

When I try it, my output is the same as yours, up to: # T1 # Co. Next, where yours has a D2, mine has a D1. I am testing with 2.7.11 (not 2.7.12). D1 means both that the __module__ of the pickler is dill and that __main__.__dict__ is being pickled; D2 means that no special conditions are met, so python's pickler is being used. I don't see why, if you are doing a dill.dumps, you should get anything different than I do.

D1: <dict object at 0x1046e1168>
# D1
D2: <dict object at 0x104cc3e88>
# D2
# F1
# D2
# T2
D2: <dict object at 0x104caf398>
# D2

Then it completes the dumps.

However, in your traceback, I noticed you have the threading module, and multiple copies of the error. Are you actually calling the dumps with multiprocessing, multiprocessing.dummy, or the threading library? If so, you might want to try the dill-aware multiprocess library.
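
For reference, a rough sketch of what using the dill-aware multiprocess package looks like (it mirrors the multiprocessing interface; this snippet is illustrative only, assuming Python 3):

# pip install multiprocess
import multiprocess as mp   # same interface as multiprocessing, but pickles with dill

def square(x):
    return x * x

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        print(pool.map(square, range(5)))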

tboquet commented 8 years ago

@mmckerns, for the threads, I was using a jupyter notebook. That's weird; I tested the same script with several clean virtualenvs and I still get the error. I'm not really familiar with the internals of super() and pickle, so maybe I'm missing something obvious. Is it because I use the recurse option? I need it because the class that I want to serialize needs several other modules.

Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2

errors:

>>> dill.dumps(obj, protocol=2, recurse=True)
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7fb9730fc320>
# F2
T1: <type 'type'>
F2: <function _load_type at 0x7fb9730fc2a8>
# F2
# T1
T1: <type 'object'>
# T1
D2: <dict object at 0x7fb972ef2d70>
F1: <function __init__ at 0x7fb972ebfe60>
F2: <function _create_function at 0x7fb9730fc398>
# F2
Co: <code object __init__ at 0x7fb976eb1e30, file "<stdin>", line 2>
T1: <type 'code'>
# T1
# Co
D2: <dict object at 0x7fb97312bd70>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972e86280>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee17f8>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972ef2e88>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee1168>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb976e98a28>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ee14b0>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb97311d280>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ef2910>
T2: <class '__main__.obj'>
D2: <dict object at 0x7fb972ee15c8>
F1: <function __init__ at 0x7fb972ebfe60>
D2: <dict object at 0x7fb972ef2a28>
T2: <class '__main__.obj'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 243, in dumps
    dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 236, in dump
    pik.dump(obj)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 1216, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 568, in save_tuple
    save(element)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 835, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 687, in _batch_setitems
    save(v)
...
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 1216, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 568, in save_tuple
    save(element)
  File "/home/tboquet/anaconda2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/lib/python2.7/site-packages/dill/dill.py", line 831, in save_module_dict
    log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
  File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1159, in info
    self._log(INFO, msg, args, **kwargs)
  File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1277, in _log
    record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, func, extra)
  File "/home/tboquet/anaconda2/lib/python2.7/logging/__init__.py", line 1251, in makeRecord
    rv = LogRecord(name, level, fn, lno, msg, args, exc_info, func)
RuntimeError: maximum recursion depth exceeded
Python 3.4.5 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:47:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

errors:

>>> dill.dumps(obj, protocol=2, recurse=True)
T2: <class '__main__.obj'>
F2: <function _create_type at 0x7f23ff99c1e0>
# F2
T1: <class 'type'>
F2: <function _load_type at 0x7f23ff99c158>
# F2
# T1
T1: <class 'object'>
# T1
D2: <dict object at 0x7f2401da9408>
F1: <function obj.__init__ at 0x7f23feb36d90>
F2: <function _create_function at 0x7f23ff99c268>
# F2
Co: <code object __init__ at 0x7f2401d7fdb0, file "<stdin>", line 2>
T1: <class 'code'>
# T1
B3: <built-in function encode>
F2: <function _get_attr at 0x7f23ff99cae8>
# F2
M2: <module '_codecs' (built-in)>
F2: <function _import_module at 0x7f23ff99cbf8>
# F2
# M2
# B3
# Co
D2: <dict object at 0x7f23fecc50c8>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f2401dd5a88>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23fecc5088>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23ff999b08>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23fecc5288>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23fecc5188>
F1: <function obj.__init__ at 0x7f23feb36d90>
D2: <dict object at 0x7f23feb28748>
T2: <class '__main__.obj'>
D2: <dict object at 0x7f23feb28708>
F1: <function obj.__init__ at 0x7f23feb36d90>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 243, in dumps
    dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 236, in dump
    pik.dump(obj)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 412, in dump
    self.save(obj)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 1216, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 603, in save_reduce
    save(args)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 744, in save_tuple
    save(element)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 835, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 814, in save_dict
    self._batch_setitems(obj.items())
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 840, in _batch_setitems
    save(v)
...
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 793, in save_function
    obj.__dict__), obj=obj)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 603, in save_reduce
    save(args)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 744, in save_tuple
    save(element)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/site-packages/dill/dill.py", line 831, in save_module_dict
    log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1279, in info
    self._log(INFO, msg, args, **kwargs)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1413, in _log
    exc_info, func, extra, sinfo)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 1385, in makeRecord
    sinfo)
  File "/home/tboquet/anaconda2/envs/py34/lib/python3.4/logging/__init__.py", line 273, in __init__
    self.levelname = getLevelName(level)
RuntimeError: maximum recursion depth exceeded
tboquet commented 8 years ago

Ok, so it's working without the recurse option. I guess I'm stuck in a dead end :disappointed:.

mmckerns commented 8 years ago

@tboquet: I get the same error when I use recurse=True, and yes, it's due to using this option (as recurse does not treat __main__.__dict__ as a special dict). The failure with recurse=True should probably be considered a dill bug.

tboquet commented 8 years ago

@mmckerns I'm trying to bypass the serialization of all of the dependencies. I reload all the modules and functions into globals() with import_module, but they are not found by the methods that use them in the new process. I guess that's because they are supposed to be defined in the same file the object comes from. Is this the right way to do it? Is it even possible, or do you see something wrong with this idea?
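
Roughly what I'm trying, as a simplified sketch (the module names are just placeholders for my real dependencies):

import importlib

# re-load each dependency by hand in the new process and put it into
# globals(), hoping the unpickled methods can then find it by name
for name in ('numpy', 'pandas'):            # placeholders
    globals()[name] = importlib.import_module(name)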

mmckerns commented 8 years ago

@tboquet: I'm not sure I understand you, but if you don't want the dependencies serialized, then you don't want recurse=True -- the whole point there is to serialize the dependencies. If the dependencies are in another file versus the same file, they will also serialize differently. It's actually easiest if the dependencies are in another installed module -- if that's the case, they will be pickled and unpickled by reference. If the dependency modules are not installed (i.e. just other scripts in the same directory), then that doesn't work well (see issues #176, #123, etc.)… and it's better to put the dependencies in the same file.
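
To illustrate the by-reference case (a minimal sketch, not specific to your code): a function that lives in an installed module pickles as little more than its qualified name, and is re-imported on load.

import dill
import math

payload = dill.dumps(math.sqrt)     # essentially just a reference to "math.sqrt"
print(payload)                      # something like b'\x80\x03cmath\nsqrt\nq\x00.'
print(dill.loads(payload)(16.0))    # 4.0 -- resolved by importing math again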

tboquet commented 8 years ago

@mmckerns sorry, I wasn't clear in my last comment. I tried to serialize the class with recurse=False. Then I tried to re-instantiate it on another virtual machine that has the same packages. Because I need the dependencies, I tried to load every module by hand using import_module directly into globals() so they are accessible. Unfortunately, these modules and functions are not found in the new process, even though they are loaded in globals(). I was wondering if I had to change the file location of the serialized method, which is by default the cell of my interactive jupyter session.

mmckerns commented 8 years ago

@tboquet: This feels like it should be its own ticket. So, I'm going to consider up to about three posts back to be in the thread for this issue. However, with "I'm trying to bypass the serialization of all the dependencies…", it sounds like you are starting to get into a different issue, and I need to see the code for what you are doing. Can you open a new ticket and reference this one? Also, please see some of the existing issues about serializing functions and their dependencies. Again, that is the primary reason the recurse variant exists.

tboquet commented 8 years ago

@mmckerns sure, I'll open a clean issue with a snippet of code to reproduce what I'm trying to do.

rserbitar commented 7 years ago

Any progress? This bug is really preventing us from using dill.

mmckerns commented 7 years ago

We established that the source of the issue was some new behavior from pickling super itself. I've pinpointed it. The issue is that pickling a function that contains super now produces a __closure__ that has a cell object pointing back to the class. It's recursive because the instance of the class that it produces is a new instance, unfortunately... and I think python made that decision because pickle serializes classes by reference, and that should break the recursion.

>>> o = obj()
>>> o.__init__.__func__.__code__.co_names
('super', 'obj', '__init__')
>>> o
<__main__.obj object at 0x10558bd68>
>>> # 3.6
>>> # dill.dumps(o.__init__.__func__, byref=True)  # WORKS
>>> # dill.dumps(o.__init__.__func__)  # RecursionError
>>> o.__init__.__func__.__closure__[0].cell_contents()
<__main__.obj object at 0x1055bd400>
>>> # 2.7
>>> o.__init__.__func__.__closure__
>>> 

Therefore, I think a reasonable, but not perfect, solution within dill is to detect when co_names includes super, and then temporarily change the _byref flag on the pickler to _byref = True.

With this edit to dill.dill.save_function:

            if 'super' in obj.__code__.co_names:
                _byref = getattr(pickler, '_byref', None)
                if _byref is not None:
                    pickler._byref = True
            pickler.save_reduce(_create_function, (obj.__code__,
                                globs, obj.__name__,
                                obj.__defaults__, obj.__closure__,
                                obj.__dict__), obj=obj)
            if 'super' in obj.__code__.co_names and _byref is not None:
                pickler._byref = _byref

Then this succeeds:

import dill

class obj(object):
    def __init__(self):
        super(obj, self).__init__()

repr(dill.dumps(obj(), byref=True))
repr(dill.dumps(obj(), recurse=True))
repr(dill.dumps(obj()))

Note that this particularly unusual case will also work, except when recurse=True.

import dill

class obj(object):
    _super = super
    def __init__(self):
        obj._super(obj, self).__init__()

This kind of unusual case fails for both 2.x and 3.x, so it needs a bit more investigation before I can say whether it's the same issue or not. The workaround for super I tried above is obviously defeated by this case. There's probably a better fix. Ultimately, what's needed is that once super is detected in the function, the "appropriate" objects produced by the resulting closure get serialized with byref=True. There's a possibility that it will cause some new failures... but I think they are really, really unlikely. The "new" niche behavior would only be triggered when pickling a function that contains super (so it should be inside a class)... and unless another closure is being used in the method (aside from the one that super oddly creates), then flipping on byref while pickling the function should be fine. It'd probably be better to turn it on for the objects produced from the cell... but maybe that's harder to do, as it'd need some handshake, I think. I haven't tried it yet.

mmckerns commented 7 years ago

Note that pickle does the following, to avoid recursion:

        if id(obj) in self.memo:
            # If the object is already in the memo, this means it is
            # recursive. In this case, throw away everything we put on the
            # stack, and fetch the object back from the memo.
            write(POP_MARK + self.get(self.memo[id(obj)][0]))
            return

Maybe something like this is a better solution... as it's pretty hard to get anything from a cell except what the cell_contents are.

mmckerns commented 7 years ago

In dill, there is already the dill.dill.stack, where: stack = set() # record of 'recursion-sensitive' pickled objects. It's not really used.

mmckerns commented 7 years ago

Addressed this issue as noted above in 2f1395d07c8378cb77f374098504684ae77189ce. I'm closing this issue. Please add any comments here, or reopen if there are any issues.

mmckerns commented 7 years ago

Arg. Oddly, this breaks 2.6, 3.3, and pypy when run from tox. Pypy fails on a missing attribute that is actually present, and works when not run with tox. So, weird.

mmckerns commented 7 years ago

Patched in 13f82f36e6bd6575af41223c8919901ecb260aeb by clearing the memo.

mmckerns commented 7 years ago

@matsjoyce: maybe you can take a look at this? I don't think it breaks anything, but it feels kind of hacky. It doesn't seem too terrible, and I imagine I could find corner cases that it mishandles. Maybe you can see some improvements. The code patch addresses super and also blocks a good many of the RecursionErrors.

matsjoyce commented 7 years ago

I don't like this solution, but I think it's the only way of doing it short of rewriting pickle. The real problem is that pickle serializes the object before memoizing it, so recursive objects always require hacks. If it checked the memo, then added the object to the memo, then serialized it, this problem would not happen. So yeah, I can't think of anything better.
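
A small illustration of that ordering point, using the standard pickle module: containers are memoized before their items are written, so a self-referential list round-trips fine, whereas reduce-style objects (like the class -> __init__ -> __closure__ -> class cycle here) are serialized before being memoized.

import pickle

lst = []
lst.append(lst)                        # a list that contains itself
out = pickle.loads(pickle.dumps(lst))
print(out[0] is out)                   # True: the memo entry breaks the cycle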

mmckerns commented 7 years ago

@matsjoyce: I don't know if you could tell from my note, but yeah I don't like the solution either. If you do think of something better, please feel free to comment, PR, or otherwise share. Thanks.

mirceamironenco commented 7 years ago

Has there been any progress with this, or is there a workaround available?

Edit:

@mmckerns Not sure if this will help you, but I have tried in various ways to modify a model that used sub-classing, and the same error occurred. The same class instantiated and used inside a function worked.

mmckerns commented 7 years ago

@mirceamironenco: Nothing new has been done here. The hack I put in place is still in place, and should work for most cases with regard to this ticket. If you are seeing something that's causing errors, please submit an issue describing what you see.

tvalentyn commented 5 years ago

If you are seeing something that's causing errors, please submit an issue describing what you see.

https://github.com/uqfoundation/dill/issues/300 is one of the use cases that the current fix does not cover.

BTW, here's how Cloudpickle addresses the recursion problem:

https://github.com/cloudpipe/cloudpickle/blob/5c0a0b199ce5e36dcb5d848615be95577c2d3ee9/cloudpickle/cloudpickle.py#L654