python / cpython

The Python programming language
https://www.python.org
Other
62.86k stars 30.11k forks source link

Can't pickle a bound method if `getattr(type(im_self), funcname)` fails #93889

Open johnomotani opened 2 years ago

johnomotani commented 2 years ago

Bug report

MWE:

import pickle
import numpy as np
from scipy.interpolate import RectBivariateSpline

class Foo:
    def __init__(self):
        x = np.linspace(0,1,10)[:,None]
        y = np.linspace(2,3,9)[None,:]
        self.f = RectBivariateSpline(x,y,x*y)
        def f2(self,x,y):
            return self.f(x,y)
        self.f2 = f2.__get__(self)

foo = Foo()
pickled = pickle.dumps(foo)
unpickled = pickle.loads(pickled)

print(unpickled.f2(0.5, 0.6))

Expected output is to print [[1.]]. Actually gives an exception

Traceback (most recent call last):
  File ".../pickle-test.py", line 18, in <module>
    unpickled = pickle.loads(pickled)
AttributeError: 'Foo' object has no attribute 'f2'

This issue follows up from the identification an fixing of the same issue with dill, see https://github.com/uqfoundation/dill/issues/510 https://github.com/uqfoundation/dill/pull/511 where @anivegesana commented

It seems to be a little bit more subtle than that. Because Python 3 implements its own reduce function for methods, I removed it from dill. It turns out that this is not a safe optimization in the case that the func is an inner function. Will be fixed soon.

Your environment

anivegesana commented 2 years ago

To be more explicit, it is possible to create methods for an object at runtime using functions that are not members of the class using the function as a descriptor (manually calling its __get__ method.) These dynamically created methods cannot be reduced by the current (getattr, (self, 'methodName')) because type(self).methodName is not present in this case.

sweeneyde commented 2 years ago

I don't have much expertise in pickling, but for convenience, here's a similar reproducer without scipy/numpy dependencies:

import pickle

def function(x):
    return x ** 2

class Foo:
    def __init__(self):
        self.f = function
        def f2(self, x):
            return self.f(x)
        self.f2 = f2.__get__(self)

foo = Foo()
print(foo.f2(7)) # prints 49
pickle.loads(pickle.dumps(foo))

Looking at it more though, the fact that it is an inner function seems like it might be a red herring since this gives a similar result:

import pickle

def function(self, x):
    return x ** 2

class Foo:
    pass
foo = Foo()
foo.f2 = function.__get__(foo)
pickle.loads(pickle.dumps(foo))
sweeneyde commented 2 years ago

Ah I see, here's an even simpler reproducer

def func(self):
    return "ok"
bound = func.__get__(42)

from pickle import dumps, loads
loads(dumps(bound)) # AttributeError: 'int' object has no attribute 'func'
sweeneyde commented 2 years ago

It looks like the (getattr, (self, 'methodName')) logic has been there since https://github.com/python/cpython/commit/c9dc4a2a8a6dcfe1674685bea4a4af935c0e37ca. @pitrou, any thoughts on this issue?

I don't know if there is a more reliable way to construct bound method objects in more of these edge cases.

anivegesana commented 2 years ago

dill fixes this by pickling methods as (types.MethodType, (obj.__func__, obj.__self__)). This has some downsides because pickles containing methods will be a little longer. In addition, currently the constructor of types.MethodType is technically not in the public API and is not guaranteed to be the same across future Python versions or other Python implementations. It has changed in the Python 2/3 transition and could technically change at any time without notice or documentation. At the same time, that additional flexibility that would be lost by documenting the constructor could be used to change the internals of methods to make it more efficient in the future.

I do not know if this is something that Python is willing to explicitly include in the standard library or leave for third party pickle extensions like dill to handle.

A similar case that I found was in types.MappingProxyType. Mapping proxies are not currently picklable (probably because the point of a mapping proxy is to not allow anyone to access the underlying mapping and the __reduce__ method would leak it), but according to the documentation, the | operator delegates to the underlying mapping, therefore allowing external access to the underlying mapping anyway. It doesn't really make sense to omit the __reduce__ method anymore. What's more, the same applies to the dictionary views because they now have a public attribute to access a mapping proxy that gets you the dictionary.