ionelmc / python-tblib

Serialization library for Exceptions and Tracebacks.
BSD 2-Clause "Simplified" License
165 stars 33 forks source link

Unpickling exceptions can fail with `TypeError` about the number of arguments given in specific scenarios #65

Open kezabelle opened 2 years ago

kezabelle commented 2 years ago

Firstly, thanks so much for this library and the time you've spent on it!

This is a real edge case, and the library I encountered it in (httpretty) doesn't even do this exception in quite the same way now, but I'm flagging this because it's an exotic usage that has required my investigating, and perhaps it's otherwise unknown to you (though tbf, it may be a limitation, and I'm certainly not expecting a 'fix' from you!)

Using:

httpretty==1.0.5
tblib==1.7.0

pickle_exception and it's counter unpickle_exception can become confused about how to restore the exception, if the exception is defined the same way as it was in httpretty at that version, like so:

class HTTPrettyError(Exception):
    pass

class UnmockedError(HTTPrettyError):
    def __init__(self):
        super(UnmockedError, self).__init__(
            'No mocking was registered, and real connections are '
            'not allowed (httpretty.allow_net_connect = False).'
        )

It's important to note that the __init__ doesn't declare any params, but internally passes some up so that they're ultimately set into .args

If we then try and pickle & unpickle that, like so:

In [1]: from httpretty import UnmockedError
In [2]: from tblib import pickling_support
In [3]: x = UnmockedError()
In [4]: repr(x)
Out[4]: "UnmockedError('No mocking was registered, and real connections are not allowed (httpretty.allow_net_connect = False).')"

In [5]: str(x)
Out[5]: 'No mocking was registered, and real connections are not allowed (httpretty.allow_net_connect = False).'

In [6]: func, params = pickling_support.pickle_exception(x)
In [7]: func(*params)
File ~/path/to/site-packages/tblib/pickling_support.py:26, in unpickle_exception(func, args, cause, tb)
     25 def unpickle_exception(func, args, cause, tb):
---> 26     inst = func(*args)
     27     inst.__cause__ = cause
     28     inst.__traceback__ = tb
TypeError: UnmockedError.__init__() takes 1 positional argument but 2 were given
dargueta commented 2 years ago

I run into the same problem if we have a mandatory keyword argument:

>>> import tblib
>>> from tblib import pickling_support
>>> class Error(Exception):
...     def __init__(self, *, kw):
...             super().__init__(f"got arg: {kw!r}")
... 
>>> err = Error(kw="asdfasdf")
>>> func, params = pickling_support.pickle_exception(err)
>>> func(*params)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dargueta/.pyenv/versions/3.9.12/envs/django/lib/python3.9/site-packages/tblib/pickling_support.py", line 26, in unpickle_exception
    inst = func(*args)
TypeError: __init__() takes 1 positional argument but 2 were given

>>> params
(<class '__main__.Error'>, ("got arg: 'asdfasdf'",), None, None)

If you look at what's in params, you can see that only the error message passed to the superclass was retained; the original keyword argument was lost.

If I add in a positional argument the error message changes but the fundamental problem remains the same.

>>> class Error2(Exception):
...     def __init__(self, positional, *, kw):
...         super().__init__(f"{positional}: {kw!r}")
... 
>>> err2 = Error2("foo", kw="bar")
>>> func, params = pickling_support.pickle_exception(err2)
>>> func(*params)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dargueta/.pyenv/versions/3.9.12/envs/django/lib/python3.9/site-packages/tblib/pickling_support.py", line 26, in unpickle_exception
    inst = func(*args)
TypeError: __init__() missing 1 required keyword-only argument: 'kw'

>>> params
(<class '__main__.Error2'>, ("foo: 'bar'",), None, None)

Again, the only argument preserved is the single argument passed to the superconstructor; the arguments I passed to the original exception are gone.

The underlying issue appears to be that the original arguments passed to the constructor of the leaf class are not preserved, only what Exception got is kept.

oldium commented 1 year ago

Same for some standard library exceptions, like urllib3.exceptions.NameResolutionError. It is possible to pickle it, but unpickling fails:

TypeError: NameResolutionError.__init__() missing 2 required positional arguments: 'conn' and 'reason'
oldium commented 1 year ago

And I see that pickle library suffers from the same issue, so this might not be tblib problem at all. Example:

>>> pickle.loads(pickle.dumps(NameResolutionError("host", "conn", "reason")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NameResolutionError.__init__() missing 2 required positional arguments: 'conn' and 'reason'
oldium commented 1 year ago

The following change to tblib did the trick for me:

def unpickle_exception(func, args, cause, tb, context=None, suppress_context=False, notes=None):
    inst = func.__new__(func)
    if args is not None:
        inst.args = args
    inst.__cause__ = cause
    inst.__traceback__ = tb
    inst.__context__ = context
    inst.__suppress_context__ = suppress_context
    if notes is not None:
        inst.__notes__ = notes
    return inst

def pickle_exception(obj: BaseException):
    rv = obj.__reduce_ex__(3)
    if isinstance(rv, str):
        raise TypeError('str __reduce__ output is not supported')
    assert isinstance(rv, tuple)
    assert len(rv) >= 2

    return (
        unpickle_exception,
        rv[:1]
        + (
            obj.args,
            obj.__cause__,
            obj.__traceback__,
            obj.__context__,
            obj.__suppress_context__,
            # __notes__ doesn't exist prior to Python 3.11; and even on Python 3.11 it may be absent
            getattr(obj, '__notes__', None),
        ),
    ) + rv[2:]

I also had to add pickling of _thread.lock (of _thread.LockType type), but that is a different story 😊

oldium commented 1 year ago

Pull Request #73 should (hopefully) address the issue, at least it works-for-me™.

ionelmc commented 1 year ago

Same for some standard library exceptions, like urllib3.exceptions.NameResolutionError. It is possible to pickle it, but unpickling fails:

TypeError: NameResolutionError.__init__() missing 2 required positional arguments: 'conn' and 'reason'

You can't really pickle a live connection object anyway. Have you tried registering a custom handler for NameResolutionError in copyreg?

oldium commented 1 year ago

Same for some standard library exceptions, like urllib3.exceptions.NameResolutionError. It is possible to pickle it, but unpickling fails:

TypeError: NameResolutionError.__init__() missing 2 required positional arguments: 'conn' and 'reason'

You can't really pickle a live connection object anyway. Have you tried registering a custom handler for NameResolutionError in copyreg?

That is correct. This is why I registered custom handler for _thread.LockType and NameResolutionError works just fine with it (and also with changes from #73). I do not want to do any action with the connection, I just want a snapshot to be able to (generally) check “post-mortem” what happened.

oldium commented 8 months ago

Just a note, this is how I fixed pickling of locks in the exception instances:

def unpickle_thread_lock():
    return _thread.allocate_lock()

def pickle_thread_lock(obj):
    return unpickle_thread_lock, ()

class MyPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, _thread.LockType):
            return pickle_thread_lock(obj)
        elif isinstance(obj, TracebackType):
            return tblib.pickling_support.pickle_traceback(obj)
        elif isinstance(obj, BaseException):
            return tblib.pickling_support.pickle_exception(obj)
        else:
            # For any other object, fallback to usual reduction
            return NotImplemented

Or if you are interested just in locks, you can use global pickling registry:

copyreg.pickle(_thread.LockType, pickle_thread_lock)