ionelmc / python-tblib

Serialization library for Exceptions and Tracebacks.
BSD 2-Clause "Simplified" License
165 stars 33 forks source link

New feature: pickle Exception, its cause, and traceback #53

Closed crusaderky closed 4 years ago

crusaderky commented 5 years ago

Even after enabling tblib, when one pickles an exception, the previous exceptions in the exception chain gets lost. This has been discussed here: https://github.com/dask/distributed/issues/3158

It would be very straightforward to add to tblib an (optional?) call to copyreg.pickle(Exception, ...) which overrides the pickling system of the whole Exception, recursively pickles its __cause__ and __traceback__, and automatically reassembles them upon unpickling.

The behaviour for the user would be extremely straightforward (and would not require six.reraise anymore):

import pickle
from tblib import pickling_support
pickling_support.install()

try:
    try:
        raise Exception("foo")
    except Exception as e:
        raise Exception("bar") from e
except Exception as e:
    buf = pickle.dumps(e)

raise pickle.loads(buf)
# original Exception, its traceback, and its cause (which in turn has traceback and cause, recursively)

I'm happy to work on a PR myself - would you be amenable to merging it?

crusaderky commented 5 years ago

Darn - I just realized that copyreg.pickle doesn't work on subclasses. It is still be possible to recursively traverse the subclass tree of BaseException, but that would require calling the patch function after all other Exceptions of a package and all its dependencies have been declared.

Still better than not having anything though...

Proof of concept:

import copyreg
import pickle

from tblib import pickling_support

def pickle_exception(ex):
    return unpickle_exception, (ex.__reduce__(), ex.__cause__, ex.__traceback__)

def unpickle_exception(reduce_out, cause, tb):
    func, args = reduce_out
    ex = func(args)
    ex.__cause__ = cause
    ex.__traceback__ = tb
    return ex

def get_subclasses(cls):
    """Depth-first recursive traversal of all subclasses of cls
    """
    to_visit = [cls, ]
    while to_visit:
        this = to_visit.pop()
        yield this
        to_visit += list(this.__subclasses__())

def patch_exceptions():
    for exception_cls in get_subclasses(BaseException):
        copyreg.pickle(exception_cls, pickle_exception)

class CustomError(Exception):
    pass

def test():
    try:
        try:
            1 / 0
        except Exception as e:
            raise CustomError("bar") from e
    except Exception as e:
        buf = pickle.dumps(e)

    raise pickle.loads(buf)

pickling_support.install()
patch_exceptions()
test()

Output:

Traceback (most recent call last):
  File "/home/crusaderky/.PyCharmCE2019.2/config/scratches/pickle_traceback.py", line 47, in test
    1 / 0
ZeroDivisionError: ('division by zero',)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/crusaderky/.PyCharmCE2019.2/config/scratches/pickle_traceback.py", line 56, in <module>
    test()
  File "/home/crusaderky/.PyCharmCE2019.2/config/scratches/pickle_traceback.py", line 53, in test
    raise pickle.loads(buf)
  File "/home/crusaderky/.PyCharmCE2019.2/config/scratches/pickle_traceback.py", line 49, in test
    raise CustomError("bar") from e
__main__.CustomError: ('bar',)

Process finished with exit code 1
crusaderky commented 5 years ago

It is also possible (alternatively or in addition to the above) to have a function that is applied to an exception instance just before it's sent for pickling instead:


def ensure_full_exception_pickle(ex: Exception):
    while ex is not None:
        copyreg.pickle(ex.__class__, pickle_exception)
        ex = ex.__cause__
ionelmc commented 5 years ago

So I'm generally in favor of having support for __cause__ but the PR needs to be portable (perhaps only register the exeption copyreg handlers on py3?) and add support for it in the to_dict/from_dict api as well (to be consistent).

ionelmc commented 5 years ago

The behaviour for the user would be extremely straightforward (and would not require six.reraise anymore)

Can you explain the six.reraise thing a bit more or give an example?

crusaderky commented 5 years ago

It's in your README.md:

>>> from six import reraise
>>> reraise(*pickle.loads(s1))

add support for it in the to_dict/from_dict api as well (to be consistent).

As a concept it makes sense, but how do you think the API should look like? I don't think it makes sense to have a class here. Maybe two top-level functions to_dict / from_dict that can be fed an Exception or a traceback object and spit out a dict, and the reverse? Also, since it would be a completely separate piece of code I feel it should go in a separate PR, do you agree?

ionelmc commented 5 years ago

Why wouldn't six.reraise continue to work?

Regarding the dict api, ignore what I asked for now. Maybe it's not needed at all.

crusaderky commented 5 years ago

It would continue to work, but it would not be necessary anymore - just raise will do fine