python / cpython

The Python programming language
https://www.python.org/
Other
60.55k stars 29.26k forks source link

pickle.py is limited by python's call stack #47369

Open 0ba59aa5-2e90-422d-94d1-15d2602a5498 opened 16 years ago

0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 16 years ago
BPO 3119
Nosy @jcea, @pitrou, @avassalotti, @serhiy-storchaka, @matrixise
Files
  • pickle.patch: svn diff output against the python trunk.
  • pickle2.patch: svn diff output against the python trunk, take two.
  • pickle3.patch
  • pickle4.patch
  • graphtest.py: Example code triggering the issue
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.7', 'type-bug', 'library'] title = "pickle.py is limited by python's call stack" updated_at = user = 'https://bugs.python.org/habnabit' ``` bugs.python.org fields: ```python activity = actor = 'habnabit' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'habnabit' dependencies = [] files = ['10638', '11168', '13654', '18073', '43897'] hgrepos = [] issue_num = 3119 keywords = ['patch', 'needs review'] message_count = 17.0 messages = ['68262', '69007', '69118', '69126', '69620', '71510', '85761', '85786', '101451', '101452', '110642', '110834', '268119', '270752', '271394', '278073', '278088'] nosy_count = 9.0 nosy_names = ['jafo', 'jcea', 'pitrou', 'alexandre.vassalotti', 'schmir', 'tleeuwenburg@gmail.com', 'serhiy.storchaka', 'matrixise', 'gavento'] pr_nums = [] priority = 'low' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue3119' versions = ['Python 3.7'] ```

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 16 years ago

    Currently, pickle.py in the stdlib is limited by the python call stack. For deeply recursive data structures, the default recursion limit of 1000 is not enough. The patch attached modifies pickle.py to instead use a deque object as a call stack. Pickler.save and other methods that increase the recursion depth are now generators which may yield either another generator or None, where yielding a generator adds it to the call stack.

    pitrou commented 15 years ago

    Interesting. Why do you use a deque? You never seem to prepend or pop from the beginning of it, so using a plain list should be as fast.

    Also, your patch should add one or several unit tests to enforce the new behaviour.

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 15 years ago

    Ah, I didn't know that a list would be as fast for appending and popping. I knew that lists were optimized for .append() and .pop(), but I didn't know that a list would be just as fast as a deque if it was just used as a stack.

    And I'll be happy to write unit tests if it can be pointed out to me how exactly they can be written. Should it just test to make sure pickling a deeply nested object hierarchy can be pickled without raising a RuntimeError? I tried to make this as transparent as possible of a change.

    pitrou commented 15 years ago

    Hi,

    Le mercredi 02 juillet 2008 à 20:43 +0000, Aaron Gallagher a écrit :

    Aaron Gallagher \habnabit@gmail.com\ added the comment:

    Ah, I didn't know that a list would be as fast for appending and popping. I knew that lists were optimized for .append() and .pop(), but I didn't know that a list would be just as fast as a deque if it was just used as a stack.

    If you have a few minutes to spare, the best would be to measure the speed of both possibilities, and opt for the fastest.

    And I'll be happy to write unit tests if it can be pointed out to me how exactly they can be written. Should it just test to make sure pickling a deeply nested object hierarchy can be pickled without raising a RuntimeError?

    Yes, exactly. Just create a very deeply nested object (a list should be sufficient I suppose), pickle it, unpickle it, and check that the result is equal to the original. No need to do more. Don't bother catching the eventual RuntimeError, if it is raised the test will be failed anyway.

    Since your patch only regards the pickle module (not cPickle) I suppose the test could go directly into Lib/test/test_pickle.py.

    (oh and don't forget to check the test case fails without your patch :-))

    Thanks !

    avassalotti commented 15 years ago

    On some benchmark of my own, I get a good 2x slowdown when this patch is applied. The benchmark consists of loading \~1.4 millions objects (mostly dict, list, str and int) from a pickle string. It is basically a torture test for the inner function calls overhead.

    Anyway, the slowdown doesn't bother me much. I think a "stackless" pickle module would be well appreciated by many Python users.

    There is a few things that stops me from applying this patch. First, the patch will break subclasses of Pickler that relies on the save() method (Although the method is an implementation detail that shouldn't used). Therefore, any patches that modifies the API of save() is straight out for 2.x. And for Python 3.0, your patch should also remove recursion from the _pickle module (the transparent C accelerator for pickle).

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 15 years ago

    Alright, sorry this took so long. Hopefully this can still be included in 3.0. Included is a patch that no longer uses collections.deque and also adds a test case to test/test_pickle.py. The test catches RuntimeError and fails the unittest. I didn't see anything that would behave like the opposite of assertRaises except letting the exception propagate.

    cb10083d-57b2-4c13-be72-1a20fe2a6917 commented 15 years ago

    Aaron,

    Could you please upload another patch against the current trunk, then the issue could be flagged as requiring a patch review?

    Thanks, -Tennessee

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 15 years ago

    Okay, here's a new version for the py3k trunk. I'm assuming that this is not going to make it into 2.x at all, because of the API changes. This patch only touches the python version of the code and adds a unit test for testing whether pickle works with arbitrary nesting depth in test.pickletest. The test is disabled for the C pickle tests currently, since it causes python to run out of stack space and crash.

    So, does _pickle.Pickler's API need to be changed as well? Right now, pickle._Pickler.dump expects save (and save expects the other save_* methods) to either return None or an iterable, where the iterable yields either None or further iterables. I'm sure it wouldn't be hard to make _pickle.Pickler.dump work the same way, but since C doesn't have generators, I don't know how exactly the API would translate.

    eacf80b2-9081-4882-a7f5-27e9644c3853 commented 14 years ago

    Sorry for the delay in getting to this patch.

    I've reviewed this patch and it seems fine to me. The only thing I see outstanding is the recommendation made by Alexandre about changing the C module to also implement this.

    Aaron: You may want to take your question on the implementation to the python-dev mailing list: Whether it is necessary to implement it in the C module, and if so suggestions on how or help on doing it. Is that something you can do, Aaron?

    eacf80b2-9081-4882-a7f5-27e9644c3853 commented 14 years ago

    Ugh, I forgot to check the output of my test run before submitting the last reply. With the patch applied, "make test" fails with:

    test_pickle Fatal Python error: Cannot recover from stack overflow. Fatal Python error: Cannot recover from stack overflow. make: *** [test] Aborted (core dumped)

    This is on Python 3 trunk, with the pickle3.patch applied (which applied cleanly).

    For the Misc/NEWS I propose (in the Library section):

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 13 years ago

    Sean has stated that the patch seems fine, what needs to be done to take this forward?

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 13 years ago

    Here's a patch that fixes the unit tests. A new test was added that tried to test the C implementation of this though none exists.

    b17a15b4-f209-43f2-999e-43663d63748e commented 8 years ago

    Hey all,

    I would like to patch the C _pickle module to be non-recursive and help this patch go through. I have an idea on how to do that with a very small amount of changes below and I would like to get feedback and improvements before implementing it in a particular way or getting into details (basically to check if that is acceptable and whether I missed something major).

    Also, what is missing in the Python patch to get it merged?

    == Nonrecursive _pickle:

    First, observe that all the save* recursive calls do not rely on the return value of save except for error checking (which is propagated as -1 to the top). Also, almost all work in save_ is done before calling save() recursively (most of the work after calling save()s is decref or other minor cleanup).

    I would propose a linked list of structures acting as a stack of objects and byte-sequences to be written (for the marks and opcodes). Imagine something like the following (a rough sketch, I need to check all the required info properly later):

    struct _PickleStack {_PickleStack* next; PyObject *obj; char write_opcode; /* ... objects to decref / cleanup after the write */ /* possibly pers_save flags */ };

    The pickle stack would be a part of Pickler and save() would push a value onto that stack instead of recursion. (The order of save() and calls within save_*() needs to be reversed, though.)

    Most objects would push a fixed number of objects to the pickle stack, for lists and dicts I would make pickle stack entries for all the items at once (e.g. as in batch_list()). An alternative would be to store the last item and the open iterator, but that seems wasteful.

    This should not slow things down significantly (if at all), as the only overhead is having own stack of small structures (likely less than stackframe size of save_*). If frequent de/allocation will be a problem, I would modify the allocation strategy later (to blocks or such).

    Are there any issues with changing the working of save() internally? The functionality of reduce, custom Picklers, persistent ids and dispatch tables should stay the same. The unpickling is not recursive and needs no changes (and has its own custom stack impl).

    Looking forward to your comments.

    matrixise commented 7 years ago

    hello, I think we can close this issue because Python 3.1 is not supported. or we keep it if this issue is confirmed for Python 3.6.

    b17a15b4-f209-43f2-999e-43663d63748e commented 7 years ago

    The issue is still present in Python 2.7.12 and Python 3.5.2, and the implementation has not been changed in the master branch either. You can test it with the attached program constructing a graph (simplified, but a realistic application), or with the following obviously deep construction:

    import pickle, _pickle
    a=()
    for i in range(1000): a = (a,)
    pickle.dumps(a) # or _pickle.dumps(a)

    Any further comments or thoughts on the solution?

    serhiy-storchaka commented 7 years ago

    It would be nice to support unlimitedly nested structures. C stack is more hard limit than Python stack. But the code of the pickle module (especially C implementation) is complicated and hardly optimized. I think it would be not easy to implement stackless version without significant loss of speed and readability. You can try, but I don't have much hope.

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 7 years ago

    Definitely not interested in pickle at all anymore.