python / cpython


Segfault provoked by generators and exceptions #44139

Closed 4463b2f2-93ac-426d-8cf7-70ca0b641eba closed 17 years ago

4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago
BPO 1579370
Nosy @mwhudson, @tim-one, @loewis
Files
  • gen.diff
  • hope.py: quick-failing (in Windows debug build)
  • tstate.diff: Quick & dirty fix
  • tr.diff: eliminate usage of f_tstate in PyTraceBack_Here
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['interpreter-core']
    title = 'Segfault provoked by generators and exceptions'
    updated_at =
    user = 'https://bugs.python.org/klaas'
    ```

    bugs.python.org fields:
    ```python
    activity =
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation =
    creator = 'klaas'
    dependencies = []
    files = ['2182', '2183', '2184', '2185']
    hgrepos = []
    issue_num = 1579370
    keywords = []
    message_count = 19.0
    messages = ['30272', '30273', '30274', '30275', '30276', '30277', '30278', '30279', '30280', '30281', '30282', '30283', '30284', '30285', '30286', '30287', '30288', '30289', '30290']
    nosy_count = 7.0
    nosy_names = ['mwh', 'tim.peters', 'loewis', 'nnorwitz', 'awaters', 'klaas', 'eric_noyau']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1579370'
    versions = ['Python 2.5']
    ```

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    A reproducible segfault when using heavily-nested generators and exceptions.

    Unfortunately, I haven't yet been able to provoke this behaviour with a standalone Python 2.5 script. There are, however, no third-party C extensions running in the process, so I'm fairly confident that it is a problem in the core.

    The gist of the code is a series of nested generators which leave scope when an exception is raised. This exception is caught and re-raised in an outer loop. The old exception was holding on to the frame which was keeping the generators alive, and the sequence of generator destruction and new finalization caused the segfault.
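The object-lifetime chain described above can be sketched in modern Python 3. This is an illustration only, not the reporter's code: the names `worker`, `consume`, and `run_demo` are hypothetical, and the sketch shows only that a saved exception keeps a suspended generator alive through its traceback, not the crash itself.

```python
import gc


def worker(events):
    """Generator whose cleanup code runs only when it is finalized."""
    try:
        yield 1
        yield 2
    finally:
        events.append("generator finalized")


def consume(events):
    gen = worker(events)  # local variable, owned by this function's frame
    next(gen)             # suspend the generator inside its try block
    raise RuntimeError("boom")


def run_demo():
    events = []
    try:
        consume(events)
    except RuntimeError as exc:
        saved = exc       # exception -> traceback -> consume() frame -> gen
    events.append("exception still referenced")
    del saved             # drop the last reference to the traceback
    gc.collect()          # safety net for implementations without refcounting
    events.append("exception dropped")
    return events
```

Calling `run_demo()` shows the generator being finalized only once the saved exception is released, which is the same ordering dependency the report describes.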

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    Logged In: YES user_id=1611720

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1208400192 (LWP 26235)]
    0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at Python/traceback.c:94
    94 if ((next != NULL && !PyTraceBack_Check(next)) ||
    (gdb) bt
    #0 0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at Python/traceback.c:94
    #1 0x080b9ab7 in PyEval_EvalFrameEx (f=0x9c2d7b4, throwflag=1) at Python/ceval.c:2459
    #2 0x08101a40 in gen_send_ex (gen=0xb64f880c, arg=0x81333e0, exc=1) at Objects/genobject.c:82
    #3 0x08101c0f in gen_close (gen=0xb64f880c, args=0x0) at Objects/genobject.c:128
    #4 0x08101cde in gen_del (self=0xb64f880c) at Objects/genobject.c:163
    #5 0x0810195b in gen_dealloc (gen=0xb64f880c) at Objects/genobject.c:31
    #6 0x080b9912 in PyEval_EvalFrameEx (f=0x9c2802c, throwflag=1) at Python/ceval.c:2491
    #7 0x08101a40 in gen_send_ex (gen=0xb64f362c, arg=0x81333e0, exc=1) at Objects/genobject.c:82
    #8 0x08101c0f in gen_close (gen=0xb64f362c, args=0x0) at Objects/genobject.c:128
    #9 0x08101cde in gen_del (self=0xb64f362c) at Objects/genobject.c:163
    #10 0x0810195b in gen_dealloc (gen=0xb64f362c) at Objects/genobject.c:31
    #11 0x080815b9 in dict_dealloc (mp=0xb64f4a44) at Objects/dictobject.c:801
    #12 0x080927b2 in subtype_dealloc (self=0xb64f340c) at Objects/typeobject.c:686
    #13 0x0806028d in instancemethod_dealloc (im=0xb796a0cc) at Objects/classobject.c:2285
    #14 0x080815b9 in dict_dealloc (mp=0xb64f78ac) at Objects/dictobject.c:801
    #15 0x080927b2 in subtype_dealloc (self=0xb64f810c) at Objects/typeobject.c:686
    #16 0x081028c5 in frame_dealloc (f=0x9c272bc) at Objects/frameobject.c:416
    #17 0x080e41b1 in tb_dealloc (tb=0xb799166c) at Python/traceback.c:34
    #18 0x080e41c2 in tb_dealloc (tb=0xb4071284) at Python/traceback.c:33
    #19 0x080e41c2 in tb_dealloc (tb=0xb7991824) at Python/traceback.c:33
    #20 0x08080dca in insertdict (mp=0xb7f56824, key=0xb3fb9930, hash=1492466088, value=0xb3fb9914) at Objects/dictobject.c:394
    #21 0x080811a4 in PyDict_SetItem (op=0xb7f56824, key=0xb3fb9930, value=0xb3fb9914) at Objects/dictobject.c:619
    #22 0x08082dc6 in PyDict_SetItemString (v=0xb7f56824, key=0x8129284 "exc_traceback", item=0xb3fb9914) at Objects/dictobject.c:2103
    #23 0x080e2837 in PySys_SetObject (name=0x8129284 "exc_traceback", v=0xb3fb9914) at Python/sysmodule.c:82
    #24 0x080bc9e5 in PyEval_EvalFrameEx (f=0x9c10e7c, throwflag=0) at Python/ceval.c:2954
    #25 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc890, globals=0xb7bbe57c, locals=0x0, args=0x9b8e2ac, argcount=1, kws=0x9b8e2b0, kwcount=0, defs=0xb7b7aed8, defcount=1, closure=0x0) at Python/ceval.c:2833
    #26 0x080bd62a in PyEval_EvalFrameEx (f=0x9b8e16c, throwflag=0) at Python/ceval.c:3662
    #27 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc848, globals=0xb7bbe57c, locals=0x0, args=0xb7af9d58, argcount=1, kws=0x9b7a818, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
    #28 0x08104083 in function_call (func=0xb7b79c34, arg=0xb7af9d4c, kw=0xb7962c64) at Objects/funcobject.c:517
    #29 0x0805a660 in PyObject_Call (func=0xb7b79c34, arg=0xb7af9d4c, kw=0xb7962c64) at Objects/abstract.c:1860
    #30 0x080bcb4b in PyEval_EvalFrameEx (f=0x9b82c0c, throwflag=0) at Python/ceval.c:3846
    #31 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7cd6608, globals=0xb7cd4934, locals=0x0, args=0x9b7765c, argcount=2, kws=0x9b77664, kwcount=0, defs=0x0, defcount=0, closure=0xb7cfe874) at Python/ceval.c:2833
    #32 0x080bd62a in PyEval_EvalFrameEx (f=0x9b7751c, throwflag=0) at Python/ceval.c:3662
    #33 0x080bdf70 in PyEval_EvalFrameEx (f=0x9a9646c, throwflag=0) at Python/ceval.c:3652
    #34 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39728, globals=0xb7f6ca44, locals=0x0, args=0x9b7a00c, argcount=0, kws=0x9b7a00c, kwcount=0, defs=0x0, defcount=0, closure=0xb796410c) at Python/ceval.c:2833
    #35 0x080bd62a in PyEval_EvalFrameEx (f=0x9b79ebc, throwflag=0) at Python/ceval.c:3662
    #36 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39770, globals=0xb7f6ca44, locals=0x0, args=0x99086c0, argcount=0, kws=0x99086c0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
    #37 0x080bd62a in PyEval_EvalFrameEx (f=0x9908584, throwflag=0) at Python/ceval.c:3662
    #38 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f397b8, globals=0xb7f6ca44, locals=0xb7f6ca44, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
    #39 0x080bff32 in PyEval_EvalCode (co=0xb7f397b8, globals=0xb7f6ca44, locals=0xb7f6ca44) at Python/ceval.c:494
    #40 0x080ddff1 in PyRun_FileExFlags (fp=0x98a4008, filename=0xbfffd4a3 "scoreserver.py", start=257, globals=0xb7f6ca44, locals=0xb7f6ca44, closeit=1, flags=0xbfffd298) at Python/pythonrun.c:1264
    #41 0x080de321 in PyRun_SimpleFileExFlags (fp=Variable "fp" is not available.
    ) at Python/pythonrun.c:870
    #42 0x08056ac4 in Py_Main (argc=1, argv=0xbfffd334) at Modules/main.c:496
    #43 0x00a69d5f in __libc_start_main () from /lib/libc.so.6
    #44 0x08056051 in _start ()

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    Logged In: YES user_id=1611720

    I've produced a simplified traceback with a single generator. Note the frame being used in the traceback (#0) is the same frame being dealloc'd (#11).

    The relevant call in traceback.c is:

    PyTraceBack_Here(PyFrameObject *frame)
    {
        PyThreadState *tstate = frame->f_tstate;
        PyTracebackObject *oldtb = (PyTracebackObject *) tstate->curexc_traceback;
        PyTracebackObject *tb = newtracebackobject(oldtb, frame);

    and I can verify that oldtb contains garbage:

    (gdb) print frame
    $1 = (PyFrameObject *) 0x8964d94
    (gdb) print frame->f_tstate
    $2 = (PyThreadState *) 0x895b178
    (gdb) print $2->curexc_traceback
    $3 = (PyObject *) 0x66

    #0 0x080e4296 in PyTraceBack_Here (frame=0x8964d94) at Python/traceback.c:94
    #1 0x080b9ab7 in PyEval_EvalFrameEx (f=0x8964d94, throwflag=1) at Python/ceval.c:2459
    #2 0x08101a40 in gen_send_ex (gen=0xb7cca4ac, arg=0x81333e0, exc=1) at Objects/genobject.c:82
    #3 0x08101c0f in gen_close (gen=0xb7cca4ac, args=0x0) at Objects/genobject.c:128
    #4 0x08101cde in gen_del (self=0xb7cca4ac) at Objects/genobject.c:163
    #5 0x0810195b in gen_dealloc (gen=0xb7cca4ac) at Objects/genobject.c:31
    #6 0x080815b9 in dict_dealloc (mp=0xb7cc913c) at Objects/dictobject.c:801
    #7 0x080927b2 in subtype_dealloc (self=0xb7cca76c) at Objects/typeobject.c:686
    #8 0x0806028d in instancemethod_dealloc (im=0xb7d07f04) at Objects/classobject.c:2285
    #9 0x080815b9 in dict_dealloc (mp=0xb7cc90b4) at Objects/dictobject.c:801
    #10 0x080927b2 in subtype_dealloc (self=0xb7cca86c) at Objects/typeobject.c:686
    #11 0x081028c5 in frame_dealloc (f=0x8964a94) at Objects/frameobject.c:416
    #12 0x080e41b1 in tb_dealloc (tb=0xb7cc1fcc) at Python/traceback.c:34
    #13 0x080e41c2 in tb_dealloc (tb=0xb7cc1f7c) at Python/traceback.c:33
    #14 0x08080dca in insertdict (mp=0xb7f99824, key=0xb7ccd020, hash=1492466088, value=0xb7ccd054) at Objects/dictobject.c:394
    #15 0x080811a4 in PyDict_SetItem (op=0xb7f99824, key=0xb7ccd020, value=0xb7ccd054) at Objects/dictobject.c:619
    #16 0x08082dc6 in PyDict_SetItemString (v=0xb7f99824, key=0x8129284 "exc_traceback", item=0xb7ccd054) at Objects/dictobject.c:2103
    #17 0x080e2837 in PySys_SetObject (name=0x8129284 "exc_traceback", v=0xb7ccd054) at Python/sysmodule.c:82
    #18 0x080bc9e5 in PyEval_EvalFrameEx (f=0x895f934, throwflag=0) at Python/ceval.c:2954
    #19 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f6ade8, globals=0xb7fafa44, locals=0x0, args=0xb7cc5ff8, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
    #20 0x08104083 in function_call (func=0xb7cc7294, arg=0xb7cc5fec, kw=0x0) at Objects/funcobject.c:517
    #21 0x0805a660 in PyObject_Call (func=0xb7cc7294, arg=0xb7cc5fec, kw=0x0) at Objects/abstract.c:1860

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    Logged In: YES user_id=1611720

    I cannot yet produce a pure-Python script that reproduces the problem, but I can give an overview. There is a generator running in one thread and an exception being raised in another thread; as a consequence, the generator in the first thread is garbage-collected (triggering an exception due to the new generator cleanup). The problem is extremely sensitive to timing: often the insertion or removal of print statements, or reordering the code, causes the problem to vanish, which is confounding my ability to create a simple test script.

    import threading
    import time

    def getdocs():
        def f():
            pass  # <some somewhat time-consuming operation>
        while True:
            f()
            yield None

    # -----------------------------------------------------------------------------

    class B(object):
        def __init__(self):
            pass

        def doit(self):
            # must be an instance var to trigger segfault
            self.docIter = getdocs()
            # this is the generator referred to in the traceback
            print self.docIter
            for i, item in enumerate(self.docIter):
                if i > 9:
                    break
            print 'exiting generator'

    class A(object):
        """ Process entry point / main thread """
        def __init__(self):
            while True:
                try:
                    self.func()
                except Exception, e:
                    print 'right after raise'

        def func(self):
            b = B()
            thread = threading.Thread(target=b.doit)
            thread.start()
            start_t = time.time()
            while True:
                try:
                    if time.time() - start_t > 1:
                        raise Exception
                except Exception:
                    print 'right before raise'
                    # SIGSEGV here.  If this is changed to
                    # 'break', no segfault occurs
                    raise

    if __name__ == '__main__':
        A()

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    Logged In: YES user_id=21627

    Can you please review/try attached patch? Can anybody tell why gi_frame *isn't* incref'ed when the generator is created?

    tim-one commented 17 years ago

    Logged In: YES user_id=31435

    > Can anybody tell why gi_frame *isn't* incref'ed when the generator is created?

    As documented (in concrete.tex), PyGen_New(f) steals a reference to the frame passed to it. Its only call site (well, in the core) is in ceval.c, which returns immediately after PyGen_New takes over ownership of the frame the caller created:

    """ /* Create a new generator that owns the ready to run frame

    In short, that PyGen_New() doesn't incref the frame passed to it is intentional.

    It's possible that the intent is flawed ;-), but offhand I don't see how.

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    Logged In: YES user_id=1611720

    Despite Tim's reassurance, I'm afraid that Martin's patch does in fact prevent the segfault. It sounds like it also introduces a memory leak.

    tim-one commented 17 years ago

    Logged In: YES user_id=31435

    I've attached a much simplified pure-Python script (hope.py) that reproduces a problem very quickly, on Windows, in a /debug/ build of current trunk. It typically prints:

    exiting generator
    joined thread

    at most twice before crapping out. At the time, the next argument to newtracebackobject() is 0xdddddddd, and tracing back a level shows that, in PyTraceBack_Here(), frame->f_tstate is entirely filled with 0xdd bytes.

    Note that this is not a debug-build obmalloc gimmick! This is Microsoft's similar debug-build gimmick for their malloc, and for some reason Python uses the system malloc directly to obtain memory for thread states. The Microsoft debug free() fills newly-freed memory with 0xdd, which has the same meaning as the debug-build obmalloc's DEADBYTE (0xdb).
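As a small aid for reading crash dumps like this one, the two fill bytes named above can be checked mechanically. The helper below is a hypothetical sketch in Python 3: `FILL_BYTES` and `describe_pointer` are illustrative names, and the table contains only the two values mentioned in this comment.

```python
# Fill bytes taken from the comment above; other allocators use other values.
FILL_BYTES = {
    0xDD: "freed by Microsoft's debug-build free()",
    0xDB: "freed by debug-build obmalloc (DEADBYTE)",
}


def describe_pointer(value, width=4):
    """Return a description if every byte of `value` is the same known fill byte.

    `width` is the pointer size in bytes (4 on the 32-bit builds in this thread).
    Returns None when the value does not look like a fill pattern.
    """
    raw = value.to_bytes(width, "little")
    first = raw[0]
    if all(byte == first for byte in raw) and first in FILL_BYTES:
        return FILL_BYTES[first]
    return None
```

For example, `describe_pointer(0xdddddddd)` reports the Microsoft debug heap, while the `0x66` seen in the earlier gdb session returns `None`: it is garbage, but not a recognizable fill pattern.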

    So somebody is accessing a thread state here after it's been freed. Best guess is that the generator is getting "cleaned up" after the thread that created it has gone away, so the generator's frame's f_tstate is trash.

    Note that a PyThreadState (a frame's f_tstate) is /not/ a Python object -- it's just a raw C struct, and its lifetime isn't controlled by refcounts.
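The scenario guessed at here, a generator being cleaned up after its creating thread has exited, is easy to demonstrate at the Python level. The following Python 3 sketch uses hypothetical names (`tracked`, `run_demo`) and calls `close()` explicitly rather than relying on deallocation, so that the behaviour is deterministic; it shows only which thread runs the cleanup, not the crash.

```python
import threading


def tracked(log):
    """Generator that records which thread runs its cleanup code."""
    try:
        yield
    finally:
        log.append(threading.current_thread().name)


def run_demo():
    log = []
    holder = []

    def worker():
        gen = tracked(log)
        next(gen)           # suspend the generator inside its try block
        holder.append(gen)  # the generator outlives this thread

    thread = threading.Thread(target=worker, name="worker")
    thread.start()
    thread.join()           # the creating thread is gone from here on

    holder.pop().close()    # cleanup now runs in the calling thread
    return log, threading.current_thread().name
```

`run_demo()` returns the name of the thread that ran the `finally` block together with the caller's thread name; the two match, and neither is `"worker"`. Any per-thread pointer cached when the generator's frame was created would therefore refer to a thread that no longer exists.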

    mwhudson commented 17 years ago

    Logged In: YES user_id=6656

    > and for some reason Python uses the system malloc directly to obtain memory for thread states.

    This bit is fairly easy: they are allocated without the GIL being held, which breaks an assumption of PyMalloc.

    No idea about the real problem, sadly.

    d21744ff-f396-4c71-955e-7dbd2e886779 commented 17 years ago

    Logged In: YES user_id=33168

    Mike, what platform are you having the problem on?

    I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with debug builds and neither one crashed. Tim's guess looks pretty damn good too. Here's the result of valgrind:

    Invalid read of size 8
       at 0x4CEBFE: PyTraceBack_Here (traceback.c:117)
       by 0x49C1F1: PyEval_EvalFrameEx (ceval.c:2515)
       by 0x4F615D: gen_send_ex (genobject.c:82)
       by 0x4F6326: gen_close (genobject.c:128)
       by 0x4F645E: gen_del (genobject.c:163)
       by 0x4F5F00: gen_dealloc (genobject.c:31)
       by 0x44D207: _Py_Dealloc (object.c:1928)
       by 0x44534E: dict_dealloc (dictobject.c:801)
       by 0x44D207: _Py_Dealloc (object.c:1928)
       by 0x4664FF: subtype_dealloc (typeobject.c:686)
       by 0x44D207: _Py_Dealloc (object.c:1928)
       by 0x42325D: instancemethod_dealloc (classobject.c:2287)
     Address 0x56550C0 is 88 bytes inside a block of size 152 free'd
       at 0x4A1A828: free (vg_replace_malloc.c:233)
       by 0x4C3899: tstate_delete_common (pystate.c:256)
       by 0x4C3926: PyThreadState_DeleteCurrent (pystate.c:282)
       by 0x4D4043: t_bootstrap (threadmodule.c:448)
       by 0x4B24C48: pthread_start_thread (in /lib/libpthread-0.10.so)

    The only way I can think of to fix this is to keep a set of active generators in the PyThreadState and call gen_send_ex(exc=1) for all the active generators before killing the tstate in t_bootstrap.
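The idea proposed here can be modelled in pure Python 3 to show its shape. This is only an analogy under stated assumptions, not the C change: `run_and_close_generators`, `track`, `tracked`, and `run_demo` are hypothetical names, and the per-thread list stands in for a set stored in the PyThreadState.

```python
import threading


def run_and_close_generators(target):
    """Run target(track) in a new thread and close every tracked generator
    before that thread exits, so cleanup never runs in a foreign thread."""
    def bootstrap():
        active = []                 # stand-in for a per-thread-state set
        try:
            target(active.append)   # `track` is simply list.append here
        finally:
            for gen in active:
                gen.close()         # no-op if the generator already finished

    thread = threading.Thread(target=bootstrap, name="worker")
    thread.start()
    thread.join()


def tracked(log):
    """Generator that records which thread runs its cleanup code."""
    try:
        yield
    finally:
        log.append(threading.current_thread().name)


def run_demo():
    log = []
    leaked = []

    def target(track):
        gen = tracked(log)
        track(gen)
        next(gen)
        leaked.append(gen)          # a reference that outlives the thread

    run_and_close_generators(target)
    return log
```

`run_demo()` shows the cleanup running in the `"worker"` thread even though a reference to the generator survives it. One cost of this design is that the registry keeps every tracked generator alive until its thread ends.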

    tim-one commented 17 years ago

    Logged In: YES user_id=31435

    > I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with debug builds and neither one crashed. Tim's guess looks pretty damn good too.

    Neal, note that it's the /Windows/ malloc that fills freed memory with "dangerous bytes" in a debug build. This really has nothing to do with it being a debug build of /Python/, apart from the fact that on Windows a debug build of Python also links in the debug version of Microsoft's malloc.

    The valgrind report is pointing at the same thing. Whether this leads to a crash is purely an accident of when and how the system malloc happens to reuse the freed memory.

    460b6fbd-6650-4b32-b8c8-f42136d9d2fa commented 17 years ago

    We are experiencing the same segfault in our application, reliably. Running our unit test suite segfaults every time on both Linux and Mac OS X. Applying Martin's patch fixes the segfault and makes everything fine and dandy, at the cost of some memory leaks if I understand properly.

    This particular bug prevents us from upgrading to Python 2.5 in production.

    4463b2f2-93ac-426d-8cf7-70ca0b641eba commented 17 years ago

    The following patch resets the thread state of the generator when it is resumed, which prevents the segfault for me:

    Index: Objects/genobject.c
    ===================================================================
    --- Objects/genobject.c (revision 52849)
    +++ Objects/genobject.c (working copy)
    @@ -77,6 +77,7 @@
            Py_XINCREF(tstate->frame);
            assert(f->f_back == NULL);
            f->f_back = tstate->frame;
    +        f->f_tstate = tstate;
    
            gen->gi_running = 1;
            result = PyEval_EvalFrameEx(f, exc);

    87aadeff-db2d-4143-a83e-7a83177a7f25 commented 17 years ago

    This fixes the segfault problem that I was able to reliably reproduce on Linux.

    We need to get this applied (assuming it is the correct fix) to the source to make Python 2.5 usable for me in production code.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    Why do frame objects have a thread state in the first place? In particular, why does PyTraceBack_Here get the thread state from the frame, instead of using the current thread?

    Introduction of f_tstate goes back to r7882, but it is not clear why it was done that way.

    d21744ff-f396-4c71-955e-7dbd2e886779 commented 17 years ago

    Bumping priority to see if this should go into 2.5.1.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    I don't like mklaas' patch, since I think it is conceptually wrong to have PyTraceBack_Here() use the frame's thread state (mklaas describes it as dirty, and I agree). I'm proposing an alternative patch (tr.diff); please test this as well. File Added: tr.diff
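The difference between the two approaches discussed in this thread can be illustrated with a toy model in Python 3. Everything here is a hypothetical stand-in: `ThreadState` and `Frame` are plain classes rather than the C structs, and the three functions only mimic the choice of which thread state gets used; the actual contents of tr.diff are not reproduced.

```python
class ThreadState:
    """Toy stand-in for PyThreadState: a raw record, not reference counted."""
    def __init__(self, name):
        self.name = name
        self.freed = False


class Frame:
    """Toy stand-in for a frame caching the thread state that created it."""
    def __init__(self, tstate):
        self.f_tstate = tstate


def traceback_here_cached(frame, current):
    # Unpatched behaviour: trust the thread state cached on the frame.
    return frame.f_tstate


def traceback_here_current(frame, current):
    # Alternative idea: ignore the cache and use the running thread's state.
    return current


def resume_with_rebind(frame, current):
    # Patch posted earlier: refresh the cached pointer on every resume.
    frame.f_tstate = current
    return frame.f_tstate


def run_demo():
    worker, main = ThreadState("worker"), ThreadState("main")
    frame = Frame(worker)   # generator frame created in the worker thread
    worker.freed = True     # the worker exits and its state is freed
    return {
        "cached": traceback_here_cached(frame, main).freed,
        "current": traceback_here_current(frame, main).freed,
        "rebind": resume_with_rebind(frame, main).freed,
    }
```

In this model only the `"cached"` path ends up touching freed state. Both fixes avoid it, but rebinding keeps a field on the frame that can go stale again, whereas looking up the current thread state removes the dependency on that field.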

    87aadeff-db2d-4143-a83e-7a83177a7f25 commented 17 years ago

    A quick test on code that always segfaulted with unpatched Python 2.5 seems to work. Needs more extensive testing...

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 17 years ago

    This is now fixed in r53531 and r53532. For the trunk, it is likely that f_tstate will get eliminated altogether in the near future. People who had the problem are really encouraged to test 2.5.1c1 when it is released.