Closed 17 years ago
A reproducible segfault when using heavily-nested generators and exceptions.
Unfortunately, I haven't yet been able to provoke this behaviour with a standalone Python 2.5 script. There are, however, no third-party C extensions running in the process, so I'm fairly confident that it is a problem in the core.
The gist of the code is a series of nested generators which leave scope when an exception is raised. This exception is caught and re-raised in an outer loop. The old exception was holding on to the frame which was keeping the generators alive, and the sequence of generator destruction and new finalization caused the segfault.
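As a rough illustration of the mechanism described above (not the reporter's code), the following Python 3 sketch shows a caught exception keeping a generator alive through its traceback, with the generator finalized only once the exception is dropped. The names `numbers`, `worker`, `run`, and `demo` are hypothetical, and the prompt finalization assumes CPython's reference counting.

```python
import gc


def numbers(log):
    """Generator whose finally block records when it is finalized."""
    try:
        while True:
            yield 1
    finally:
        log.append("generator finalized")


def worker(log):
    gen = numbers(log)   # local variable: lives in this frame
    next(gen)            # start the generator so its try block is active
    raise RuntimeError("boom")


def run(log):
    try:
        worker(log)
    except RuntimeError as exc:
        # exc.__traceback__ references worker's frame, whose locals hold gen
        return exc


def demo():
    log = []
    exc = run(log)
    before = list(log)   # generator still alive via exc.__traceback__
    del exc              # drop the exception, its traceback, frames, generator
    gc.collect()         # in case the interpreter needs a collection pass
    after = list(log)
    return before, after


if __name__ == "__main__":
    print(demo())
```

On CPython, `before` should be empty and `after` should contain the single finalization entry, which shows that the generator's cleanup runs at whatever moment the old exception happens to be released.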
Logged In: YES user_id=1611720
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208400192 (LWP 26235)]
0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at Python/traceback.c:94
94 if ((next != NULL && !PyTraceBack_Check(next)) ||
(gdb) bt
throwflag=1) at Python/ceval.c:2459
arg=0x81333e0, exc=1) at Objects/genobject.c:82
throwflag=1) at Python/ceval.c:2491
arg=0x81333e0, exc=1) at Objects/genobject.c:82
hash=1492466088, value=0xb3fb9914) at Objects/dictobject.c:394
key=0xb3fb9930, value=0xb3fb9914) at Objects/dictobject.c:619
key=0x8129284 "exc_traceback", item=0xb3fb9914) at Objects/dictobject.c:2103
"exc_traceback", v=0xb3fb9914) at Python/sysmodule.c:82
throwflag=0) at Python/ceval.c:2954
globals=0xb7bbe57c, locals=0x0, args=0x9b8e2ac, argcount=1, kws=0x9b8e2b0, kwcount=0, defs=0xb7b7aed8, defcount=1, closure=0x0) at Python/ceval.c:2833
throwflag=0) at Python/ceval.c:3662
globals=0xb7bbe57c, locals=0x0, args=0xb7af9d58, argcount=1, kws=0x9b7a818, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/funcobject.c:517
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/abstract.c:1860
throwflag=0) at Python/ceval.c:3846
globals=0xb7cd4934, locals=0x0, args=0x9b7765c, argcount=2, kws=0x9b77664, kwcount=0, defs=0x0, defcount=0, closure=0xb7cfe874) at Python/ceval.c:2833
throwflag=0) at Python/ceval.c:3662
throwflag=0) at Python/ceval.c:3652
globals=0xb7f6ca44, locals=0x0, args=0x9b7a00c, argcount=0, kws=0x9b7a00c, kwcount=0, defs=0x0, defcount=0, closure=0xb796410c) at Python/ceval.c:2833
throwflag=0) at Python/ceval.c:3662
globals=0xb7f6ca44, locals=0x0, args=0x99086c0, argcount=0, kws=0x99086c0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
throwflag=0) at Python/ceval.c:3662
globals=0xb7f6ca44, locals=0xb7f6ca44, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
---Type <return> to continue, or q <return> to quit---
#39 0x080bff32 in PyEval_EvalCode (co=0xb7f397b8,
globals=0xb7f6ca44, locals=0xb7f6ca44) at Python/ceval.c:494
#40 0x080ddff1 in PyRun_FileExFlags (fp=0x98a4008,
filename=0xbfffd4a3 "scoreserver.py", start=257,
globals=0xb7f6ca44, locals=0xb7f6ca44, closeit=1,
flags=0xbfffd298) at Python/pythonrun.c:1264
#41 0x080de321 in PyRun_SimpleFileExFlags (fp=Variable "fp"
is not available.
) at Python/pythonrun.c:870
#42 0x08056ac4 in Py_Main (argc=1, argv=0xbfffd334) at
Modules/main.c:496
#43 0x00a69d5f in __libc_start_main () from /lib/libc.so.6
#44 0x08056051 in _start ()
Logged In: YES user_id=1611720
I've produced a simplified traceback with a single generator. Note the frame being used in the traceback (#0) is the same frame being dealloc'd (#11).
The relevant call in traceback.c is:
PyTraceBack_Here(PyFrameObject *frame)
{
    PyThreadState *tstate = frame->f_tstate;
    PyTracebackObject *oldtb = (PyTracebackObject *) tstate->curexc_traceback;
    PyTracebackObject *tb = newtracebackobject(oldtb, frame);
and I can verify that oldtb contains garbage:
(gdb) print frame
$1 = (PyFrameObject *) 0x8964d94
(gdb) print frame->f_tstate
$2 = (PyThreadState *) 0x895b178
(gdb) print $2->curexc_traceback
$3 = (PyObject *) 0x66
throwflag=1) at Python/ceval.c:2459
arg=0x81333e0, exc=1) at Objects/genobject.c:82
hash=1492466088, value=0xb7ccd054) at Objects/dictobject.c:394
key=0xb7ccd020, value=0xb7ccd054) at Objects/dictobject.c:619
key=0x8129284 "exc_traceback", item=0xb7ccd054) at Objects/dictobject.c:2103
"exc_traceback", v=0xb7ccd054) at Python/sysmodule.c:82
throwflag=0) at Python/ceval.c:2954
---Type <return> to continue, or q <return> to quit---
#19 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f6ade8,
globals=0xb7fafa44, locals=0x0,
args=0xb7cc5ff8, argcount=1, kws=0x0, kwcount=0,
defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:2833
#20 0x08104083 in function_call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
at Objects/funcobject.c:517
#21 0x0805a660 in PyObject_Call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
at Objects/abstract.c:1860
Logged In: YES user_id=1611720
I cannot yet produce a pure-Python script which reproduces the problem, but I can give an overview. There is a generator running in one thread, an exception being raised in another thread, and as a consequence, the generator in the first thread is garbage-collected (triggering an exception due to the new generator cleanup). The problem is extremely sensitive to timing: often the insertion/removal of print statements, or reordering the code, causes the problem to vanish, which is confounding my ability to create a simple test script.
import threading
import time

def getdocs():
    def f():
        # <some somewhat time-consuming operation>
        pass
    while True:
        f()
        yield None

# -----------------------------------------------------------------------------

class B(object):
    def __init__(self,):
        pass

    def doit(self):
        # must be an instance var to trigger segfault
        self.docIter = getdocs()
        print self.docIter # this is the generator referred-to in the traceback
        for i, item in enumerate(self.docIter):
            if i > 9:
                break
        print 'exiting generator'

class A(object):
    """ Process entry point / main thread """
    def __init__(self):
        while True:
            try:
                self.func()
            except Exception, e:
                print 'right after raise'

    def func(self):
        b = B()
        thread = threading.Thread(target=b.doit)
        thread.start()
        start_t = time.time()
        while True:
            try:
                if time.time() - start_t > 1:
                    raise Exception
            except Exception:
                print 'right before raise'
                # SIGSEGV here. If this is changed to
                # 'break', no segfault occurs
                raise

if __name__ == '__main__':
    A()
Logged In: YES user_id=21627
Can you please review/try attached patch? Can anybody tell why gi_frame *isn't* incref'ed when the generator is created?
Logged In: YES user_id=31435
> Can anybody tell why gi_frame *isn't* incref'ed when the generator is created?
As documented (in concrete.tex), PyGen_New(f) steals a reference to the frame passed to it. Its only call site (well, in the core) is in ceval.c, which returns immediately after PyGen_New takes over ownership of the frame the caller created:
""" /* Create a new generator that owns the ready to run frame
In short, that PyGen_New() doesn't incref the frame passed to it is intentional.
It's possible that the intent is flawed ;-), but offhand I don't see how.
Logged In: YES user_id=1611720
Despite Tim's reassurance, I'm afraid that Martin's patch does in fact prevent the segfault. Sounds like it also introduces a memory leak.
Logged In: YES user_id=31435
I've attached a much simplified pure-Python script (hope.py) that reproduces a problem very quickly, on Windows, in a /debug/ build of current trunk. It typically prints:
exiting generator
joined thread
at most twice before crapping out. At the time, the next argument to newtracebackobject() is 0xdddddddd, and tracing back a level shows that, in PyTraceBack_Here(), frame->f_tstate is entirely filled with 0xdd bytes.
Note that this is not a debug-build obmalloc gimmick! This is Microsoft's similar debug-build gimmick for their malloc, and for some reason Python uses the system malloc directly to obtain memory for thread states. The Microsoft debug free() fills newly-freed memory with 0xdd, which has the same meaning as the debug-build obmalloc's DEADBYTE (0xdb).
So somebody is accessing a thread state here after it's been freed. Best guess is that the generator is getting "cleaned up" after the thread that created it has gone away, so the generator's frame's f_tstate is trash.
Note that a PyThreadState (a frame's f_tstate) is /not/ a Python object -- it's just a raw C struct, and its lifetime isn't controlled by refcounts.
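The scenario guessed at above can be observed from pure Python without crashing anything. The Python 3 sketch below is an illustration only, assuming CPython reference counting; `Holder`, `docs`, and `demo` are hypothetical names. A generator is created and started in a worker thread, the thread exits, and the generator's cleanup then runs in whichever thread drops the last reference.

```python
import gc
import threading


class Holder:
    """Hypothetical stand-in for the instance that stores the generator."""
    gen = None


def docs(events):
    try:
        while True:
            yield None
    finally:
        # Record which thread actually runs the generator's cleanup.
        events.append(("finalized", threading.get_ident()))


def demo():
    events = []
    holder = Holder()

    def worker():
        holder.gen = docs(events)   # generator outlives this thread
        next(holder.gen)
        events.append(("created", threading.get_ident()))

    t = threading.Thread(target=worker)
    t.start()
    t.join()                        # the creating thread is gone now
    holder.gen = None               # last reference dropped in this thread
    gc.collect()
    return events, threading.get_ident()


if __name__ == "__main__":
    print(demo())
```

The "finalized" entry carries the identifier of the thread that called `demo()`, not the worker's. In the 2.5 crash the generator's frame still pointed at the worker's freed thread state at that moment.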
Logged In: YES user_id=6656
> and for some reason Python uses the system malloc directly to obtain memory for thread states.
This bit is fairly easy: they are allocated without the GIL being held, which breaks an assumption of PyMalloc.
No idea about the real problem, sadly.
Logged In: YES user_id=33168
Mike, what platform are you having the problem on?
I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with debug builds and neither one crashed. Tim's guess looks pretty damn good too. Here's the result of valgrind:
Invalid read of size 8
at 0x4CEBFE: PyTraceBack_Here (traceback.c:117)
by 0x49C1F1: PyEval_EvalFrameEx (ceval.c:2515)
by 0x4F615D: gen_send_ex (genobject.c:82)
by 0x4F6326: gen_close (genobject.c:128)
by 0x4F645E: gen_del (genobject.c:163)
by 0x4F5F00: gen_dealloc (genobject.c:31)
by 0x44D207: _Py_Dealloc (object.c:1928)
by 0x44534E: dict_dealloc (dictobject.c:801)
by 0x44D207: _Py_Dealloc (object.c:1928)
by 0x4664FF: subtype_dealloc (typeobject.c:686)
by 0x44D207: _Py_Dealloc (object.c:1928)
by 0x42325D: instancemethod_dealloc (classobject.c:2287)
Address 0x56550C0 is 88 bytes inside a block of size 152 free'd
at 0x4A1A828: free (vg_replace_malloc.c:233)
by 0x4C3899: tstate_delete_common (pystate.c:256)
by 0x4C3926: PyThreadState_DeleteCurrent (pystate.c:282)
by 0x4D4043: t_bootstrap (threadmodule.c:448)
by 0x4B24C48: pthread_start_thread (in /lib/libpthread-0.10.so)
The only way I can think of to fix this is to keep a set of active generators in the PyThreadState and call gen_send_ex(exc=1) for all the active generators before killing the tstate in t_bootstrap.
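A Python-level analogue of that idea, purely as a sketch and not the proposed C change: each thread keeps a weak set of the generators it started and closes them before it exits, so their cleanup runs while the creating thread still exists. All names here (`track`, `close_active`, `run_with_cleanup`, `docs`, `demo`) are hypothetical, and the code is Python 3.

```python
import threading
import weakref

_local = threading.local()  # hypothetical per-thread registry


def track(gen):
    """Remember gen as active in the current thread and return it."""
    active = getattr(_local, "active", None)
    if active is None:
        active = _local.active = weakref.WeakSet()
    active.add(gen)
    return gen


def close_active():
    """Close every still-alive generator tracked by the current thread."""
    for gen in list(getattr(_local, "active", ())):
        gen.close()  # raises GeneratorExit inside the generator


def run_with_cleanup(target):
    """Thread body: run target, then close its generators before exiting."""
    try:
        target()
    finally:
        close_active()


def docs(events):
    try:
        while True:
            yield None
    finally:
        events.append(threading.get_ident())  # thread running the cleanup


def demo():
    events = []
    kept = []  # outlives the thread, like self.docIter in the report

    def worker():
        gen = track(docs(events))
        next(gen)
        kept.append(gen)
        events.append(threading.get_ident())  # thread that created gen

    t = threading.Thread(target=run_with_cleanup, args=(worker,))
    t.start()
    t.join()
    return events


if __name__ == "__main__":
    print(demo())
```

Both recorded identifiers belong to the worker thread, even though a reference to the generator survives it. The weak set keeps the registry from extending generator lifetimes on its own.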
Logged In: YES user_id=31435
> I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with debug builds and neither one crashed. Tim's guess looks pretty damn good too.
Neal, note that it's the /Windows/ malloc that fills freed memory with "dangerous bytes" in a debug build. This has nothing to do with its being a debug build of /Python/, except that on Windows a debug build of Python also links in the debug version of Microsoft's malloc.
The valgrind report is pointing at the same thing. Whether this leads to a crash is purely an accident of when and how the system malloc happens to reuse the freed memory.
We are experiencing the same segfault in our application, reliably. Running our unit test suite segfaults every time on both Linux and Mac OS X. Applying Martin's patch fixes the segfault, and makes everything fine and dandy, at the cost of some memory leaks if I understand properly.
This particular bug prevents us from upgrading to Python 2.5 in production.
The following patch resets the thread state of the generator when it is resumed, which prevents the segfault for me:
Index: Objects/genobject.c
===================================================================
--- Objects/genobject.c (revision 52849)
+++ Objects/genobject.c (working copy)
@@ -77,6 +77,7 @@
Py_XINCREF(tstate->frame);
assert(f->f_back == NULL);
f->f_back = tstate->frame;
+ f->f_tstate = tstate;
gen->gi_running = 1;
result = PyEval_EvalFrameEx(f, exc);
This fixes the segfault problem that I was able to reliably reproduce on Linux.
We need to get this applied (assuming it is the correct fix) to the source to make Python 2.5 usable for me in production code.
Why do frame objects have a thread state in the first place? In particular, why does PyTraceBack_Here get the thread state from the frame, instead of using the current thread?
Introduction of f_tstate goes back to r7882, but it is not clear why it was done that way.
Bumping priority to see if this should go into 2.5.1.
I don't like mklaas' patch, since I think it is conceptually wrong to have PyTraceBack_Here() use the frame's thread state (mklaas describes it as dirty, and I agree). I'm proposing an alternative patch (tr.diff); please test this as well. File Added: tr.diff
A quick test on code that always segfaulted with unpatched Python 2.5 seems to work. Needs more extensive testing...
This is now fixed in r53531 and r53532. For the trunk, it is likely that f_tstate will get eliminated altogether in the near future. People who had the problem are really encouraged to test 2.5.1c1 when it is released.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['interpreter-core']
title = 'Segfault provoked by generators and exceptions'
updated_at =
user = 'https://bugs.python.org/klaas'
```
bugs.python.org fields:
```python
activity =
actor = 'loewis'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation =
creator = 'klaas'
dependencies = []
files = ['2182', '2183', '2184', '2185']
hgrepos = []
issue_num = 1579370
keywords = []
message_count = 19.0
messages = ['30272', '30273', '30274', '30275', '30276', '30277', '30278', '30279', '30280', '30281', '30282', '30283', '30284', '30285', '30286', '30287', '30288', '30289', '30290']
nosy_count = 7.0
nosy_names = ['mwh', 'tim.peters', 'loewis', 'nnorwitz', 'awaters', 'klaas', 'eric_noyau']
pr_nums = []
priority = 'critical'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1579370'
versions = ['Python 2.5']
```