python-greenlet / greenlet

Lightweight in-process concurrent programming

Segfault in ~ThreadStateCreator on shutdown on Python 3.11 #411

Open samschlegel opened 2 months ago

samschlegel commented 2 months ago

We recently upgraded a service from Python 3.8 to 3.11 and from greenlet 1.1.3.post0 to 3.3.0. Since then we have been seeing a fair number of segfaults during shutdown in the ThreadStateCreator destructor. This destructor seems to run after the interpreter has already shut down, so a null interpreter pointer (interp=0x0) ends up being passed to _PyEval_AddPendingCall.

I'm working on a minimal repro and reading through the changes that might have caused this. I wanted to flag it here in case anyone has ideas about which changes could be responsible.

As for a fix, I'm fairly sure it would be safe to check whether the interpreter still exists before this call is made and, if it doesn't, simply return. Presumably there wouldn't be much left to clean up once the interpreter is gone. Without knowing the root cause, though, I suspect a race condition. If that's the case, a pre-check would likely only reduce the issue, not eliminate it.
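The C++ sketch below illustrates the proposed pre-check and its limitation. It uses a simplified model. `StandInInterpreter`, `g_interp`, `destroy_queue_stub`, and `add_pending_call_guarded` are hypothetical names, not CPython or greenlet APIs. The atomic pointer only stands in for however the runtime actually tracks whether the interpreter is still alive.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for illustration only: these are NOT CPython's
// PyInterpreterState or greenlet's ThreadState_DestroyNoGIL API.
struct StandInInterpreter {
    int pending_calls = 0;  // number of callbacks queued for the GIL holder
};

// Models the interpreter pointer that becomes null once the runtime has been
// finalized (the crashing frame in the backtrace shows interp=0x0).
std::atomic<StandInInterpreter*> g_interp{nullptr};

using PendingFunc = int (*)(void*);

// Hypothetical cleanup callback standing in for DestroyQueueWithGIL.
int destroy_queue_stub(void*) { return 0; }

// Proposed pre-check: if the interpreter is already gone, skip scheduling
// the cleanup instead of dereferencing a null pointer.
// Returns 0 if the call was queued, -1 if it was skipped.
int add_pending_call_guarded(PendingFunc func, void* arg) {
    assert(func != nullptr);
    StandInInterpreter* interp = g_interp.load(std::memory_order_acquire);
    if (interp == nullptr) {
        return -1;  // interpreter finalized: nothing left to clean up
    }
    // Check-then-act: the main thread (inside Py_FinalizeEx) could free the
    // interpreter between the load above and this use, so the guard only
    // narrows the race window rather than closing it.
    interp->pending_calls++;
    (void)arg;  // a real queue would store func and arg here
    return 0;
}
```

When `g_interp` points at a live interpreter, the call is queued and returns 0. After `g_interp` is reset to `nullptr` (simulated finalization), it returns -1 instead of crashing. A plain load-then-use cannot rule out the interpreter being torn down in between. That is why a pre-check alone would likely reduce the crashes rather than eliminate them.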

#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=138025401046592) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=138025401046592) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=138025401046592, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#3  0x00007d8a0591a476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  <signal handler called>
#5  0x00000000004e8bc8 in _PyEval_AddPendingCall (interp=0x0, func=0x7d8a046f7540 <greenlet::ThreadState_DestroyNoGIL::DestroyQueueWithGIL(void*)>, arg=0x0) at ../Python/ceval.c:617
#6  0x00007d8a046f80ae in greenlet::ThreadState_DestroyNoGIL::AddPendingCall (arg=0x0, func=0x7d8a046f7540 <greenlet::ThreadState_DestroyNoGIL::DestroyQueueWithGIL(void*)>) at src/greenlet/TThreadStateDestroy.cpp:99
#7  greenlet::ThreadState_DestroyNoGIL::ThreadState_DestroyNoGIL (state=0x7d88d5c3a160, this=<synthetic pointer>) at src/greenlet/TThreadStateDestroy.cpp:147
#8  greenlet::ThreadStateCreator<greenlet::ThreadState_DestroyNoGIL>::~ThreadStateCreator (this=<optimized out>, __in_chrg=<optimized out>) at src/greenlet/greenlet_thread_state.hpp:488
#9  0x00007d8a0591dd9f in __GI___call_tls_dtors () at ./stdlib/cxa_thread_atexit_impl.c:159
#10 0x00007d8a0596c945 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:450
#11 0x00007d8a059fe850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
samschlegel commented 2 months ago

For additional context, here is the stack trace for the main thread from the same core dump:

#0  0x00007d8a059f6a7b in munmap () at ../sysdeps/unix/syscall-template.S:117
#1  0x00007d8a0347676b in mmap_object_dealloc (m_obj=0x7d89c1d79de0) at ./Modules/mmapmodule.c:166
#2  0x000000000058c5d8 in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2390
#3  Py_DECREF (op=<optimized out>) at ../Include/object.h:538
#4  Py_XDECREF (op=<optimized out>) at ../Include/object.h:602
#5  _PyObject_FreeInstanceAttributes (self=<Decoder at remote 0x7d89c1dae190>) at ../Objects/dictobject.c:5583
#6  subtype_dealloc (self=<Decoder at remote 0x7d89c1dae190>) at ../Objects/typeobject.c:1433
#7  0x000000000058c1f5 in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2390
#8  Py_DECREF (op=<optimized out>) at ../Include/object.h:538
#9  Py_XDECREF (op=<optimized out>) at ../Include/object.h:602
#10 _PyObject_FreeInstanceAttributes (self=<Reader at remote 0x7d89c1d9bcd0>) at ../Objects/dictobject.c:5583
#11 subtype_dealloc (self=<Reader at remote 0x7d89c1d9bcd0>) at ../Objects/typeobject.c:1433
#12 0x000000000052071b in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2390
#13 Py_DECREF (op=<optimized out>) at ../Include/object.h:538
#14 Py_XDECREF (op=<optimized out>) at ../Include/object.h:602
#15 insertdict (mp=<optimized out>, key='client', hash=<optimized out>, value=<optimized out>) at ../Objects/dictobject.c:1304
#16 0x00000000005b7e6c in _PyDict_SetItem_Take2 (value=<optimized out>, key=<optimized out>, mp=<optimized out>) at ../Objects/dictobject.c:1886
#17 PyDict_SetItem (value=<optimized out>, key=<optimized out>, op={...(truncated)}) at ../Objects/dictobject.c:1906
#18 _PyModule_ClearDict (d={...(truncated)}) at ../Objects/moduleobject.c:634
#19 0x00000000006442f5 in finalize_modules_clear_weaklist (verbose=0, weaklist=[...(truncated)], interp=0xa4ee58 <_PyRuntime+58936>) at ../Python/pylifecycle.c:1499
#20 finalize_modules (tstate=0xa691d8 <_PyRuntime+166328>) at ../Python/pylifecycle.c:1581
#21 0x0000000000632fbb in Py_FinalizeEx () at ../Python/pylifecycle.c:1833
#22 0x000000000065464c in Py_Exit (sts=0) at ../Python/pylifecycle.c:2940
#23 0x000000000064496f in handle_system_exit () at ../Python/pythonrun.c:771
#24 0x0000000000644756 in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0xa691d8 <_PyRuntime+166328>) at ../Python/pythonrun.c:781
#25 PyErr_PrintEx (set_sys_last_vars=1) at ../Python/pythonrun.c:876
#26 0x000000000046b455 in PyErr_Print () at ../Python/pythonrun.c:882
#27 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7fff6a21c938) at ../Python/pythonrun.c:446
#28 0x0000000000643b37 in _PyRun_AnyFileObject (fp=0x217dd20, filename='/usr/local/bin/gunicorn', closeit=1, flags=0x7fff6a21c938) at ../Python/pythonrun.c:79
#29 0x000000000063e679 in pymain_run_file_obj (skip_source_first_line=0, filename='/usr/local/bin/gunicorn', program_name='/usr/bin/python3.11') at ../Modules/main.c:360
#30 pymain_run_file (config=0xa4f220 <_PyRuntime+59904>) at ../Modules/main.c:379
#31 pymain_run_python (exitcode=0x7fff6a21c934) at ../Modules/main.c:601
#32 Py_RunMain () at ../Modules/main.c:680
#33 0x00000000006042bd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:734
#34 0x00007d8a05901d90 in __libc_start_call_main (main=main@entry=0x604210 <main>, argc=argc@entry=29, argv=argv@entry=0x7fff6a21cb68) at ../sysdeps/nptl/libc_start_call_main.h:58
#35 0x00007d8a05901e40 in __libc_start_main_impl (main=0x604210 <main>, argc=29, argv=0x7fff6a21cb68, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff6a21cb58) at ../csu/libc-start.c:392
#36 0x0000000000604145 in _start ()