python / cpython

The Python programming language
https://www.python.org
Other
63.68k stars 30.51k forks source link

Segfault/aborts calling `readline.set_completer_delims` in threads in a free-threaded build #126895

Open devdanzin opened 1 week ago

devdanzin commented 1 week ago

Crash report

What happened?

Calling difflib._test in threads in a free-threaded build (with PYTHON_GIL=0) will result in aborts or segfaults, apparently related to memory issues:

from threading import Thread
import difflib

for x in range(100):
    Thread(target=difflib._test, args=()).start()

Segfault backtrace:

Thread 92 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffeca7e4640 (LWP 631239)]
0x00007ffff7d3f743 in unlink_chunk (p=p@entry=0x555555f56c00, av=0x7ffff7eb8c80 <main_arena>) at ./malloc/malloc.c:1634
1634    ./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d3f743 in unlink_chunk (p=p@entry=0x555555f56c00, av=0x7ffff7eb8c80 <main_arena>)
    at ./malloc/malloc.c:1634
#1  0x00007ffff7d40d2b in _int_free (av=0x7ffff7eb8c80 <main_arena>, p=0x555555f528a0,
    have_lock=<optimized out>) at ./malloc/malloc.c:4616
#2  0x00007ffff7d43453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#3  0x000055555570ed2c in _PyMem_RawFree (_unused_ctx=<optimized out>, ptr=<optimized out>)
    at Objects/obmalloc.c:90
#4  0x0000555555713955 in _PyMem_DebugRawFree (ctx=0x555555cdb8f8 <_PyRuntime+824>, p=0x555555f528c0)
    at Objects/obmalloc.c:2767
#5  0x000055555572c462 in PyMem_RawFree (ptr=ptr@entry=0x555555f528c0) at Objects/obmalloc.c:971
#6  0x00005555559527fc in free_threadstate (tstate=tstate@entry=0x555555f528c0) at Python/pystate.c:1411
#7  0x00005555559549c0 in _PyThreadState_DeleteCurrent (tstate=tstate@entry=0x555555f528c0)
    at Python/pystate.c:1834
#8  0x0000555555a11b74 in thread_run (boot_raw=boot_raw@entry=0x555555f52870)
    at ./Modules/_threadmodule.c:355
#9  0x0000555555974fdb in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:242
#10 0x00007ffff7d32ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007ffff7dc4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Abort 1 backtrace:

Thread 78 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffed17f2640 (LWP 633651)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140732413191744, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7ce0476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cc67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7d27676 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e79b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d3ecfc in malloc_printerr (
    str=str@entry=0x7ffff7e7c6c0 "free(): unaligned chunk detected in tcache 2") at ./malloc/malloc.c:5664
#7  0x00007ffff7d40e3f in _int_free (av=0x7ffff7eb8c80 <main_arena>, p=0x555555dc79b0, have_lock=0)
    at ./malloc/malloc.c:4471
#8  0x00007ffff7d43453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007ffff41d4a2b in readline_set_completer_delims (module=<optimized out>, string=<optimized out>)
    at ./Modules/readline.c:593
#10 0x00005555557021ef in cfunction_vectorcall_O (
    func=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
    args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:523
#11 0x000055555567cc55 in _PyObject_VectorcallTstate (tstate=0x555555f15bc0,
    callable=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
    args=0x7ffed17f15a8, nargsf=9223372036854775809, kwnames=0x0) at ./Include/internal/pycore_call.h:167
#12 0x000055555567cd74 in PyObject_Vectorcall (
    callable=callable@entry=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
 args=args@entry=0x7ffed17f15a8, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at Objects/call.c:327
#13 0x00005555558441f7 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f15bc0,
    frame=<optimized out>, throwflag=throwflag@entry=0) at Python/generated_cases.c.h:955
#14 0x0000555555872978 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555f15bc0)
    at ./Include/internal/pycore_ceval.h:116

Abort 2 backtrace:

Thread 43 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff54ff9640 (LWP 636245)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140734619424320, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7ce0476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cc67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7d27676 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e79b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d3ecfc in malloc_printerr (
    str=str@entry=0x7ffff7e7cf20 "tcache_thread_shutdown(): unaligned tcache chunk detected")
    at ./malloc/malloc.c:5664
#7  0x00007ffff7d436c4 in tcache_thread_shutdown () at ./malloc/malloc.c:3224
#8  __malloc_arena_thread_freeres () at ./malloc/arena.c:1003
#9  0x00007ffff7d461ca in __libc_thread_freeres () at ./malloc/thread-freeres.c:44
#10 0x00007ffff7d3294f in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:456
#11 0x00007ffff7dc4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Found using fusil by @vstinner.

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a1+ experimental free-threading build (heads/main-dirty:612ac283b81, Nov 16 2024, 01:37:56) [GCC 11.4.0]

picnixz commented 1 week ago

The _test function is only doing:

def _test():
    import doctest, difflib
    return doctest.testmod(difflib)

I think the issue is either with doctest or the imports, not with difflib itself.

cc @ZeroIntensity

devdanzin commented 1 week ago

I believe this is due to doctest using pdb, which seems to cause all sorts of segfaults and aborts when called from threads. Comment the marked lines out for a different crash.

import pdb
import sys
from threading import Thread

stdin = open("/dev/null")
sys.stdin = stdin

def f():
    for x in range(100):
        p = pdb.Pdb()
        p.reset()
        try:               # Comment
            p.set_trace()  # these
        except:            # lines
            pass           # out.
        p.reset()
        p.set_continue()

for x in range(100):
    Thread(target=f, args=()).start()

So maybe the issue is just that pdb isn't free-threading ready?

Edit: here's a collection of errors this causes:

    2908176786326028288 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x5b *** OUCH
        at p-6: 0xf0 *** OUCH
        at p-5: 0xf8 *** OUCH
        at p-4: 0xfe *** OUCH
        at p-3: 0x53 *** OUCH
        at p-2: 0x1e *** OUCH
        at p-1: 0x99 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
mimalloc: error: thread 0x7fa0154d4640: buffer overflow in heap block 0x2000a010f30 of size 40: write after 40 bytes
    The 8 pad bytes at tail=0x285c417fbab28370 are Aborted
free(): unaligned chunk detected in tcache 2
Aborted
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted
free(): invalid pointer
Aborted
Segmentation fault
double free or corruption (fasttop)
Aborted
Debug memory block at address p=0x5572ce2f7580: API '�'
    14 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0xdb *** OUCH
        at p-6: 0x4c *** OUCH
        at p-5: 0x84 *** OUCH
        at p-4: 0x07 *** OUCH
        at p-2: 0xc2 *** OUCH
        at p-1: 0x3a *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x5572ce2f758e are not all FORBIDDENBYTE (0xfd):
        at tail+0: 0xdd *** OUCH
        at tail+1: 0xdd *** OUCH
        at tail+2: 0xdd *** OUCH
        at tail+3: 0xdd *** OUCH
        at tail+4: 0xdd *** OUCH
        at tail+5: 0xdd *** OUCH
        at tail+6: 0xdd *** OUCH
        at tail+7: 0xdd *** OUCH
    Data at p: dd dd dd dd dd dd dd dd dd dd dd dd dd dd

Enable tracemalloc to get the memory block allocation traceback

Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '�', verified using API 'r'
Python runtime state: initialized
Aborted
picnixz commented 1 week ago

@gaogaotiantian Is there a plan to make pdb free-threaded friendly or not? if so, how can we help, if not what should we do for this issue?

ZeroIntensity commented 1 week ago

I'll wait for Tian's input, but I'm happy to spend a few hours covering pdb in locks if we need to.

gaogaotiantian commented 1 week ago

pdb is a pure Python module, without any black magic (ctypes, changing the code object, etc.). It should not be possible for it to crash (segfault) CPython, free-threaded build or not. If pdb crashes free-threaded build, that means free-threaded build has a bug, not pdb.

From the code above, the conclusion I can get is pdb triggered a crash, and we don't know why. Protecting pdb with lock might hide the crash, but it does not solve the problem. We should not be able to crash CPython with pure Python code.

Also, pdb does not support multi-threading debugging at this point. It's really slow to move it forward for any relatively new features. That being said, I don't think it's a good time to try to protect it with locks because we need to deal with multithreading in the future.

ZeroIntensity commented 1 week ago

Does it use some weird frames APIs (e.g. sys._getframe) by any chance? Those aren't thread safe IIRC.

devdanzin commented 1 week ago

The culprit seems to be readline, which pdb uses:

from threading import Thread
import readline

def f():
    for x in range(100):
        readline.get_completer_delims()
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.get_completer_delims()

for x in range(100):
    Thread(target=f, args=()).start()

Seems to fail with the same errors as the original code.

ZeroIntensity commented 1 week ago

It's probably the system that's not thread safe in that case.

devdanzin commented 1 week ago

The error appears to come from CPython code, line 593 seems to be affecting module global data, right?

https://github.com/python/cpython/blob/2313f8421080ceb3343c6f5d291279adea85e073/Modules/readline.c#L578-L594

gaogaotiantian commented 1 week ago

I won't be surprised if readline is not thread safe. Actually at this point, I won't be surprised if anything crashes free-threaded CPython. It's just not ready yet. That's not saying we should ignore this, but I think we have an issue (https://github.com/python/cpython/issues/116738) to track the thread-safety for all internal C modules, and it seems like readline is under the category that's not inspected. This is just some work we need to do in the future, definitely not a bug in pdb.

ZeroIntensity commented 1 week ago

That issue looks a little out of date, but yeah, we're still working on the thread safety. Thanks for your input, Tian!

@devdanzin, would you like to author a PR for fixing the global state in readline? (I'm assuming we need to either make it thread-local, move it to the module state, or just put an ugly lock around it.) If not, I can do it.

devdanzin commented 1 week ago

I wouldn't know how, please take it if you can :)

ZeroIntensity commented 1 week ago

Ok, I'm busy with a few other issues right now, so I'll leave this open to someone else for a bit. If nobody decides they want to do it, I'll get to it :)

gaogaotiantian commented 1 week ago

I would be a little bit more careful when dealing with readline, because it relies on some global variables of the readline library, so we can't simply put everything thread or module local. The variables starting with rl_ are the extern global variables exposed by readline itself. Also, we support libedit which has a slightly different interface. The module is not the most stable one in our stdlib so tread carefully :)

ZeroIntensity commented 1 week ago

If it has global state, I doubt the functions themself are thread safe. In that case, we probably should just mark it as needing the GIL.

vstinner commented 1 week ago

In that case, we probably should just mark it as needing the GIL.

Or add @critical_section or another lock on functions which are known to not be thread-safe.

ZeroIntensity commented 1 week ago

That wouldn't work for subinterpreters, would it?