Open devdanzin opened 1 week ago
The _test function is only doing:
def _test():
    import doctest, difflib
    return doctest.testmod(difflib)
I think the issue is either with doctest or the imports, not with difflib itself.
cc @ZeroIntensity
I believe this is due to doctest using pdb, which seems to cause all sorts of segfaults and aborts when called from threads. Comment the marked lines out for a different crash.
import pdb
import sys
from threading import Thread

stdin = open("/dev/null")
sys.stdin = stdin

def f():
    for x in range(100):
        p = pdb.Pdb()
        p.reset()
        try:               # Comment
            p.set_trace()  # these
        except:            # lines
            pass           # out.
        p.reset()
        p.set_continue()

for x in range(100):
    Thread(target=f, args=()).start()
So maybe the issue is just that pdb isn't free-threading ready?
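The link between doctest and pdb suggested above can be checked from Python. This is a minimal sketch, not part of the original report: it assumes that doctest imports pdb at module level and that the private helper doctest._OutputRedirectingPdb (an implementation detail that may change between versions) is a pdb.Pdb subclass, and it looks both up defensively.

```python
import doctest
import pdb

# doctest imports pdb at module level, so the module object is shared.
doctest_imports_pdb = getattr(doctest, "pdb", None) is pdb

# _OutputRedirectingPdb is a private name; treat its presence as an
# assumption and fall back to None if it is missing.
helper = getattr(doctest, "_OutputRedirectingPdb", None)
helper_is_pdb_subclass = isinstance(helper, type) and issubclass(helper, pdb.Pdb)

print("doctest imports pdb:", doctest_imports_pdb)
print("doctest defines a Pdb subclass:", helper_is_pdb_subclass)
```

If both checks print True, running doctest.testmod in several threads plausibly exercises pdb setup code concurrently, which would be consistent with the pdb-only reproducer above.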
Edit: here's a collection of errors this causes:
2908176786326028288 bytes originally requested
The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
at p-7: 0x5b *** OUCH
at p-6: 0xf0 *** OUCH
at p-5: 0xf8 *** OUCH
at p-4: 0xfe *** OUCH
at p-3: 0x53 *** OUCH
at p-2: 0x1e *** OUCH
at p-1: 0x99 *** OUCH
Because memory is corrupted at the start, the count of bytes requested
may be bogus, and checking the trailing pad bytes may segfault.
mimalloc: error: thread 0x7fa0154d4640: buffer overflow in heap block 0x2000a010f30 of size 40: write after 40 bytes
The 8 pad bytes at tail=0x285c417fbab28370 are Aborted
free(): unaligned chunk detected in tcache 2
Aborted
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted
free(): invalid pointer
Aborted
Segmentation fault
double free or corruption (fasttop)
Aborted
Debug memory block at address p=0x5572ce2f7580: API '�'
14 bytes originally requested
The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
at p-7: 0xdb *** OUCH
at p-6: 0x4c *** OUCH
at p-5: 0x84 *** OUCH
at p-4: 0x07 *** OUCH
at p-2: 0xc2 *** OUCH
at p-1: 0x3a *** OUCH
Because memory is corrupted at the start, the count of bytes requested
may be bogus, and checking the trailing pad bytes may segfault.
The 8 pad bytes at tail=0x5572ce2f758e are not all FORBIDDENBYTE (0xfd):
at tail+0: 0xdd *** OUCH
at tail+1: 0xdd *** OUCH
at tail+2: 0xdd *** OUCH
at tail+3: 0xdd *** OUCH
at tail+4: 0xdd *** OUCH
at tail+5: 0xdd *** OUCH
at tail+6: 0xdd *** OUCH
at tail+7: 0xdd *** OUCH
Data at p: dd dd dd dd dd dd dd dd dd dd dd dd dd dd
Enable tracemalloc to get the memory block allocation traceback
Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '�', verified using API 'r'
Python runtime state: initialized
Aborted
@gaogaotiantian Is there a plan to make pdb free-threading friendly? If so, how can we help? If not, what should we do about this issue?
I'll wait for Tian's input, but I'm happy to spend a few hours covering pdb in locks if we need to.
pdb is a pure Python module, without any black magic (ctypes, changing the code object, etc.). It should not be possible for it to crash (segfault) CPython, free-threaded build or not. If pdb crashes the free-threaded build, that means the free-threaded build has a bug, not pdb.
From the code above, the only conclusion I can draw is that pdb triggered a crash, and we don't know why. Protecting pdb with a lock might hide the crash, but it does not solve the problem. We should not be able to crash CPython with pure Python code.
Also, pdb does not support multi-threaded debugging at this point. It's really slow to move it forward for any relatively new features. That being said, I don't think it's a good time to try to protect it with locks because we need to deal with multithreading in the future.
Does it use some weird frame APIs (e.g. sys._getframe) by any chance? Those aren't thread safe IIRC.
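For readers unfamiliar with the API in question, here is a small sketch of what sys._getframe does. The helper names caller_name and outer are made up for illustration. sys._getframe is a CPython implementation detail, and the sketch only shows a single-threaded lookup; it makes no claim about its thread safety.

```python
import sys

def caller_name():
    # sys._getframe(1) is the frame of whoever called this function;
    # sys._getframe(0) would be this function's own frame.
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()

print(outer())
```

Debuggers typically use this kind of lookup to find the frame where tracing should start.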
The culprit seems to be readline, which pdb uses:
from threading import Thread
import readline

def f():
    for x in range(100):
        readline.get_completer_delims()
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.get_completer_delims()

for x in range(100):
    Thread(target=f, args=()).start()
Seems to fail with the same errors as the original code.
It's probably the system that's not thread safe in that case.
The error appears to come from CPython code; line 593 seems to be affecting module global data, right?
I won't be surprised if readline is not thread safe. Actually at this point, I won't be surprised if anything crashes free-threaded CPython. It's just not ready yet. That's not saying we should ignore this, but I think we have an issue (https://github.com/python/cpython/issues/116738) to track the thread-safety for all internal C modules, and it seems like readline is under the category that's not inspected. This is just some work we need to do in the future, definitely not a bug in pdb.
That issue looks a little out of date, but yeah, we're still working on the thread safety. Thanks for your input, Tian!
@devdanzin, would you like to author a PR for fixing the global state in readline? (I'm assuming we need to either make it thread-local, move it to the module state, or just put an ugly lock around it.) If not, I can do it.
I wouldn't know how, please take it if you can :)
Ok, I'm busy with a few other issues right now, so I'll leave this open to someone else for a bit. If nobody decides they want to do it, I'll get to it :)
I would be a little bit more careful when dealing with readline, because it relies on some global variables of the readline library, so we can't simply put everything thread or module local. The variables starting with rl_ are the extern global variables exposed by readline itself. Also, we support libedit which has a slightly different interface. The module is not the most stable one in our stdlib so tread carefully :)
If it has global state, I doubt the functions themselves are thread safe. In that case, we probably should just mark it as needing the GIL.
> In that case, we probably should just mark it as needing the GIL.
Or add @critical_section or another lock on functions which are known to not be thread-safe.
That wouldn't work for subinterpreters, would it?
Crash report
What happened?
Calling difflib._test in threads in a free-threaded build (with PYTHON_GIL=0) will result in aborts or segfaults, apparently related to memory issues:
Segfault backtrace:
Abort 1 backtrace:
Abort 2 backtrace:
Found using fusil by @vstinner.
CPython versions tested on:
3.13, CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.14.0a1+ experimental free-threading build (heads/main-dirty:612ac283b81, Nov 16 2024, 01:37:56) [GCC 11.4.0]
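To check whether a given interpreter matches the setup in this report, something like the following sketch may help. The helper name threading_build_info is made up. It assumes the Py_GIL_DISABLED build variable reported by sysconfig and sys._is_gil_enabled (available from Python 3.13), and falls back to "GIL enabled" on older versions where that function is missing.

```python
import sys
import sysconfig

def threading_build_info():
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from 3.13; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}

print(threading_build_info())
```

The crashes described here would only be expected when free_threaded_build is True and gil_enabled is False.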