sterwill opened 5 months ago
I built Python 3.12.3 with --with-pydebug, and now I get a crash before any corrupt output shows. Backtrace from gdb (happy to provide more if it would be helpful):
python: ./Modules/_io/textio.c:1620: _textiowrapper_writeflush: Assertion `pos == self->pending_bytes_count' failed.
[Thread 0x7ffd957ca640 (LWP 1712755) exited]
Thread 14 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffbffff640 (LWP 1712583)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736414611008) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736414611008) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140736414611008) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140736414611008, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff7c2871b in __assert_fail_base (fmt=0x7ffff7ddd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x555555a6f4c8 "pos == self->pending_bytes_count", file=0x555555a6fa7a "./Modules/_io/textio.c", line=1620,
function=<optimized out>) at ./assert/assert.c:92
#6 0x00007ffff7c39e96 in __GI___assert_fail (assertion=assertion@entry=0x555555a6f4c8 "pos == self->pending_bytes_count", file=file@entry=0x555555a6fa7a "./Modules/_io/textio.c", line=line@entry=1620,
function=function@entry=0x555555a6fdd0 <__PRETTY_FUNCTION__.9> "_textiowrapper_writeflush") at ./assert/assert.c:101
#7 0x00005555559639a2 in _textiowrapper_writeflush (self=self@entry=0x7ffff7402b70) at ./Modules/_io/textio.c:1620
#8 0x00005555559645da in _io_TextIOWrapper_write_impl (self=0x7ffff7402b70, text=<optimized out>) at ./Modules/_io/textio.c:1732
#9 0x0000555555964858 in _io_TextIOWrapper_write (self=<optimized out>, arg=<optimized out>) at ./Modules/_io/clinic/textio.c.h:680
#10 0x00005555557747a1 in cfunction_vectorcall_O (func=0x7ffff41dbe30, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:509
#11 0x0000555555705e5d in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7fffbfffe978, callable=0x7ffff41dbe30, tstate=0x555555f76b30) at ./Include/internal/pycore_call.h:92
#12 PyObject_CallOneArg (func=func@entry=0x7ffff41dbe30, arg=arg@entry=0x7ffff752b9a0) at Objects/call.c:401
#13 0x000055555572d77b in PyFile_WriteObject (v=<optimized out>, f=f@entry=0x7ffff7402b70, flags=flags@entry=1) at Objects/fileobject.c:132
#14 0x000055555584e5ba in builtin_print_impl (module=module@entry=0x7ffff75966f0, args=args@entry=0x7ffff6fc3e30, sep=0x0, sep@entry=0x555555bdcac0 <_Py_NoneStruct>, end=0x0, end@entry=0x555555bdcac0 <_Py_NoneStruct>, file=0x7ffff7402b70,
file@entry=0x555555bdcac0 <_Py_NoneStruct>, flush=0) at Python/bltinmodule.c:2057
#15 0x000055555584e95e in builtin_print (module=0x7ffff75966f0, args=0x7fffbfffea60, args@entry=0x7ffff6ea0328, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Python/clinic/bltinmodule.c.h:1121
#16 0x0000555555870a25 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f76b30, frame=0x7ffff6ea02b0, throwflag=throwflag@entry=0) at Python/bytecodes.c:2966
#17 0x000055555587581d in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555f76b30) at ./Include/internal/pycore_ceval.h:89
#18 _PyEval_Vector (tstate=0x555555f76b30, func=0x7ffff6f33e90, locals=locals@entry=0x0, args=0x7fffbfffece8, argcount=1, kwnames=0x0) at Python/ceval.c:1683
#19 0x0000555555705497 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:419
#20 0x000055555570a17b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7fffbfffece8, callable=0x7ffff6f33e90, tstate=0x555555f76b30) at ./Include/internal/pycore_call.h:92
#21 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:69
#22 0x000055555570811a in _PyVectorcall_Call (tstate=tstate@entry=0x555555f76b30, func=0x555555709e84 <method_vectorcall>, callable=callable@entry=0x7ffff6fa8470, tuple=tuple@entry=0x555555cd9268 <_PyRuntime+92264>, kwargs=kwargs@entry=0x0)
at Objects/call.c:271
#23 0x00005555557084e3 in _PyObject_Call (tstate=0x555555f76b30, callable=0x7ffff6fa8470, args=0x555555cd9268 <_PyRuntime+92264>, kwargs=0x0) at Objects/call.c:354
#24 0x0000555555708540 in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:379
#25 0x0000555555992dcf in thread_run (boot_raw=boot_raw@entry=0x555555f23a50) at ./Modules/_threadmodule.c:1114
#26 0x0000555555904443 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:233
#27 0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#28 0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
We are seeing a similar issue (`python: external/python3.10/Modules/_io/textio.c:1597: _textiowrapper_writeflush: Assertion 'pos == self->pending_bytes_count' failed.`) in an in-house debug build of Python.
We don't have a reproducer, since it happens intermittently in a fairly complicated Ray program and we haven't tried to minimize it. Just wanted to note that users unaffiliated with @sterwill are seeing the issue as well.
https://github.com/python/cpython/blob/3.10/Modules/_io/textio.c#L1584 calls PyUnicode_DATA and PyUnicode_GET_LENGTH. https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_GET_LENGTH says: "Return the length of the Unicode string, in code points. unicode has to be a Unicode object in the "canonical" representation (not checked)."
Any idea what the canonical representation is, and/or how to check it?
Applying the following patch to Python 3.10.12:
diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c
index e1199a5691..65d0628c73 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -16,6 +16,8 @@
#include "structmember.h" // PyMemberDef
#include "_iomodule.h"
+#include <stdio.h>
+
/*[clinic input]
module _io
class _io.IncrementalNewlineDecoder "nldecoder_object *" "&PyIncrementalNewlineDecoder_Type"
@@ -1594,6 +1596,37 @@ _textiowrapper_writeflush(textio *self)
memcpy(buf + pos, src, len);
pos += len;
}
+ if (pos != self->pending_bytes_count) {
+ fprintf(stderr, "_textiowrapper_writeflush %p pos %li != self->pending_bytes_count %li\n", buf, pos, self->pending_bytes_count);
+ fprintf(stderr, "_textiowrapper_writeflush %p PyList_GET_SIZE(pending) %li\n", buf, PyList_GET_SIZE(pending));
+ fflush(stderr);
+
+ Py_ssize_t pos_ = 0;
+
+ for (Py_ssize_t i = 0; i < PyList_GET_SIZE(pending); i++) {
+ PyObject *obj = PyList_GET_ITEM(pending, i);
+ char *src;
+ Py_ssize_t len;
+ if (PyUnicode_Check(obj)) {
+ assert(PyUnicode_IS_ASCII(obj));
+ assert(PyUnicode_IS_COMPACT_ASCII(obj));
+ // src = PyUnicode_DATA(obj);
+ len = PyUnicode_GET_LENGTH(obj);
+ fprintf(stderr, "_textiowrapper_writeflush %p i %li, PyUnicode_Check, pos %li, len %li\n", buf, i, pos_, len);
+ }
+ else {
+ assert(PyBytes_Check(obj));
+ if (PyBytes_AsStringAndSize(obj, &src, &len) < 0) {
+ Py_DECREF(b);
+ return -1;
+ }
+ fprintf(stderr, "_textiowrapper_writeflush %p i %li, PyBytes_Check, pos %li, len %li\n", buf, i, pos_, len);
+ }
+ pos_ += len;
+ }
+ fprintf(stderr, "_textiowrapper_writeflush %p pos %li %li != self->pending_bytes_count %li\n", buf, pos, pos_, self->pending_bytes_count);
+ fflush(stderr);
+ }
assert(pos == self->pending_bytes_count);
}
produces this output for a failing case:
_textiowrapper_writeflush 0x856bb50 pos 1370 != self->pending_bytes_count 2124
_textiowrapper_writeflush 0x856bb50 PyList_GET_SIZE(pending) 12
_textiowrapper_writeflush 0x856bb50 i 0, PyBytes_Check, pos 0, len 869
_textiowrapper_writeflush 0x856bb50 i 1, PyUnicode_Check, pos 869, len 1
_textiowrapper_writeflush 0x856bb50 i 2, PyUnicode_Check, pos 870, len 280
_textiowrapper_writeflush 0x856bb50 i 3, PyUnicode_Check, pos 1150, len 1
_textiowrapper_writeflush 0x856bb50 i 4, PyBytes_Check, pos 1151, len 130
_textiowrapper_writeflush 0x856bb50 i 5, PyUnicode_Check, pos 1281, len 1
_textiowrapper_writeflush 0x856bb50 i 6, PyUnicode_Check, pos 1282, len 0
_textiowrapper_writeflush 0x856bb50 i 7, PyUnicode_Check, pos 1282, len 1
_textiowrapper_writeflush 0x856bb50 i 8, PyUnicode_Check, pos 1283, len 85
_textiowrapper_writeflush 0x856bb50 i 9, PyUnicode_Check, pos 1368, len 1
_textiowrapper_writeflush 0x856bb50 i 10, PyUnicode_Check, pos 1369, len 0
_textiowrapper_writeflush 0x856bb50 i 11, PyUnicode_Check, pos 1369, len 1
_textiowrapper_writeflush 0x856bb50 pos 1370 1370 != self->pending_bytes_count 2124
python: external/python3.10/Modules/_io/textio.c:1630: _textiowrapper_writeflush: Assertion `pos == self->pending_bytes_count' failed.
*** SIGABRT received at time=1715870177 on cpu 6 ***
PC: @ 0x7f5e2dfaf7bf (unknown) pthread_kill@@GLIBC_2.34
@ 0x7f5e2df57fd0 398712080 (unknown)
@ 0x7f5e2df57f2a 32 gsignal
@ 0x7f5e2df59090 192 abort
@ 0x7f5e2df494ae 80 __assert_fail_base
@ 0x7f5e2df49523 48 __assert_fail
@ 0x5affec 144 _textiowrapper_writeflush
@ 0x5b0531 64 _io_TextIOWrapper_flush
@ 0x6a8f0c 80 method_vectorcall_NOARGS
@ 0x453541 96 PyObject_VectorcallMethod
@ 0x548088 64 flush_io
@ 0x5490a9 112 _PyRun_SimpleFileObject
@ 0x549595 64 _PyRun_AnyFileObject
@ 0x441e0b 240 Py_RunMain
@ 0x4424bb 48 Py_BytesMain
@ 0x7f5e2df3a607 160 __libc_start_call_main
@ 0x7f5e2df3a6bd 96 __libc_start_main@GLIBC_2.2.5
@ 0x4410e1 (unknown) _start
The length 0 entries seem suspicious to me.
While anecdotal, the issue seems to happen more often for me when I enable tqdm-style progress bars. I'm not sure whether that's (a) actually true, (b) related to the code paths they exercise, like https://github.com/python/cpython/blob/3.10/Modules/_io/textio.c#L1663, or (c) just due to there being a lot more output when I do so.
FYI I am able to reproduce the problem reliably with the following reduced script:
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)

NUM_THREADS = 100

def do_one_request():
    for i in range(10):
        p = 'x' * 99
        print(p)

with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
    for i in range(NUM_THREADS):
        executor.submit(do_one_request)
    executor.shutdown(wait=True)
I've added slightly more debugging output to the above patch, and am now getting output like:
_textiowrapper_writeflush 0x7fc1b8000e10 i 138, PyUnicode_Check, pos 6900, len 99
_textiowrapper_writeflush 0x7fc1b8000e10 i 138, src xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
_textiowrapper_writeflush 0x7fc1b8000e10 i 139, PyUnicode_Check, pos 6999, len 1
_textiowrapper_writeflush 0x7fc1b8000e10 i 139, src ;
_textiowrapper_writeflush 0x7fc1b8000e10 i 140, PyUnicode_Check, pos 7000, len 99
_textiowrapper_writeflush 0x7fc1b8000e10 i 140, src xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
_textiowrapper_writeflush 0x7fc1b8000e10 i 141, PyUnicode_Check, pos 7099, len 1
_textiowrapper_writeflush 0x7fc1b8000e10 i 141, src ;
_textiowrapper_writeflush 0x7fc1b8000e10 pos 7100 7100 != self->pending_bytes_count 8100
python: external/python3.10/Modules/_io/textio.c:1646: _textiowrapper_writeflush: Assertion `pos == self->pending_bytes_count' failed.
142 entries / 2 writes per print * (99 'x' + 1 '\n') == 7100, so I think the zero-length entry I mentioned above probably isn't relevant. Why it expects 8100 bytes rather than 100k isn't clear to me.
Interestingly, it always comes out as 7100 vs. 8100. Without the barrier it varied more, but in the few runs I did it was only ever no error, 800 vs. 8100, or 7100 vs. 8100.
My guess at this point is that there's a threading issue with appending to and clearing the list, but I'm getting out of my depth.
I have increased confidence that it's a threading issue now. With this patch:
diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c
index daa0b2e168..9bf1e96b87 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -16,6 +16,8 @@
#include "structmember.h" // PyMemberDef
#include "_iomodule.h"
+#include <stdio.h>
+
/*[clinic input]
module _io
class _io.IncrementalNewlineDecoder "nldecoder_object *" "&PyIncrementalNewlineDecoder_Type"
@@ -1780,6 +1782,24 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
Py_DECREF(b);
return NULL;
}
+ if (self->pending_bytes_count != 0) {
+ fprintf(stderr, "_io_TextIOWrapper_write_impl self->pending_bytes_count: %li\n", self->pending_bytes_count);
+
+ if (self->pending_bytes == NULL) {
+ fprintf(stderr, "_io_TextIOWrapper_write_impl self->pending_bytes: NULL\n");
+ }
+ else if (PyBytes_Check(self->pending_bytes)) {
+ fprintf(stderr, "_io_TextIOWrapper_write_impl self->pending_bytes: bytes %li\n", PyBytes_GET_SIZE(self->pending_bytes));
+ }
+ else if (PyUnicode_Check(self->pending_bytes)) {
+ fprintf(stderr, "_io_TextIOWrapper_write_impl self->pending_bytes: unicode %li\n", PyUnicode_GET_LENGTH(self->pending_bytes));
+ }
+ else if (PyList_Check(self->pending_bytes)) {
+ fprintf(stderr, "_io_TextIOWrapper_write_impl self->pending_bytes: list %li\n", PyList_GET_SIZE(self->pending_bytes));
+ }
+ fflush(stderr);
+ }
+ assert(self->pending_bytes == NULL);
self->pending_bytes = b;
}
else if (!PyList_CheckExact(self->pending_bytes)) {
The above script produces this on stderr:
_io_TextIOWrapper_write_impl self->pending_bytes_count: 1000
_io_TextIOWrapper_write_impl self->pending_bytes: list 20
python: external/python3.10/Modules/_io/textio.c:1732: _io_TextIOWrapper_write_impl: Assertion `self->pending_bytes == NULL' failed.
Almost certainly some other thread is coming in and adding list entries, which then get overwritten by that `self->pending_bytes = b;` line.
I have also confirmed that the stdout from the script is missing some expected output (I changed the "xxxx" to be the i, j of the thread and inner loop, and some of those go missing).
This patch fixes the issue for me on Python 3.10.12, but not on main (3.14.0a0).
diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c
index e1199a5691..1d1f2d050d 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -1551,31 +1551,21 @@ _textiowrapper_writeflush(textio *self)
PyObject *pending = self->pending_bytes;
PyObject *b;
+ assert(PyList_Check(pending));
- if (PyBytes_Check(pending)) {
- b = pending;
- Py_INCREF(b);
- }
- else if (PyUnicode_Check(pending)) {
- assert(PyUnicode_IS_ASCII(pending));
- assert(PyUnicode_GET_LENGTH(pending) == self->pending_bytes_count);
- b = PyBytes_FromStringAndSize(
- PyUnicode_DATA(pending), PyUnicode_GET_LENGTH(pending));
- if (b == NULL) {
- return -1;
- }
+ // We get pending size first
+ Py_ssize_t pending_size_at_start = PyList_GET_SIZE(pending);
+ // now we use the value in _count so that b is always >= size of bytes in pending[0:size]
+ b = PyBytes_FromStringAndSize(NULL, self->pending_bytes_count);
+ if (b == NULL) {
+ return -1;
}
- else {
- assert(PyList_Check(pending));
- b = PyBytes_FromStringAndSize(NULL, self->pending_bytes_count);
- if (b == NULL) {
- return -1;
- }
+ if (pending_size_at_start > 0) {
char *buf = PyBytes_AsString(b);
Py_ssize_t pos = 0;
- for (Py_ssize_t i = 0; i < PyList_GET_SIZE(pending); i++) {
+ for (Py_ssize_t i = 0; i < pending_size_at_start; i++) {
PyObject *obj = PyList_GET_ITEM(pending, i);
char *src;
Py_ssize_t len;
@@ -1594,13 +1584,15 @@ _textiowrapper_writeflush(textio *self)
memcpy(buf + pos, src, len);
pos += len;
}
- assert(pos == self->pending_bytes_count);
+ // decrement after list delete so that _count is always >= the size of bytes in pending
+ if (PyList_SetSlice(pending, 0, pending_size_at_start, NULL) < 0) {
+ Py_DECREF(b);
+ return -1;
+ }
+ self->pending_bytes_count -= pos; // this needs to be atomic, and I don't think it is by default
+ assert(self->pending_bytes_count >= 0);
}
- self->pending_bytes_count = 0;
- self->pending_bytes = NULL;
- Py_DECREF(pending);
-
PyObject *ret;
do {
ret = PyObject_CallMethodOneArg(self->buffer, _PyIO_str_write, b);
@@ -1702,35 +1694,28 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
if (self->pending_bytes == NULL) {
self->pending_bytes_count = 0;
- self->pending_bytes = b;
- }
- else if (self->pending_bytes_count + bytes_len > self->chunk_size) {
- // Prevent to concatenate more than chunk_size data.
- if (_textiowrapper_writeflush(self) < 0) {
+ self->pending_bytes = PyList_New(0);
+ if (self->pending_bytes == NULL) {
Py_DECREF(b);
return NULL;
}
- self->pending_bytes = b;
}
- else if (!PyList_CheckExact(self->pending_bytes)) {
- PyObject *list = PyList_New(2);
- if (list == NULL) {
+ else if (self->pending_bytes_count + bytes_len > self->chunk_size) {
+ // Prevent to concatenate more than chunk_size data.
+ if (_textiowrapper_writeflush(self) < 0) {
Py_DECREF(b);
return NULL;
}
- PyList_SET_ITEM(list, 0, self->pending_bytes);
- PyList_SET_ITEM(list, 1, b);
- self->pending_bytes = list;
}
- else {
- if (PyList_Append(self->pending_bytes, b) < 0) {
- Py_DECREF(b);
- return NULL;
- }
+
+ // increment before append so that _count is always >= the size of bytes in pending
+ self->pending_bytes_count += bytes_len; // this needs to be atomic, and I don't think it is by default
+ if (PyList_Append(self->pending_bytes, b) < 0) {
Py_DECREF(b);
+ return NULL;
}
+ Py_DECREF(b);
- self->pending_bytes_count += bytes_len;
if (self->pending_bytes_count >= self->chunk_size || needflush ||
text_needflush) {
if (_textiowrapper_writeflush(self) < 0)
This is probably an awful way to fix the issue.
Bug report
Bug description:
I've been trying to track down a rare segfault I'm seeing with Python 3 (several versions) in AWS Lambda and AWS CodeBuild. It's been very hard to reproduce, but I have a few stack traces that show the crash happens in OpenSSL's certificate verification code below Python's ssl module. While trying to isolate the problem in OpenSSL, I stumbled on a different issue that might be related, which I'm reporting here.

The program below uses the standard library to execute a lot of HTTPS requests concurrently with threads. When run on my Linux desktop in gnome-terminal, the program doesn't exhibit any weird behavior: it prints a bunch of X characters for each request it completes. However, if I pipe its output through a program like less or tee, I see additional output: raw bytes from the SSL network session bleeding into stdout. It feels like OpenSSL is writing into memory that Python's using to prepare strings or print to stdout, due to a lack of synchronization or refcounting, but I don't know OpenSSL or Python internals well, so this is just a guess.

python3 test-program.py | tee /dev/null makes it happen every time on my workstation. It never crashes; it just bleeds SSL contents into the output. It happens on Ubuntu 22.04 x86_64 with Python 3.12 and OpenSSL 3.0.2, and on AWS CloudShell with Python 3.9.16 and OpenSSL 3.0.8.

Feel free to run this program as often as you want with the URL I put in there (it's a small static web page on my personal site that won't mind a few thousand hits).
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
Linux
Linked PRs