Closed sobolevn closed 11 months ago
With more info:
test_detect_api_mismatch (test.test_support.TestSupport.test_detect_api_mismatch) ... ok
test_detect_api_mismatch__ignore (test.test_support.TestSupport.test_detect_api_mismatch__ignore) ... ok
test_fd_count (test.test_support.TestSupport.test_fd_count) ... [1] 16728 killed PYTHONFAULTHANDLER=1 ./python.exe -m test -v
However, this is dependent on some other test, for now I don't know which one.
I don't know which one.
Try to run:
./python.exe -m test.bisect_cmd -o bisect test_support
Then look into bisect file result.
I wrote this simple script (because bisect
's output didn't reproduce this problem):
And tests still pass. So, I think that we need several tests, not just two.
@shamsherfairfest this is unrelated, please open a new issue with the minimal reproduction. We would be happy to help you in that case :)
@shamsherfairfest this is unrelated, please open a new issue with the minimal reproduction. We would be happy to help you in that case :)
ok sure thanks
I wrote this simple script (because bisect's output didn't reproduce this problem)
Did you run test.bisect_cmd
on test_support, or on the whole Python test suite? What do you mean that it doesn't reproduce this problem? Which behavior do you get when you run ./python.exe -m test test_support --matchfile=bisect
(or ./python.exe -m test --matchfile=bisect
if you ran it in the whole test suite)?
I've reproduced it with only three test files: ./python.exe -m test test_sax test_socket test_support
Now, I need to find exact tests that are related.
It might be related to https://github.com/python/cpython/pull/110013 since it is also about test_socket
I've reproduced it with only three test files: ./python.exe -m test test_sax test_socket test_support
Wow, that's very helpful, thanks!
Now, I need to find exact tests that are related.
You may give a new try to bisect_cmd:
./python.exe -m test.bisect_cmd -o bisect test_sax test_socket test_support
And then:
./python.exe -m test --matchfile=bisect test_sax test_socket test_support
It should help to reduce the quantity of code that you have to review, by skipping many tests, and help you to find the minimum set of tests which trigger the bug.
By the way, it would be interesting to have a "files" mode in bisect_cmd to filter test files, rather than test cases.
I found all three offending tests 🎉
test.test_sax.ExpatReaderTest.test_expat_external_dtd_enabled
test.test_socket.GeneralModuleTests.testHostnameRes
test.test_support.TestSupport.test_fd_count
Reproducer:
from xml.sax.xmlreader import InputSource
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_external_ges
class TestEntityRecorder:
def __init__(self):
self.entities = []
def resolveEntity(self, publicId, systemId):
self.entities.append((publicId, systemId))
source = InputSource()
source.setPublicId(publicId)
source.setSystemId(systemId)
return source
parser = create_parser()
parser.setFeature(feature_external_ges, True)
resolver = TestEntityRecorder()
parser.setEntityResolver(resolver)
try:
parser.feed(
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
)
except Exception:
pass
import socket
hostname = socket.gethostname()
ip = socket.gethostbyname(hostname)
from test.support import os_helper
os_helper.fd_count()
from xml.sax.xmlreader import InputSource
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_external_ges
I'm now curious about the interaction between SAX parser, gethostname(), gethostbyname() and fd_count().
IMO it's an OS bug, you might need attempt to go further and have like no "import", and even write a C reproducer. Tell me if you need help for that, so you can report it to Apple. We have a few core devs who are in touch with Apple.
cc @ned-deily @ronaldoussoren
Is there a macOS crash report with stack trace? Check Console.app
or look in ~/Library/Logs/DiagnosticReports
.
Minimizing reproducer even more:
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_external_ges
parser = create_parser()
parser.setFeature(feature_external_ges, True)
try:
parser.feed(
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
)
except Exception:
pass
import socket
hostname = socket.gethostname()
ip = socket.gethostbyname(hostname)
import os
for fd in range(os.sysconf("SC_OPEN_MAX")):
fd2 = os.dup(fd)
# or
# import fcntl
# for fd in range(os.sysconf("SC_OPEN_MAX")):
# fcntl.fcntl(fd, fcntl.F_DUPFD_CLOEXEC, 0)
It crashes on this line: https://github.com/python/cpython/blob/b987fdb19b981ef6e7f71b41790b5ed4e2064646/Python/fileutils.c#L2594
@ned-deily I don't see a stacktrace :(
So, the crash goes away if I just change https://github.com/python/cpython/blob/b987fdb19b981ef6e7f71b41790b5ed4e2064646/Lib/xml/sax/saxutils.py#L365 to be just f = b''
. If I do that it switches to OSError: [Errno 24] Too many open files
Not sure what role socket
call plays in all of this.
You may not get a crash dump file if the executable is not running as an app bundle. Using the python.org 3.12.0 for macOS to run the minimal reproducer above, I get a crash file with the following relevant info:
...
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_GUARD (SIGKILL)
Exception Codes: GUARD_TYPE_FD
Exception Codes: 0x0000000000000003, 0xc7a50e7608548068
Termination Reason: Namespace GUARD, Code 4611686027017322499
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x1828cf41c __fcntl + 8
1 libsystem_kernel.dylib 0x1828cf3e8 fcntl + 88
2 Python 0x105c1bbb4 _Py_dup + 120
3 Python 0x105c2c65c os_dup + 28
4 Python 0x105b7f074 _PyEval_EvalFrameDefault + 50916
...
According to some Apple developer documentation:
EXC_GUARD indicates the process violated a guarded resource protection. Although there are multiple types of guarded system resources, most guarded resource crashes are from guarded file descriptors, which have the GUARD_TYPE_FD value in the Exception Subtype field. The operating system marks a file descriptor as guarded so that normal file descriptor APIs can’t modify them. [...] DUP. The process attempted to invoke dup(), dup2(), or fcntl() with the F_DUPFD or F_DUPFD_CLOEXEC commands on a guarded file descriptor.
So it would seem this is a test case error with consequnces on macOS. Perhaps the first part of the test (using xml.sax
) is leaving file descriptors open?
If I run @sobolevn's script above I also get a crash, and get the following information on open files using lsof(1):
python.ex 33790 ronald 0r CHR 16,3 0t451 1229 /dev/ttys003
python.ex 33790 ronald 1w CHR 16,3 0t535 1229 /dev/ttys003
python.ex 33790 ronald 2w CHR 16,3 0t187 1229 /dev/ttys003
python.ex 33790 ronald 3 NPOLICY
python.ex 33790 ronald 4u systm 0x375ff3a427341a21 0t0 [ctl com.apple.netsrc id 7 unit 44]
python.ex 33790 ronald 5u unix 0x375ff3a8f44362c1 0t0 ->0x375ff3a8f4436771
python.ex 33790 ronald 6r CHR 16,3 0t451 1229 /dev/ttys003
python.ex 33790 ronald 7w CHR 16,3 0t535 1229 /dev/ttys003
python.ex 33790 ronald 8w CHR 16,3 0t187 1229 /dev/ttys003
The crash happens due to calling dup on file descriptor 3:
(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_GUARD (code=4611686027017322499, subcode=0x4f84efa691046cd)
* frame #0: 0x0000000189b3541c libsystem_kernel.dylib`__fcntl + 8
frame #1: 0x0000000189b353e8 libsystem_kernel.dylib`fcntl + 88
frame #2: 0x0000000100270228 python.exe`_Py_dup(fd=3) at fileutils.c:2599:10 [opt]
frame #3: 0x0000000100280d60 python.exe`os_dup [inlined] os_dup_impl(module=<unavailable>, fd=3) at posixmodule.c:10590:12 [opt]
frame #4: 0x0000000100280d58 python.exe`os_dup(module=<unavailable>, arg=<unavailable>) at posixmodule.c.h:6758:21 [opt]
BTW. The earlier script using os_helper.fd_count
doesn't crash for me.
Running the problematic test cases gives the same crash:
(lldb) r -m test test_sax test_socket test_support
Process 33933 launched: '/Users/ronald/Projects/Forks/cpython-upstream/build/python.exe' (arm64)
Using random seed: 1511475695
Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 3.26 Run 3 tests sequentially
0:00:00 load avg: 3.26 [1/3] test_sax
0:00:00 load avg: 3.24 [2/3] test_socket
2023-12-06 11:17:05.606880+0100 python.exe[33933:11421359] [si_destination_compare] send failed: Invalid argument
2023-12-06 11:17:05.606894+0100 python.exe[33933:11421359] [si_destination_compare] send failed: Undefined error: 0
2023-12-06 11:17:05.606897+0100 python.exe[33933:11421359] [si_destination_compare] send failed: Invalid argument
/Users/ronald/Projects/Forks/cpython-upstream/Lib/test/test_socket.py:2934: RuntimeWarning: received malformed or improperly-truncated ancillary data
result = sock.recvmsg_into([buf], *args)
/Users/ronald/Projects/Forks/cpython-upstream/Lib/test/test_socket.py:2843: RuntimeWarning: received malformed or improperly-truncated ancillary data
result = sock.recvmsg(bufsize, *args)
0:00:29 load avg: 3.15 [3/3] test_support
Process 33933 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_GUARD (code=4611686027017322499, subcode=0x55c498140be65944)
frame #0: 0x0000000189b3541c libsystem_kernel.dylib`__fcntl + 8
libsystem_kernel.dylib`:
-> 0x189b3541c <+8>: b.lo 0x189b3543c ; <+40>
0x189b35420 <+12>: pacibsp
0x189b35424 <+16>: stp x29, x30, [sp, #-0x10]!
0x189b35428 <+20>: mov x29, sp
Target 0: (python.exe) stopped.
(lldb) thread backtrace -c 10
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_GUARD (code=4611686027017322499, subcode=0x55c498140be65944)
* frame #0: 0x0000000189b3541c libsystem_kernel.dylib`__fcntl + 8
frame #1: 0x0000000189b353e8 libsystem_kernel.dylib`fcntl + 88
frame #2: 0x0000000100270228 python.exe`_Py_dup(fd=3) at fileutils.c:2599:10 [opt]
frame #3: 0x0000000100280d60 python.exe`os_dup [inlined] os_dup_impl(module=<unavailable>, fd=3) at posixmodule.c:10590:12 [opt]
frame #4: 0x0000000100280d58 python.exe`os_dup(module=<unavailable>, arg=<unavailable>) at posixmodule.c.h:6758:21 [opt]
frame #5: 0x00000001001bac48 python.exe`_PyEval_EvalFrameDefault(tstate=<unavailable>, frame=0x0000000100985130, throwflag=0) at generated_cases.c.h:1134:19 [opt]
frame #6: 0x0000000100095f24 python.exe`_PyObject_VectorcallTstate(tstate=0x00000001005ce780, callable=0x0000000100fbdf10, args=0x000000016fdfd380, nargsf=2, kwnames=0x0000000000000000) at pycore_call.h:168:11 [opt]
frame #7: 0x0000000100095144 python.exe`method_vectorcall(method=<unavailable>, args=0x0000000101b2d0f8, nargsf=<unavailable>, kwnames=0x0000000000000000) at classobject.c:92:18 [opt]
From the limited research I've done it is possible to detect if a file descriptor is guarded using the undocumented proc_pidfdinfo
, although I haven't tried to use this yet.
That said, the easier fix on (recent?) versions of macOS is to use /dev/fd
to check for open files in test.support.os_helper.fd_count
, just as /proc/self/fd
is used on Linux and FreeBSD.
With the patch below ./python.exe -m test test_sax test_socket test_support -v
no longer crashes. Working on a PR.
diff --git a/Lib/test/support/os_helper.py b/Lib/test/support/os_helper.py
index 46ae53aa11..ca0b321f9e 100644
--- a/Lib/test/support/os_helper.py
+++ b/Lib/test/support/os_helper.py
@@ -600,6 +600,15 @@ def fd_count():
except FileNotFoundError:
pass
+ if sys.platform == "darwin":
+ try:
+ names = os.listdir("/dev/fd")
+ # Subtract one because listdir() internally opens a file
+ # descriptor to list the content of the /proc/self/fd/ directory.
+ return len(names) - 1
+ except FileNotFoundError:
+ pass
+
MAXFD = 256
if hasattr(os, 'sysconf'):
try:
Crash report
I've experienced a crash on
main
branch. I just used./python.exe -m test
This is the all output I have for now. Cannot reproduce with smaller set of tests.
Linked PRs