PeterLuschny opened 10 months ago
Also tested on Linux (with the call to `sys.getwindowsversion()` removed).
As well, there is no discrepancy between 3.11 and 3.12 if `@cache` is removed. So the working hypothesis is that `@cache` is to blame for the regression.
The current master is bad too. I'm bisecting now; last year things were still OK.
In my testing I modified the lines

```python
sys.set_int_max_str_digits(5000)
print(sys.version_info, sys.getwindowsversion())
```

I removed the first one (it's very recent) and only left

```python
print(sys.version_info)
```
I found the first bad commit, 76449350b3467b85bcb565f9e2bf945bd150a66e, by bisecting:
```text
76449350b3467b85bcb565f9e2bf945bd150a66e is the first bad commit
commit 76449350b3467b85bcb565f9e2bf945bd150a66e
Author: Mark Shannon <mark@hotpy.org>
Date:   Wed Oct 5 01:34:03 2022 +0100

    GH-91079: Decouple C stack overflow checks from Python recursion checks. (GH-96510)

 Include/cpython/pystate.h                          | 16 ++++-
 Include/internal/pycore_ceval.h                    | 21 +++---
 Include/internal/pycore_runtime_init.h             |  2 +-
 Lib/test/support/__init__.py                       |  5 +-
 Lib/test/test_ast.py                               |  6 +-
 Lib/test/test_call.py                              | 38 ++++++++++
 Lib/test/test_collections.py                       |  2 +-
 Lib/test/test_compile.py                           |  3 +-
 Lib/test/test_dynamic.py                           |  8 +--
 Lib/test/test_exceptions.py                        |  8 +--
 Lib/test/test_isinstance.py                        | 12 ++--
 Lib/test/test_marshal.py                           |  3 +-
 .../2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst  |  3 +
 Modules/_testinternalcapi.c                        |  4 +-
 Parser/asdl_c.py                                   |  9 +--
 Python/Python-ast.c                                |  9 +--
 Python/ast.c                                       |  9 +--
 Python/ast_opt.c                                   |  9 +--
 Python/ceval.c                                     | 81 +++++++++++++-------
 Python/pystate.c                                   |  5 +-
 Python/symtable.c                                  |  9 +--
 Python/sysmodule.c                                 |  2 +-
 22 files changed, 165 insertions(+), 99 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst
```
@markshannon - greetings from Oxford, by the way :-)
Yes. I was suggesting exposing an API for the C recursion limit, but it didn't receive active feedback.
See the discussion here: https://github.com/python/cpython/issues/107263
@PeterLuschny - basic Fibonacci

```python
# fib.py
import sys
sys.setrecursionlimit(2000)

from functools import cache

@cache
def fib(n):
    if n < 1: return 0
    if n == 1: return 1
    return fib(n - 1) + fib(n - 2)

print(fib(500))
```

shows the bug just as well.
It's not "probably in connection with `@cache`", it's definitely.
The bug is triggered by the use of the C implementation of `_lru_cache_wrapper`, in `Modules/_functoolsmodule.c`.
To see this, disable it (then the Python code will be used) by applying

```diff
--- a/Lib/functools.py
+++ b/Lib/functools.py
@@ -641,7 +641,7 @@ def cache_clear():
     return wrapper

 try:
-    from _functools import _lru_cache_wrapper
+    from _functools import no_no_lru_cache_wrapper
 except ImportError:
     pass
```

and see `./python <fib.py` compute the answer, instead of erroring out.
In a way, it's to be expected, as C code isn't governed by `setrecursionlimit` any more in 3.12+.
@arhadthedev This is not related to performance. It's a new hard-coded limit that is not adjustable.
@markshannon ^^
We added the C recursion limit for safety reasons. We don't want to give that up.
We also opted for portability of code by setting the C recursion limit to a single value for all platforms. Perhaps that was a mistake and we should choose different limits at build time depending on the platform and the build, improving backwards compatibility at the expense of portability.
I don't understand how you can nuke all the recursive uses of Python C extensions in the name of "safety", sorry.
For a small sampling, have a look at how many code chunks on GitHub have `@cache` and `setrecursionlimit` in one place. I counted over 400.
@dimpase: your tone is aggressive. that's uncalled for. please keep it civil.
Apologies. In my defense, I can only say that I am speaking from one of the pitfalls wisely predicted by the SC here.
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
in the rejection of PEP 651 – Robust Stack Overflow Handling
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
in the rejection of PEP 651 – Robust Stack Overflow Handling
I agree with this. It causes major inconvenience for users who do end up relying on recursion.
They are deprived of deep C recursion, without even a notice.
And, by the way, limiting the C stack cannot prevent all stack overflows.
500, 1000, or 1500 are just rough estimates. We can still create stack overflows under these conditions.
The idiomatic way is to register a signal handler (on UNIX) or a VectoredExceptionHandler (on Windows) to check whether the stack pointer is out of bounds, even though that state is not recoverable.
I am fine with having these limits, but please provide a way to circumvent them.
Why are users forbidden to use large stack memory even if they provide one?
I think it would be more productive to come up with a way that we can calculate the C stack size accurately rather than continue to explain why it's not great (we know, we know).
calculate the C stack size accurately
Yes, we can estimate it better. We don't need to be that accurate. We can make sure the user can make it larger.
For example, on Linux we can query the stack limit at program/thread start (https://linux.die.net/man/2/getrlimit) and then set the C recursion limit accordingly.
When the user sets a stack limit of 1 MB, he gets 1 MB / 2 KB = 500.
When he needs 2000, all he needs to do is set the stack limit to 1 MB * 2000 / 500 = 4 MB. The main conflict will be resolved.
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.
There is no POSIX API to get a given thread's stack size.

Note that there is a Linux-only GNU extension that works if you `#define _GNU_SOURCE` before the `#include <pthread.h>`:
```c
size_t get_pthread_stack_size(pthread_t tid) {
    pthread_attr_t attr;
    // pthread_getattr_np() initializes attr itself, so pthread_attr_init() is not needed
    int err = pthread_getattr_np(tid, &attr);
    if (err != 0) {  // pthread functions return an error number, not -1 with errno
        fprintf(stderr, "pthread_getattr_np failed: %s\n", strerror(err));
        return 0;
    }
    size_t stack_size = 0;
    err = pthread_attr_getstacksize(&attr, &stack_size);
    if (err != 0) {
        fprintf(stderr, "pthread_attr_getstacksize failed: %s\n", strerror(err));
        stack_size = 0;
    }
    pthread_attr_destroy(&attr);
    return stack_size;
}
```
macOS appears to have its own non-standard `pthread_get_stacksize_np()` function. It's a bit of a mess of non-standard APIs.
_(my best guess is that the `_np` suffix in both worlds means "non-POSIX")_
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.
Yeah, you are right. It's just too niche (too target-specific).
I also understand that Python would prefer a resumable error to just aborting the program.
Then how about providing the user an option to disable this C recursion limit check? Let users manage it themselves in that mode.
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.

Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension? These give you the minimal stack size. (Not sure what "minimal" stands for here.)
One way or another, knowing the C stack limit does not give one complete knowledge of the maximal C recursion depth, as different functions place different amounts of data on the stack.
Anyhow, for the issue at hand: the C version `_functools._lru_cache_wrapper()` behind all this is an implementation of a Python wrapper, and what it produces has to be compatible with the Python version `functools._lru_cache_wrapper()`, recursion-depth-wise too.
Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension?

Nope. That is merely a function for accessing the opaque `pthread_attr_t` structure data.
Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension?

Nope. That is merely a function for accessing the opaque `pthread_attr_t` structure data.
The code you posted in https://github.com/python/cpython/issues/112215#issuecomment-1820206345 is basically a small wrapper around `pthread_attr_getstacksize`, isn't it?
Just copy it all if you must; there is nothing magical about a GNU extension there.
No, it isn't. `pthread_getattr_np()` fetches the thread information.
No, it isn't. `pthread_getattr_np()` fetches the thread information.
OK, sorry, I misunderstood that. It seems that musl has similar functions implemented, so it doesn't look impossible to have this on all platforms. (I don't know anything about Windows, though.)
Python 3.12 should not be left in its current state. The OP's example demonstrates a major regression.
Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:
```text
14 tests failed:
    test_call test_compile test_copy test_descr test_dictviews
    test_exceptions test_functools test_importlib test_isinstance
    test_pickle test_pickletools test_richcmp test_typing
    test_xml_etree
```
Sometimes there's an explicit message about a Windows stack overflow. Historically, as I recall, we always needed a lower recursion limit on Windows than on Linux.
Note: 13 of the 14 tests named above still pass in a debug build. As I recall, `test_importlib` (which still fails in a debug build) started flaking out before the other ones:
```text
Traceback (most recent call last):
  File "C:\Code\Python\Lib\test\test_importlib\test_main.py", line 424, in test_packages_distributions_symlinked_top_level
    fixtures.build_files(files, self.site_dir)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 70, in build
    create(contents, _ensure_tree_maker(prefix) / name)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\functools.py", line 911, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 91, in _
    path.symlink_to(content)
    ~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Code\Python\Lib\pathlib\__init__.py", line 441, in symlink_to
    os.symlink(target, self, target_is_directory)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1314] A required privilege is not held by the client: '.symlink.target' -> 'C:\\Users\\Tim\\AppData\\Local\\Temp\\test_python_zgi02bbl\\tmp8n3vtxjo\\symlinked'
```
Hm, maybe the CI uses debug build.
Marking this as release blocker since we discussed this off-line. The plan is to restore 3.12 to the 3.11 behavior (which is a little tricky since various tests were added/updated, but it's doable). For 3.13 we will do per-platform things (and per-config-setting, e.g. debug) by default and add an API to change the C recursion limit.
See also https://github.com/python/cpython/pull/113403#issuecomment-1868609456 for some measurements of what we can actually support without crashing (on Mac, but apparently Linux is similar, and Windows is dramatically different).
Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:
I get this too. Windows 11, main branch. We also see it in the Azure Pipelines CI build, e.g. https://dev.azure.com/Python/cpython/_build/results?buildId=147048&view=results (requires login?)
What's the current state of this issue? I know some of the test failures were worked around, but do we need anything to address user concerns that hasn't already been done? Should this block today's 3.12 release?
Do you need to delegate the reverts to a developer-in-residence?
https://github.com/python/cpython/pull/115083 ports the various C recursion limit changes from main, but it is unlikely to be ready until tomorrow as I need to get the limits and tests fixed for wasi.
I don't think it should block the release.
@guido said:
Marking this as release blocker since we discussed this off-line. The plan is to restore 3.12 to the 3.11 behavior (which is a little tricky since various tests were added/updated, but it's doable). For 3.13 we will do per-platform things (and per-config-setting, e.g. debug) by default and add an API to change the C recursion limit.
Do I understand correctly that the plan for 3.12 changed to backporting various changes from main?
Is there anything left to do for this? A fix went in for the next 3.12 (scheduled in two months). Is there anything that still needs to be done for 3.13, before or after today's 3.13 alpha 4?
We might change the implementation for 3.13, to use the actual stack depth (in bytes) rather than a count. What we have now works well enough though, and better than what we had historically.
Bug report
A minimal example:
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
Edit: This (simpler) example is from @dimpase.
Linked PRs