PeterLuschny opened 10 months ago
Also tested on Linux (with the call to `sys.getwindowsversion()` removed).
As well, there is no discrepancy between 3.11 and 3.12 if `@cache` is removed. So the working hypothesis is that `@cache` is to blame for the regression.
The current master is bad too. I'm bisecting now; last year things were still OK.
In my testing I modified the lines

```python
sys.set_int_max_str_digits(5000)
print(sys.version_info, sys.getwindowsversion())
```

I removed the first one (it's very recent) and only left

```python
print(sys.version_info)
```
I found the first bad commit, 76449350b3467b85bcb565f9e2bf945bd150a66e, by bisecting:
```text
76449350b3467b85bcb565f9e2bf945bd150a66e is the first bad commit
commit 76449350b3467b85bcb565f9e2bf945bd150a66e
Author: Mark Shannon <mark@hotpy.org>
Date:   Wed Oct 5 01:34:03 2022 +0100

    GH-91079: Decouple C stack overflow checks from Python recursion checks. (GH-96510)

 Include/cpython/pystate.h                          | 16 ++++-
 Include/internal/pycore_ceval.h                    | 21 +++---
 Include/internal/pycore_runtime_init.h             |  2 +-
 Lib/test/support/__init__.py                       |  5 +-
 Lib/test/test_ast.py                               |  6 +-
 Lib/test/test_call.py                              | 38 ++++++++++
 Lib/test/test_collections.py                       |  2 +-
 Lib/test/test_compile.py                           |  3 +-
 Lib/test/test_dynamic.py                           |  8 +--
 Lib/test/test_exceptions.py                        |  8 +--
 Lib/test/test_isinstance.py                        | 12 ++--
 Lib/test/test_marshal.py                           |  3 +-
 .../2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst  |  3 +
 Modules/_testinternalcapi.c                        |  4 +-
 Parser/asdl_c.py                                   |  9 +--
 Python/Python-ast.c                                |  9 +--
 Python/ast.c                                       |  9 +--
 Python/ast_opt.c                                   |  9 +--
 Python/ceval.c                                     | 81 +++++++++++++-------
 Python/pystate.c                                   |  5 +-
 Python/symtable.c                                  |  9 +--
 Python/sysmodule.c                                 |  2 +-
 22 files changed, 165 insertions(+), 99 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst
```
@markshannon - greetings from Oxford, by the way :-)
Yes. I was suggesting exposing an API for the C recursion limit, but it didn't receive active feedback.
See the discussion here: https://github.com/python/cpython/issues/107263
@PeterLuschny - basic Fibonacci

```python
# fib.py
import sys
sys.setrecursionlimit(2000)

from functools import cache

@cache
def fib(n):
    if n < 1: return 0
    if n == 1: return 1
    return fib(n - 1) + fib(n - 2)

print(fib(500))
```

shows the bug just as well.
It's not "probably in connection with `@cache`", it's definitely.
The bug is triggered by the use of the C implementation of `_lru_cache_wrapper`, in `Modules/_functoolsmodule.c`.
To see this, disable it (then the Python code will be used) by applying

```diff
--- a/Lib/functools.py
+++ b/Lib/functools.py
@@ -641,7 +641,7 @@ def cache_clear():
     return wrapper

 try:
-    from _functools import _lru_cache_wrapper
+    from _functools import no_no_lru_cache_wrapper
 except ImportError:
     pass
```

and see `./python <fib.py` compute the answer, instead of erroring out.
In a way, it's to be expected, as C code isn't governed by `setrecursionlimit` any more in 3.12+.
@arhadthedev This is not related to performance. It's a new hard-coded limit that is not adjustable.
@markshannon ^^
We added the C recursion limit for safety reasons. We don't want to give that up.
We also opted for portability of code by setting the C recursion limit to a single value for all platforms. Perhaps that was a mistake and we should choose different limits at build time depending on the platform and the build, improving backwards compatibility at the expense of portability.
I don't understand how you can nuke all the recursive uses of Python C extensions in the name of "safety", sorry.
For a small sampling, have a look at how many code chunks on GitHub have `@cache` and `setrecursionlimit` in one place. I counted over 400.
@dimpase: your tone is aggressive. that's uncalled for. please keep it civil.
Apologies. In my defense, I can only say that I am speaking from one of the pitfalls wisely predicted by the SC here.
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
in the rejection of PEP 651 – Robust Stack Overflow Handling
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
in the rejection of PEP 651 – Robust Stack Overflow Handling
I agree with this. It causes major inconvenience for users who do end up relying on recursion.
They are deprived of deep C recursion, without even a notice.
And, by the way, limiting the C stack cannot prevent all stack overflows.
500, 1000, or 1500 are just rough estimates. We can still create stack overflows under these conditions.
The idiomatic way is to register a signal handler (on UNIX) or a VectoredExceptionHandler (on Windows) to check whether the stack pointer is out of bounds, even though that state is not recoverable.
I am fine with having these limits, but please provide a way to circumvent them.
Why are users forbidden to use large stack memory even if they provide one?
I think it would be more productive to come up with a way that we can calculate the C stack size accurately rather than continue to explain why it's not great (we know, we know).
calculate the C stack size accurately
Yes, we can estimate it better. We don't need to be that accurate. We can make sure the user can make it larger.
For example, on Linux we can query the stack limit at program/thread start (https://linux.die.net/man/2/getrlimit) and then set the C recursion limit accordingly.
When the user sets a stack limit of 1 MB, he gets 1 MB / 2 KB = 500.
When he needs 2000, all he needs to do is set the stack limit to 1 MB * 2000 / 500 = 4 MB. The main conflict will be resolved.
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.
There is no POSIX API to get a given thread's stack size.

Note that there is a Linux-only GNU extension that works if you `#define _GNU_SOURCE` before the `#include <pthread.h>`:
```c
size_t get_pthread_stack_size(pthread_t tid) {
    pthread_attr_t attr;
    // pthread_getattr_np() initializes attr itself, so pthread_attr_init() is not needed
    int err = pthread_getattr_np(tid, &attr);
    if (err != 0) {  // pthread functions return an error number, not -1 with errno
        fprintf(stderr, "pthread_getattr_np failed: %s\n", strerror(err));
        return 0;
    }
    size_t stack_size = 0;
    err = pthread_attr_getstacksize(&attr, &stack_size);
    if (err != 0) {
        fprintf(stderr, "pthread_attr_getstacksize failed: %s\n", strerror(err));
        stack_size = 0;
    }
    pthread_attr_destroy(&attr);
    return stack_size;
}
```
macOS appears to have its own non-standard `pthread_get_stacksize_np()` function. It's a bit of a mess of non-standard APIs.
_(my best guess is that the `_np` suffix in both worlds means "non-POSIX")_
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.
Yeah, you are right. It's just too niche (too target-specific).
I also understand that Python would prefer a resumable error to just aborting the program.
Then how about providing the user an option to disable this C recursion limit check? Let users manage it themselves in that mode.
`getrlimit(RLIMIT_STACK)` only returns the size of the main process thread, not the running thread. There is no POSIX API to get a given thread's stack size. It can only be controlled at thread creation time, and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main-thread default.

Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension? These give you the minimal stack size. (Not sure what "minimal" stands for here.)
One way or another, knowing the C stack limit does not give one complete knowledge of the maximal C recursion depth, as different functions place different amounts of data on the stack.
Anyhow, for the issue at hand: the C version `_functools._lru_cache_wrapper()` behind all this is an implementation of a Python wrapper, and what it produces has to be compatible with the Python version `functools._lru_cache_wrapper()`, recursion-depth-wise too.
Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension?

Nope. That is merely a function for accessing the opaque `pthread_attr_t` structure data.
Isn't it possible to get the C stack size (at thread creation time) directly via `pthread_attr_getstacksize` (which is POSIX) instead of a GNU extension?

Nope. That is merely a function for accessing the opaque `pthread_attr_t` structure data.
The code you posted in https://github.com/python/cpython/issues/112215#issuecomment-1820206345 is basically a small wrapper around `pthread_attr_getstacksize`, isn't it?
Just copy it all if you must; there is nothing magical about a GNU extension there.
No, it isn't. `pthread_getattr_np()` fetches the thread information.
No, it isn't. `pthread_getattr_np()` fetches the thread information.
OK, sorry, I misunderstood that. It seems that musl has similar functions implemented, so it doesn't look impossible to have this on all platforms. (I don't know anything about Windows, though.)
Python 3.12 should not be left in its current state. The OP's example demonstrates a major regression.
Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:
```text
14 tests failed:
    test_call test_compile test_copy test_descr test_dictviews
    test_exceptions test_functools test_importlib test_isinstance
    test_pickle test_pickletools test_richcmp test_typing
    test_xml_etree
```
Sometimes there's an explicit message about a Windows stack overflow. Historically, as I recall, we always needed a lower recursion limit on Windows than on Linux.
Note: 13 of the 14 tests named above still pass in a debug build. As I recall, `test_importlib` (which still fails in a debug build) started flaking out before the other ones:
```text
Traceback (most recent call last):
  File "C:\Code\Python\Lib\test\test_importlib\test_main.py", line 424, in test_packages_distributions_symlinked_top_level
    fixtures.build_files(files, self.site_dir)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 70, in build
    create(contents, _ensure_tree_maker(prefix) / name)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\functools.py", line 911, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 91, in _
    path.symlink_to(content)
    ~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Code\Python\Lib\pathlib\__init__.py", line 441, in symlink_to
    os.symlink(target, self, target_is_directory)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1314] A required privilege is not held by the client: '.symlink.target' -> 'C:\\Users\\Tim\\AppData\\Local\\Temp\\test_python_zgi02bbl\\tmp8n3vtxjo\\symlinked'
```
Hm, maybe the CI uses debug build.
Marking this as release blocker since we discussed this off-line. The plan is to restore 3.12 to the 3.11 behavior (which is a little tricky since various tests were added/updated, but it's doable). For 3.13 we will do per-platform things (and per-config-setting, e.g. debug) by default and add an API to change the C recursion limit.
See also https://github.com/python/cpython/pull/113403#issuecomment-1868609456 for some measurements of what we can actually support without crashing (on Mac, but apparently Linux is similar, and Windows is dramatically different).
Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:
I get this too. Windows 11, main branch. We also see it in the Azure Pipelines CI build, e.g. https://dev.azure.com/Python/cpython/_build/results?buildId=147048&view=results (requires login?)
What's the current state of this issue? I know some of the test failures were worked around, but do we need anything to address user concerns that hasn't already been done? Should this block today's 3.12 release?
Do you need to delegate the reverts to a developer-in-residence?
https://github.com/python/cpython/pull/115083 ports the various C recursion limit changes from main, but it is unlikely to be ready until tomorrow as I need to get the limits and tests fixed for wasi.
I don't think it should block the release.
@guido said:
Marking this as release blocker since we discussed this off-line. The plan is to restore 3.12 to the 3.11 behavior (which is a little tricky since various tests were added/updated, but it's doable). For 3.13 we will do per-platform things (and per-config-setting, e.g. debug) by default and add an API to change the C recursion limit.
Do I understand correctly that the plan for 3.12 changed to backporting various changes from main?
Is there anything left to do for this? A fix went in for the next 3.12 (scheduled in two months). Is there anything that still needs to be done for 3.13, before or after today's 3.13 alpha 4?
We might change the implementation for 3.13, to use the actual stack depth (in bytes) rather than a count. What we have now works well enough though, and better than what we had historically.
Bug report
A minimal example:
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
Edit: This (simpler) example is from @dimpase.
Linked PRs