python-greenlet / greenlet

Lightweight in-process concurrent programming
Other
1.63k stars 247 forks source link

Fix #392: Port to Python 3.13 #396

Closed vstinner closed 2 days ago

vstinner commented 6 months ago
vstinner commented 6 months ago

The second commit adds a Python 3.13 CI to GitHub Actions. It's to prove that my change works as expected. I can easily remove that change if you want to drop the 3.13 test for now.

Well, IMO it's good to test as soon as possible ;-)

vstinner commented 6 months ago

Ah. While "tox -e py313" pass locally, it fails on the GitHub Action:

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ... Fatal Python error: Segmentation fault
Current thread 0x00007fd2a2a61b80 (most recent call first):
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 212 in _run_test
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 295 in __call__
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 317 in wrapper
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 589 in _callTestMethod
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 636 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 692 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/runner.py", line 240 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 270 in runTests
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 104 in __init__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/__main__.py", line 18 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
Extension modules: greenlet._greenlet, psutil._psutil_linux, psutil._psutil_posix, greenlet.tests._test_extension_cpp, greenlet.tests._test_extension (total: 5)
/home/runner/work/_temp/7f62c942-1552-419a-bf75-ca42301e321a.sh: line 3:  1982 Segmentation fault      (core dumped) python -m unittest discover -v greenlet.tests
Error: Process completed with exit code 139.

I'm not sure about this change:

+  #if GREENLET_USE_CFRAME
     tstate->cframe->current_frame = this->current_frame;
+  #endif

We should save/restore the current_frame, but I'm not sure how.

jamadden commented 6 months ago

Thanks for looking into this @vstinner ! I haven't had any time to keep up with Python 3.13 changes, but I can at least offer some insight into that crash. This is because that same test crashes immediately for me on my M2 Mac. Assuming the crash is the same, here's the backtrace:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)
Exception Codes:       0x0000000000000001, 0xdddddddddddddde5

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [25867]

VM Region Info: 0xffffdddddddddde5 is not in any region.  
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      UNUSED SPACE AT END

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib                 0x1877c60dc __pthread_kill + 8
1   libsystem_pthread.dylib                0x1877fdcc0 pthread_kill + 288
2   libsystem_c.dylib                      0x1876d2540 raise + 32
3   Python                                 0x103a15410 faulthandler_fatal_error + 412
4   libsystem_platform.dylib               0x18782da24 _sigtramp + 56
5   _greenlet.cpython-313-darwin.so        0x1037cf910 greenlet::PythonState::operator>>(_ts*) + 56 (TPythonState.cpp:205) [inlined]
6   _greenlet.cpython-313-darwin.so        0x1037cf910 greenlet::Greenlet::g_switchstack_success() + 88 (TGreenlet.cpp:133)
7   _greenlet.cpython-313-darwin.so        0x1037cb0c4 greenlet::Greenlet::g_switchstack() + 288 (TGreenlet.cpp:210)
8   _greenlet.cpython-313-darwin.so        0x1037cfaa8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
9   _greenlet.cpython-313-darwin.so        0x1037d0cc4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)
10  Python                                 0x1038e27b8 cfunction_call + 72
11  Python                                 0x103890124 _PyObject_MakeTpCall + 128
12  Python                                 0x1039b5f80 context_run + 104
13  Python                                 0x1038e1ee8 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.10847317658053313408 + 92
14  _greenlet.cpython-313-darwin.so        0x1037cce54 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 264 (TUserGreenlet.cpp:464)
15  _greenlet.cpython-313-darwin.so        0x1037cc6c8 greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
16  _greenlet.cpython-313-darwin.so        0x1037cf298 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)

The invalid address causing the segfault is interesting...

The crashing code in question:

#if GREENLET_PY311
  #if GREENLET_PY312
    tstate->py_recursion_remaining = tstate->py_recursion_limit - this->py_recursion_depth;
    tstate->c_recursion_remaining = Py_C_RECURSION_LIMIT - this->c_recursion_depth;
    this->unexpose_frames(); // <- CRASH; line 205
  #else // \/ 3.11

unexpose_frames was new for 3.12:

#if GREENLET_PY312
void GREENLET_NOINLINE(PythonState::unexpose_frames)()
{
    if (!this->top_frame()) {
        return;
    }

    // See GreenletState::expose_frames() and the comment on frames_were_exposed
    // for more information about this logic.
    _PyInterpreterFrame *iframe = this->_top_frame->f_frame;
    while (iframe != nullptr) {
        _PyInterpreterFrame *prev_exposed = iframe->previous;
        assert(iframe->frame_obj);
        memcpy(&iframe->previous, &iframe->frame_obj->_f_frame_data[0],
               sizeof(void *));
        iframe = prev_exposed;
    }
}
#else
void PythonState::unexpose_frames()
{}
#endif      

IIRC, this all has to do with the way frames are "lazily" allocated now. We have to turn them into real frames so that we can get the complete stacks of suspended greenlets; that was added in greenlet 3.0.3 (previously, I'd just punted on that problem).

I didn't write much of expose_frames; it was generously contributed in https://github.com/python-greenlet/greenlet/commit/b5f9d232c401d8fba8710778ad85b29a41ec3924

If I disable that functionality by ifdefing out expose/unexpose_frames I get a little farther, but still crash; this time in ContextVarsTests.test_context_not_propagated:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000010489c020
Exception Codes:       0x0000000000000001, 0x000000010489c020

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [26401]

VM Region Info: 0x10489c020 is not in any region.  Bytes after previous region: 33  Bytes before following region: 32736
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      shared memory               104898000-10489c000    [   16K] r--/r-- SM=SHM  
--->  GAP OF 0x8000 BYTES
      VM_ALLOCATE                 1048a4000-1048ac000    [   32K] rw-/rwx SM=PRV  

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   Python                                 0x105209420 dump_frame + 24
1   Python                                 0x105208a04 dump_traceback + 116
2   Python                                 0x105208b88 _Py_DumpTracebackThreads + 324
3   Python                                 0x105219498 faulthandler_dump_traceback + 84
4   Python                                 0x1052193f4 faulthandler_fatal_error + 384
5   libsystem_platform.dylib               0x18782da24 _sigtramp + 56
6   _greenlet.cpython-313-darwin.so        0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
7   _greenlet.cpython-313-darwin.so        0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
8   _greenlet.cpython-313-darwin.so        0x104fcf25c greenlet::Greenlet::g_switchstack() + 144 (TGreenlet.cpp:167)
9   _greenlet.cpython-313-darwin.so        0x104fd3ac8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
10  _greenlet.cpython-313-darwin.so        0x104fd1234 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 1396 (TUserGreenlet.cpp:553)
11  _greenlet.cpython-313-darwin.so        0x104fd063c greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
12  _greenlet.cpython-313-darwin.so        0x104fd32c8 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)
13  _greenlet.cpython-313-darwin.so        0x104fd4ce4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)
vstinner commented 6 months ago

PythonState::unexpose_frames() is called with this->_top_frame->f_frame = 0xdddddddddddddddd: the struct _frame memory was freed, so reading a member gives 0xdddddddddddddddd pointer. this->_top_frame doesn't "own" the memory. It's just a raw pointer to something managed internally by Python.

I don't know how to move forward on this issue :-(

vstinner commented 6 months ago

The following code of void PythonState::operator<<(const PyThreadState *const tstate) noexcept is scary:

    PyFrameObject *frame = PyThreadState_GetFrame((PyThreadState *)tstate);
    Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
                        // reference.
    this->_top_frame.steal(frame);

Why does it call Py_XDECREF(): we have a nice strong reference which is safe to use. Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated and we end up with a dangling pointer and likely a crash.

Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)

Exactly: 0xdddddddddd... pointer is a dangling pointer.

vstinner commented 6 months ago

I hacked greenlet to hold a strong reference to the frame, but then another assertion failed...

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ...

python: Python/ceval.c:894: _PyEval_EvalFrameDefault: Assertion `STACK_LEVEL() == 0' failed.

Program received signal SIGABRT, Aborted.
jamadden commented 6 months ago

Why does it call Py_XDECREF(): ... Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated

Exactly. IIRC, in most places we keep borrowed references, it's specifically so they can be deallocated. greenlets aren't required to ever finish or get switched back to before they get deallocated (this is surprisingly easy to arrange to happen). That means that any code on the C call stack that was going to decref something like a frame may never get a chance to run. So, while a greenlet is switched out, we essentially transfer ownership out of the C stack and into us. If we kept an extra strong reference, and the C stack code never runs again, we leak.

This exposes implicit assumptions about the objects like that: we assume that "someone else" is keeping a strong reference to it, someone who may not run again to decref; or, put another way, the code that originally "owned" these objects won't run again to decref them until we switch the greenlet back in. It seems like something in this area has changed; for example, we're getting back fresh (transient) objects that have no other owner, or the place in the C stack that was keeping a strong reference is now keeping a borrowed reference.

edgarrmondragon commented 3 months ago

With Python 3.13 in beta, should we try to get this across?

musicinmybrain commented 3 months ago

I just tested this as a patch for the Fedora package, but (when built for Python 3.13) there is still a problem with trying to access the trash member of the opaque PyThreadState struct.

  In file included from src/greenlet/greenlet.cpp:36:
  src/greenlet/TPythonState.cpp: In member function ‘void greenlet::PythonState::operator<<(const PyThreadState*)’:
  src/greenlet/TPythonState.cpp:149:42: error: ‘const PyThreadState’ {aka ‘const struct _ts’} has no member named ‘trash’
    149 |     this->trash_delete_nesting = tstate->trash.delete_nesting;
        |                                          ^~~~~
  src/greenlet/TPythonState.cpp: In member function ‘void greenlet::PythonState::operator>>(PyThreadState*)’:
  src/greenlet/TPythonState.cpp:217:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’
    217 |     tstate->trash.delete_nesting = this->trash_delete_nesting;
        |             ^~~~~
  src/greenlet/greenlet.cpp: In function ‘PyObject* mod_get_tstate_trash_delete_nesting(PyObject*)’:
  src/greenlet/greenlet.cpp:1341:36: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’
   1341 |     return PyLong_FromLong(tstate->trash.delete_nesting);
        |                                    ^~~~~
  error: command '/usr/bin/gcc' failed with exit code 1
musicinmybrain commented 3 months ago

I checked internal/pycore_tstate.h for Python 3.13.0b1, and struct _PyThreadStateImpl does not have a member named trash.

vstinner commented 3 months ago

src/greenlet/greenlet.cpp:1341:36: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’

I updated my PR to use tstate->delete_later instead. My PR is still a WIP, the test suite does crash immediately :-(

vstinner commented 3 months ago

With this patch:

diff --git a/src/greenlet/TPythonState.cpp b/src/greenlet/TPythonState.cpp
index bfb40ca..98855c5 100644
--- a/src/greenlet/TPythonState.cpp
+++ b/src/greenlet/TPythonState.cpp
@@ -146,8 +146,8 @@ void PythonState::operator<<(const PyThreadState *const tstate) noexcept
     this->datastack_limit = tstate->datastack_limit;

     PyFrameObject *frame = PyThreadState_GetFrame((PyThreadState *)tstate);
-    Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
-                        // reference.
+    //Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
+    //                    // reference.
     this->_top_frame.steal(frame);
   #if GREENLET_PY313
     this->delete_later = Py_XNewRef(tstate->delete_later);

I get an assertion error that I don't know how to debug:

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ...

python: Python/ceval.c:901: _PyEval_EvalFrameDefault: Assertion `STACK_LEVEL() == 0' failed.

Python is built in debug mode.

vstinner commented 3 months ago

I updated my PR and now the whole test suite pass on Python 3.13!

Problem: GitHub Action updated macos-latest to arm64 arch which doesn't support old Python versions. I wrote https://github.com/python-greenlet/greenlet/pull/410 for that.

musicinmybrain commented 3 months ago

This seems to work as a downstream patch in Fedora. Thank you!

mgorny commented 2 months ago

@vstinner, thanks for doing this. Is there any chance I could convince you to try looking at #368? It also affects 3.13 for us.

edgarrmondragon commented 1 month ago

Python 3.13.0rc1 is out, so it'd be nice to get this published so downstream packages can start testing with greenlet cp313- wheels.

EwoutH commented 3 weeks ago

What would be needed to move this forward?

arcivanov commented 3 weeks ago

I have a feeling @jamadden is terribly busy

EwoutH commented 1 week ago

I think we need to consider what to do if this repository is unmaintained.

raghunandan-16 commented 1 week ago

I am getting this error for green let for 3.13 is it related with the above mentioned : " pycore_frame.h(8): fatal error C1189: #error: "this header requires Py_BUILD_CORE define"

edgarrmondragon commented 1 week ago

I see @oremanj has committed to this repo in the past year. Maybe they can offer some options.

jamadden commented 2 days ago

Thank's everyone for your help on this, with a special thanks to @vstinner.

I'll do my best to get a release out to PyPI (even if only an RC) by the end of the week.