python-greenlet / greenlet

Lightweight in-process concurrent programming

Segfault: Possible error with wheel for macOS 11 Universal #382

Closed Brax94 closed 8 months ago

Brax94 commented 10 months ago

I encountered a segfault when running code that uses gevent after upgrading from Python 3.8 to 3.10.

TLDR: If you hit a segfault when running gevent on macOS and the traceback points into gevent/select.py, try installing greenlet from source with the --no-binary :all: flag (pip install --no-binary :all: greenlet==3.0.1).

I managed to isolate the issue and build a minimal reproducible example using a conda environment with gevent (tested 23.9.1, 23.7.0 and 22.10.2; all have the same issue as long as greenlet >= 3.0.0), greenlet>=3.0.0 and pika=1.3.2:

from gevent.monkey import patch_all
patch_all()
import pika
def test_basic_segfault():
    connection = pika.BlockingConnection(
        pika.URLParameters('amqp://localhost:5672'),
    )
    print(connection)

if __name__ == '__main__':
    test_basic_segfault()

Error message is as follows:

(segfault_test) ➜ segfault_test PYTHONPATH=. PYTHONFAULTHANDLER=1 python test_segfault.py
Fatal Python error: Segmentation fault

Current thread 0x000000011a411e00 (most recent call first):
  File "/Users/eliash/dev/miniconda3/envs/segfault_test/lib/python3.10/site-packages/gevent/select.py", line 339 in poll
  File "/Users/eliash/dev/miniconda3/envs/segfault_test/lib/python3.10/site-packages/pika/adapters/select_connection.py", line 1184 in poll
  File "/Users/eliash/dev/miniconda3/envs/segfault_test/lib/python3.10/site-packages/pika/adapters/select_connection.py", line 579 in poll
  File "/Users/eliash/dev/miniconda3/envs/segfault_test/lib/python3.10/site-packages/pika/adapters/blocking_connection.py", line 445 in _create_connection
  File "/Users/eliash/dev/miniconda3/envs/segfault_test/lib/python3.10/site-packages/pika/adapters/blocking_connection.py", line 360 in __init__
  File "/Users/eliash/dev/segfault_test/test_segfault.py", line 5 in test_basic_segfault
  File "/Users/eliash/dev/segfault_test/test_segfault.py", line 11 in <module>

Extension modules: greenlet._greenlet, zope.interface._zope_interface_coptimizations, gevent.libev.corecext, gevent._gevent_c_greenlet_primitives, gevent._gevent_c_hub_local, gevent._gevent_c_waiter, gevent._gevent_c_hub_primitives, gevent._gevent_c_ident, gevent._gevent_cgreenlet, gevent._gevent_c_abstract_linkable, gevent._gevent_c_semaphore, gevent._gevent_clocal, gevent._gevent_cevent, gevent._gevent_cqueue (total: 14)
[1]    1249 segmentation fault  PYTHONPATH=. PYTHONFAULTHANDLER=1 python test_segfault.py
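
Editor's note: the PYTHONFAULTHANDLER=1 environment variable used in the command above has an in-process counterpart in the standard library's faulthandler module. The following is a minimal sketch, assuming you can edit the entry-point script. Placing it at the very top of the file and writing to sys.__stderr__ are suggestions, not part of the original report.

```python
import faulthandler
import sys

# Comparable to running with PYTHONFAULTHANDLER=1: on a fatal signal such as
# SIGSEGV, Python dumps the Python-level traceback of every thread before dying.
# sys.__stderr__ is the interpreter's original stderr, so the dump still reaches
# the terminal even if sys.stderr was replaced (an assumption about your setup).
faulthandler.enable(file=sys.__stderr__, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
```

The dump only shows Python frames. For the C++ frames inside the extension module, a native tool such as valgrind or a debugger is still needed.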

We then tested with greenlet==2.0.2, which resolved the issue.

Suspecting the wheel, we tried 3.0.1 again with pip install --no-binary :all: greenlet==3.0.1, which successfully resolved the issue.

Hopefully this can help someone.
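
Editor's note: a quick way to check whether an environment has one of the releases discussed here is to read the installed version from package metadata. The sketch below is an assumption-laden helper, not part of greenlet. The names parse_release and is_suspect are hypothetical, and the flagged range (3.0.0 up to but not including 3.0.2, on macOS) is taken only from the reports in this thread, where 2.0.2 and the 3.0.2 wheel are described as working. It cannot tell a prebuilt wheel from a local source build.

```python
import re
import sys
from importlib import metadata


def parse_release(version):
    """Return the leading numeric release as a tuple, ignoring pre-release suffixes.

    For example "3.0.1" gives (3, 0, 1) and "3.0.0rc3" gives (3, 0, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_suspect(version, platform=sys.platform):
    """True for greenlet 3.0.0 or 3.0.1 on macOS (range assumed from this thread)."""
    return platform == "darwin" and (3, 0, 0) <= parse_release(version) < (3, 0, 2)


if __name__ == "__main__":
    try:
        installed = metadata.version("greenlet")
    except metadata.PackageNotFoundError:
        print("greenlet is not installed")
    else:
        print(f"greenlet {installed}: suspect={is_suspect(installed)}")
```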

zzzeek commented 10 months ago

We're seeing a user with a greenlet segfault on macOS as well. I'm not sure whether they will post a new issue; the report is at https://github.com/sqlalchemy/sqlalchemy/discussions/10553, and we have a no-dependencies reproduction case over there.

jpc1976 commented 10 months ago

Hi, I am the user referred to by @zzzeek in https://github.com/python-greenlet/greenlet/issues/382#issuecomment-1787186111 above.

I have been having hard interpreter crashes in code involving greenlet when an exception is raised.

Here's a minimal script that reproduces the issue:

from greenlet import greenlet
from greenlet import getcurrent
import asyncio
import sys

def non_awaitable(arg):
    print(f"non awaitable called with {arg}")
    return current.switch(awaitable(arg))

async def awaitable(arg):
    print(f"awaitable called with {arg}")
    if arg == "raise":
        raise Exception("oops")

    return "return value"

async def main():

    for arg in "some arg", "raise":
        gr1 = greenlet(non_awaitable)

        result = gr1.switch(arg)
        while not gr1.dead:
            try:
                value = await result
            except BaseException:
                result = gr1.throw(*sys.exc_info())
            else:
                result = gr1.switch(value)

current = getcurrent()

asyncio.run(main())

Using either python 3.12.0 or 3.11.5 with greenlet 3.0.1, I get the following segfault:

non awaitable called with some arg
awaitable called with some arg
non awaitable called with raise
awaitable called with raise
Segmentation fault: 11

However, the issue disappears when I roll back to greenlet 2.0.2 (using python 3.11.5).
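
Editor's note: when comparing greenlet versions like this, it can help to run the reproduction in a child process and inspect its exit status, so a crash does not take down the test runner. This is a minimal sketch for POSIX systems. The helper name crashed_with_segfault and the default file name repro.py are hypothetical; it relies on the documented subprocess behaviour that a negative return code -N means the child was killed by signal N.

```python
import signal
import subprocess
import sys


def crashed_with_segfault(argv):
    """Run argv in a child process; return True if SIGSEGV killed it (POSIX only)."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    # On POSIX, a negative return code -N means the child was terminated by signal N.
    return completed.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Hypothetical file name; point this at wherever the script above is saved.
    script = sys.argv[1] if len(sys.argv) > 1 else "repro.py"
    crashed = crashed_with_segfault([sys.executable, script])
    print("segfault" if crashed else "no segfault")
```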

I ran the faulty script under valgrind, and here's the error that triggers the segfault:

==48137== Invalid read of size 8
==48137==    at 0x10126B936: __cxxabiv1::scan_eh_tab(__cxxabiv1::(anonymous namespace)::scan_results&, _Unwind_Action, bool, _Unwind_Exception*, _Unwind_Context*) (in /usr/lib/libc++abi.dylib)
==48137==    by 0x10126B584: __gxx_personality_v0 (in /usr/lib/libc++abi.dylib)
==48137==    by 0x101872AA0: _Unwind_RaiseException (in /usr/lib/system/libunwind.dylib)
==48137==    by 0x10126B160: __cxa_throw (in /usr/lib/libc++abi.dylib)
==48137==    by 0x103CA5A14: greenlet::Greenlet::g_switch_finish(greenlet::Greenlet::switchstack_result_t const&) (in /Users/scott/temp/sqlalchemy/crash/.venv/lib/python3.11/site-packages/greenlet/_greenlet.cpython-311-darwin.so)
==48137==    by 0x103CA236E: greenlet::MainGreenlet::g_switch() (in /Users/scott/temp/sqlalchemy/crash/.venv/lib/python3.11/site-packages/greenlet/_greenlet.cpython-311-darwin.so)
==48137==    by 0x103CA350C: green_switch(_greenlet*, _object*, _object*) (in /Users/scott/temp/sqlalchemy/crash/.venv/lib/python3.11/site-packages/greenlet/_greenlet.cpython-311-darwin.so)
==48137==    by 0x100085E43: method_vectorcall_VARARGS_KEYWORDS (in /Users/scott/temp/sqlalchemy/crash/.venv/bin/python)
==48137==    by 0x1001CCEF6: _PyEval_EvalFrameDefault (in /Users/scott/temp/sqlalchemy/crash/.venv/bin/python)
==48137==    by 0x10007389E: _PyFunction_Vectorcall (in /Users/scott/temp/sqlalchemy/crash/.venv/bin/python)
==48137==    by 0x10007975B: method_vectorcall (in /Users/scott/temp/sqlalchemy/crash/.venv/bin/python)
==48137==    by 0x100072F93: PyObject_Call (in /Users/scott/temp/sqlalchemy/crash/.venv/bin/python)
==48137==  Address 0x1000007feedfaef is not stack'd, malloc'd or (recently) free'd

I am on an older MacBook Pro running MacOS 10.15.7 Catalina (on x86_64).

Thanks!

jgehrcke commented 10 months ago

I'm seeing a segfault in gipc's test suite when running on macOS 11: https://github.com/jgehrcke/gipc/pull/131#issuecomment-1789239375, and this looks related to the greenlet 3.0.1 release.

Suspecting the wheel, we tried 3.0.1 again with pip install --no-binary :all: greenlet==3.0.1, which successfully resolved the issue.

Interesting!

Edit: I went through older build logs from when things were still working, and indeed the relevant difference for me, too, was the pre-built wheel vs. the local build from source:

good: local build of 3.0.0rc3 ✔

Collecting greenlet>=3.0rc3 (from gevent<=23.9.1,>=1.5->gipc==1.5.0)
  Downloading greenlet-3.0.0rc3.tar.gz (174 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 174.7/174.7 kB 2.4 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'

https://github.com/jgehrcke/gipc/actions/runs/6198786578/job/16829922515#step:5:43

bad: binary release of 3.0.1 ❌

Downloading greenlet-3.0.1-cp312-cp312-macosx_10_9_universal2.whl (263 kB)

https://github.com/jgehrcke/gipc/actions/runs/6722228235/job/18269713689?pr=131#step:5:52
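
Editor's note: the two log excerpts above differ in the artifact pip downloaded, a .tar.gz source distribution versus a .whl binary. The sketch below shows one way to tell them apart and to pull the tags out of a wheel file name, assuming the standard name-version[-build]-python-abi-platform layout of wheel file names. The names WheelTags and parse_artifact are hypothetical helpers, not pip or greenlet APIs.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_artifact(filename: str) -> Optional[WheelTags]:
    """Split a wheel file name into its fields; return None for non-wheels (e.g. sdists)."""
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform, so 5 or 6 fields in total.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    # The last three fields are always the python, abi and platform tags.
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


if __name__ == "__main__":
    for artifact in (
        "greenlet-3.0.0rc3.tar.gz",
        "greenlet-3.0.1-cp312-cp312-macosx_10_9_universal2.whl",
    ):
        tags = parse_artifact(artifact)
        print(artifact, "->", "built locally from source" if tags is None else tags.platform_tag)
```

Run against CI logs, a helper like this makes it easy to spot when a job silently switched from a source build to a prebuilt wheel.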

jpc1976 commented 10 months ago

Following up on my post https://github.com/python-greenlet/greenlet/issues/382#issuecomment-1787401304 above, I can confirm that a no-binary installation of greenlet 3.0.1 appears to have solved the issue!

madhavajay commented 9 months ago

Is it possible to get this fixed? It's holding back other people from releasing dependent libraries.

jgehrcke commented 9 months ago

Is it possible to get this fixed? It's holding back other people from releasing dependent libraries.

I agree that this is important. I hope we also appreciate that Jason is doing gevent+greenlet development and maintenance basically as a single-person heroic side effort, and that he does really impressive debugging and fixing work on a regular basis. He is very active here, and I suppose there is simply a little too much work that needs to be done. Ideally, someone provides a solution here or offers concrete help. One could look into that 'bad binary' and do a bit of debugging, and I suppose we could also look at the build/release scripts here and see if something stands out.

madhavajay commented 9 months ago

@jgehrcke totally understand. If we get a spare moment maybe we can help. Since the local compile works, it may be worth trying to switch CI to use cibuildwheel.

CaselIT commented 9 months ago

Another user reported a segfault to the sqlalchemy project, on macOS 11.6 on Intel using Python 3.9. Downgrading solves it. Details here: https://github.com/sqlalchemy/sqlalchemy/discussions/10671#discussion-5878953

davner commented 9 months ago

Can confirm that downgrading greenlet from 3.0.1 to 2.0.2 fixed the segfault we were seeing in our Django app when using workers. We are on macOS Big Sur 11.7.6 with an Intel chip.

jamadden commented 9 months ago

Can someone that could reproduce an error please try again with greenlet 3.0.2 and report back?

jgehrcke commented 8 months ago

Can someone that could reproduce an error please try again with greenlet 3.0.2 and report back?

Thank you, Jason. I tried that in https://github.com/jgehrcke/gipc/pull/131 and indeed mac CI went green again (with non-source builds).

In the build log I confirmed it downloaded

greenlet-3.0.2-cp312-cp312-macosx_11_0_universal2.whl (267 kB)

Were you able to pin-point the problem? Curious. In any case, as always, thank you for your invaluable contributions.

jamadden commented 8 months ago

Thank you @jgehrcke !

I don't have the logs from building 3.0.1, so I can't be 100% certain, but I believe it was a compiler flag or compiler version difference.

jgehrcke commented 8 months ago

Okay cool, thanks for getting back.

With this now being sorted out I can do my boring gipc release leveraging all the hard work done in greenlet and gevent! :).