saghul / pycares

Python interface for c-ares
https://pypi.org/project/pycares/
MIT License

Intermittent freeze: Channel.query() call never returning #197

Open davidmcnabnz opened 4 months ago

davidmcnabnz commented 4 months ago

On rare occasions, I'm seeing calls to Channel.query() blocking indefinitely, never returning.

pycares version 4.2.1.

A partial py-spy trace which illustrates this is:

Thread 380494 (idle): "MainThread"
    write (libpthread-2.31.so)
    _Py_DECREF (object.h:422)
    _my_PyErr_WriteUnraisable (_cffi_backend.c:6113)
    general_invoke_callback (_cffi_errors.h:147)
    gil_release (misc_thread_common.h:370)
    cffi_call_python (call_python.c:278)
    _sock_state_cb (_cares.c:998)
    open_udp_socket (ares_process.c:1240)
    ares__send_query (ares_process.c:854)
    ares_send (ares_send.c:131)
    ares_query (ares_query.c:138)
    _cffi_f_ares_query (_cares.c:3287)
    _do_query (pycares/__init__.py:581)
    query (pycares/__init__.py:561)

This issue is highly intermittent and extremely hard to reproduce consistently.

To get to the point of being able to trace a stuck process, I had to wait several hours after starting a service on a busy production server.
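One way to catch such a hang without attaching a tracer by hand is the standard library's `faulthandler` watchdog. The following is a minimal sketch, not something pycares itself provides: `dump_stack_if_stuck` and `slow_call` are hypothetical names, and `slow_call` merely stands in for a call that occasionally hangs. The dump goes to a file rather than stderr, since the trace above shows the process sitting in a `write` call.

```python
import contextlib
import faulthandler
import tempfile
import time


@contextlib.contextmanager
def dump_stack_if_stuck(timeout_s, log_file):
    """Dump all thread stacks to log_file if the body runs longer than timeout_s."""
    # faulthandler uses its own watchdog thread, so the dump should still
    # happen while the guarded code is blocked inside a C call.
    faulthandler.dump_traceback_later(timeout_s, file=log_file, exit=False)
    try:
        yield
    finally:
        # Disarm the watchdog once the guarded code has returned.
        faulthandler.cancel_dump_traceback_later()


def slow_call(seconds):
    # Hypothetical stand-in for a call that sometimes hangs.
    time.sleep(seconds)
    return "done"


# faulthandler writes to a raw file descriptor, so use a real file on disk.
with tempfile.NamedTemporaryFile("w+", suffix=".log") as log:
    with dump_stack_if_stuck(0.1, log):
        result = slow_call(0.5)
    log.seek(0)
    dump = log.read()
```

Note that `faulthandler` reports Python-level frames only, so the native frames shown in the trace above would still need a native-aware tool such as py-spy.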

Even though the call has the saving grace of releasing the GIL, any package that can become unpredictably trapped inside a C function call is a concern. This matters all the more given the increasing uptake of asyncio, whose event loops have an Achilles' heel: they are vulnerable to blocking C calls.
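To illustrate the general concern, here is a hedged sketch of the usual way to keep an event loop responsive around a call that might block: run it in a worker thread and bound the wait. This is not a fix for the pycares hang itself, and it may not fit code where the `Channel` is driven from the loop's own thread. `blocking_lookup` and `call_with_timeout` are hypothetical names, and `time.sleep` stands in for the blocking C call.

```python
import asyncio
import time


def blocking_lookup(delay_s):
    # Hypothetical stand-in for a C call that may block.
    time.sleep(delay_s)
    return "resolved"


async def call_with_timeout(func, *args, timeout_s):
    """Run func in a worker thread; return None if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout_s)
    except asyncio.TimeoutError:
        # The loop moves on, but the worker thread itself keeps running
        # until the underlying call returns.
        return None


async def main():
    fast = await call_with_timeout(blocking_lookup, 0.01, timeout_s=1.0)
    slow = await call_with_timeout(blocking_lookup, 0.3, timeout_s=0.05)
    return fast, slow


fast, slow = asyncio.run(main())
```

The timeout only frees the event loop. A call that never returns would still occupy its worker thread indefinitely, so this limits the damage rather than removing the cause.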

Any advice here would be most welcome.

saghul commented 4 months ago

For the sake of completeness: https://github.com/saghul/aiodns/issues/122#issuecomment-2036568065