pycurl / pycurl

PycURL - Python interface to libcurl
http://pycurl.io/
GNU Lesser General Public License v2.1
1.07k stars 311 forks source link

Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29 #748

Open citrus-it opened 2 years ago

citrus-it commented 2 years ago

Hi,

I have upgraded pycurl from 7.44.0 to 7.45.1 and am seeing the above assertion fail when running the testsuite for a large python application that uses pycurl. I am still investigating and trying to write a small reproducer, and I'll attach one once I can, but I'm opening this in case it's immediately obvious to the maintainers or if anyone else is seeing it since upgrading.

What happened?

When running the testsuite with pycurl.verbose, this is what I see:

,---------------------------------------------------------------------
| $   /usr/bin/python3 /tmp/ips.test.12726/fakeroot/pkg -R /tmp/ips.test.12726/0/image0 -D plandesc_validate=1 -D manifest_validate=Always -D smf_cmds_dir=/tmp/ips.test.12726/0/smf_cmds search -o action.name -a '<example_path>'
| *   Trying 127.0.0.1:12001...
| * Connected to localhost (127.0.0.1) port 12001 (#0)
| > GET /versions/0/ HTTP/1.1
| Host: localhost:12001
| User-Agent: pkg/1a6ee32f8 (sunos i86pc; 5.11 omnios-master-57a305f45c; full; pkg)
| Accept: */*
| X-IPkg-UUID: 089741e2-a468-11ec-9b51-820820101cb6
| X-IPkg-Variant: global,full
|
| * Mark bundle as not supporting multiuse
| < HTTP/1.1 200 OK
| < Content-Type: text/plain;charset=utf-8
| < Server: CherryPy/18.6.1
| < Date: Tue, 15 Mar 2022 13:58:57 GMT
| < Cache-Control: must-revalidate, no-transform, max-age=300
| < Expires: Tue, 15 Mar 2022 14:03:57 GMT
| < Content-Length: 171
| <
| * Connection #0 to host localhost left intact
| * Found bundle for host localhost: 0x4f1fa0 [serially]
| * Re-using existing connection! (#0) with host localhost
| * Connected to localhost (127.0.0.1) port 12001 (#0)
| > GET /test/search/1/False_2_None_None_p%3C%3A%3A%3Aexample_path%3E HTTP/1.1
| Host: localhost:12001
| User-Agent: pkg/1a6ee32f8 (sunos i86pc; 5.11 omnios-master-57a305f45c; full; pkg)
| Accept: */*
| X-IPkg-UUID: 089741e2-a468-11ec-9b51-820820101cb6
| X-IPkg-Variant: global,full
|
| * Mark bundle as not supporting multiuse
| < HTTP/1.1 200 OK
| < Content-Type: text/plain;charset=utf-8
| < Server: CherryPy/18.6.1
| < Date: Tue, 15 Mar 2022 13:58:58 GMT
| < Cache-Control: must-revalidate, no-transform, max-age=14400
| < Expires: Tue, 15 Mar 2022 17:58:58 GMT
| < Transfer-Encoding: chunked
| <
| Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29, function pycurl_get_thread_state

The stack trace from the core file is:

status: process terminated by SIGABRT (Abort), pid=12754 uid=0 code=-1
libc panic message: Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29, function pycurl_get_thread_state

fffffc7fffdf3790 libc.so.1`abort+0x58()
fffffc7fffdf3a00 0xfffffc7feee15382()
fffffc7fffdf3a30 pycurl.cpython-310.so`pycurl_get_thread_state+0x187()
fffffc7fffdf3a50 pycurl.cpython-310.so`pycurl_acquire_thread+0x11()
fffffc7fffdf3ab0 pycurl.cpython-310.so`progress_callback+0x46()
fffffc7fffdf3ba0 libcurl.so.4.7.0`Curl_pgrsUpdate+0x3c4()
fffffc7fffdf3bc0 libcurl.so.4.7.0`Curl_pgrsDone+0x1a()
fffffc7fffdf3d10 libcurl.so.4.7.0`multi_done+0x318()
fffffc7fffdf3d40 libcurl.so.4.7.0`curl_multi_remove_handle+0xb1()
fffffc7fffdf3d80 pycurl.cpython-310.so`util_curl_xdecref+0x87()
fffffc7fffdf3db0 pycurl.cpython-310.so`util_curl_close+0x7c()
fffffc7fffdf3dd0 pycurl.cpython-310.so`do_curl_dealloc+0x48()
fffffc7fffdf3e00 libpython3.10.so.1.0`free_keys_object+0xa5()
fffffc7fffdf3e50 libpython3.10.so.1.0`dict_dealloc+0x6f()
fffffc7fffdf3e70 pycurl.cpython-310.so`do_multi_clear+0x9d()
fffffc7fffdf3fd0 libpython3.10.so.1.0`gc_collect_main+0x4ef()
fffffc7fffdf3ff0 libpython3.10.so.1.0`_PyGC_CollectNoFail+0x35()
fffffc7fffdf4080 libpython3.10.so.1.0`finalize_modules+0x29f()
fffffc7fffdf40c0 libpython3.10.so.1.0`Py_FinalizeEx+0x85()
fffffc7fffdf40e0 libpython3.10.so.1.0`Py_Exit+0xd()
fffffc7fffdf4110 libpython3.10.so.1.0`_PyErr_PrintEx.part.0.cold()
fffffc7fffdf4130 libpython3.10.so.1.0`PyErr_PrintEx+0x1e()
fffffc7fffdf41a0 libpython3.10.so.1.0`_PyRun_SimpleFileObject+0x3bd()
fffffc7fffdf41e0 libpython3.10.so.1.0`_PyRun_AnyFileObject+0x8b()
fffffc7fffdf42d0 libpython3.10.so.1.0`Py_RunMain+0x3d6()
fffffc7fffdf4320 libpython3.10.so.1.0`Py_BytesMain+0x3c()
fffffc7fffdf4350 _start_crt+0x87()
fffffc7fffdf4360 _start+0x18()

so it seems to be being triggered by garbage collection.

What is the PycURL version?

'PycURL/7.45.1 libcurl/7.82.0 OpenSSL/3.0.1 zlib/1.2.11 brotli/1.0.9 zstd/1.5.2 nghttp2/1.47.0'

What is your Python version?

Python 3.10.2

What is your operating system and its version?

OmniOS v11 r151041 SunOS bloody 5.11 omnios-master-57a305f45c i86pc i386 i86pc

swt2c commented 2 years ago

Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.

@fsbs any ideas here?

citrus-it commented 2 years ago

Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.

I'm having trouble writing something standalone to trigger it, the application is pretty big. I'll bisect and report back.

citrus-it commented 2 years ago

ddde58832187211e68e04f16a08342b155baab51

ddde58832187211e68e04f16a08342b155baab51 is the first bad commit
commit ddde58832187211e68e04f16a08342b155baab51
Author: fsbs <fsbs@users.noreply.github.com>
Date:   Tue Oct 12 14:18:23 2021 +0000

    Set easy->multi_stack->state for multi's callbacks

    This ensures multi's callbacks invoked by some easy functions have
    access to thread state.

    Fixes callbacks invoked by:
     * do_curl_pause
     * util_curl_xdecref
swt2c commented 2 years ago

Thanks. That was my suspicion. Hopefully @fsbs will have a chance to look into this.

citrus-it commented 2 years ago

Sorry, still no luck with a reproducer, I'll have another stab at it tonight.

swt2c commented 2 years ago

@citrus-it did you ever get a reproducer for this?

citrus-it commented 2 years ago

Unfortunately not, I'll give it another go this week. Our test suite triggers it every time, but it's a large project so hard to isolate.

citrus-it commented 2 years ago

I have a smaller reproducer now, still using a large part of the local application library so I will keep iterating.

citrus-it commented 2 years ago

I've been unable to make a standalone reproducer for this, the small-ish script I have still pulls in a lot of modules from the application.

However, I can see where the bug is from reading through the code.

Here's the relevant piece of the stack trace from the core file, most recent frame first (so read it backwards), with some in-line annotations.

fffffc7fffdf0520 pycurl.cpython-310.so`pycurl_get_thread_state+0x187()
    if (self->multi_stack != NULL && self->multi_stack->state != NULL)
    {
        /* inside multi_perform() */
        assert(self->handle != NULL);                          <--- assertion fails

fffffc7fffdf0540 pycurl.cpython-310.so`pycurl_acquire_thread+0x11()
fffffc7fffdf05a0 pycurl.cpython-310.so`progress_callback+0x46()
fffffc7fffdf0880 pycurl.cpython-310.so`util_curl_xdecref+0x87()
    This used to (before ddde588):
        self->multi_stack = NULL;
    before calling:
        curl_multi_remove_handle()
    but it no-longer does, causing pycurl_get_thread_sate() to enter the multi_stack cleanup block, where it did not before.

fffffc7fffdf08b0 pycurl.cpython-310.so`util_curl_close+0x78()
    handle = self->handle;
    self->handle = NULL;
    ...
    util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle);

This (^) is being called by the garbage collector at process exit, see below. I can't make my simple test case call this code path, but the failing application does it every time.

fffffc7fffdf08d0 pycurl.cpython-310.so`do_curl_dealloc+0x45()
fffffc7fffdf0970 pycurl.cpython-310.so`do_multi_clear+0x9d()
fffffc7fffdf0af0 libpython3.10.so.1.0`gc_collect_main+0x9d6()
fffffc7fffdf0b10 libpython3.10.so.1.0`_PyGC_CollectNoFail+0x3e()
fffffc7fffdf0ba0 libpython3.10.so.1.0`finalize_modules+0x4ef()
fffffc7fffdf0be0 libpython3.10.so.1.0`Py_FinalizeEx+0x90()
fffffc7fffdf0c00 libpython3.10.so.1.0`Py_Exit+0x10()
kulikjak commented 1 year ago

Hi, we've hit this as well (apparently in the same application on Oracle Solaris). @citrus-it showed the behavior in the trace above, but here are my notes with some new info from my digging.

In our case, the issue arises in progress_callback function (as can be seen in the trace above) because the application is setting hdl.setopt(pycurl.PROGRESSFUNCTION, ...); when I remove it, the test passes again. That said, I believe that other callbacks are affected as well.

Here is what is happening: util_curl_close is called and self->handle is set to NULL;util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle) is then called and it gets to the following code block:

    PYCURL_BEGIN_ALLOW_THREADS_EASY
    (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
    PYCURL_END_ALLOW_THREADS_EASY

at this point, self->handle is NULL, self->multi_stack is not yet NULL and since threads are allowed, progress_callback can be called inside the ALLOW_THREADS_EASY block. This gets all the way to pycurl_get_thread_state and:

    if (self->multi_stack != NULL && self->multi_stack->state != NULL)
    {
        /* inside multi_perform() */
        assert(self->handle != NULL);
        assert(self->multi_stack->multi_handle != NULL);
        assert(self->state == NULL);
        return self->multi_stack->state;
    }

where the condition still passes and the assertion fails.

When I move self->multi_stack = NULL; above PYCURL_BEGIN_ALLOW_THREADS_EASY, it still fails only in the other pycurl_get_thread_state branch because it doesn't change anything about self->handle being NULL.

When I remove PYCURL_BEGIN_ALLOW_THREADS_EASY, the issue is gone as no callbacks are allowed any longer. I believe that unless the handle is still available, util_curl_xdecref should not allow threads if called from util_curl_close (there should be no issue when called from do_curl_clear, though).

kulikjak commented 1 year ago

As for the possible fix, I tried the following three patches:

I also ran PycURL test suite and the first two changes caused multi_callback_test.py::MultiCallbackTest::test_easy_close failure. The last one doesn't seem to introduce new issues, but I don't know what other issues it might cause (and it seems like the biggest change out of those three).

All that said, I have only very limited knowledge of PycURL (or cURL), so I might be missing important details.

p commented 1 year ago

Thank you very much for your research @citrus-it.