Open citrus-it opened 2 years ago
Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.
@fsbs any ideas here?
Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.
I'm having trouble writing something standalone to trigger it, the application is pretty big. I'll bisect and report back.
ddde58832187211e68e04f16a08342b155baab51
ddde58832187211e68e04f16a08342b155baab51 is the first bad commit
commit ddde58832187211e68e04f16a08342b155baab51
Author: fsbs <fsbs@users.noreply.github.com>
Date: Tue Oct 12 14:18:23 2021 +0000
Set easy->multi_stack->state for multi's callbacks
This ensures multi's callbacks invoked by some easy functions have
access to thread state.
Fixes callbacks invoked by:
* do_curl_pause
* util_curl_xdecref
Thanks. That was my suspicion. Hopefully @fsbs will have a chance to look into this.
Sorry, still no luck with a reproducer, I'll have another stab at it tonight.
@citrus-it did you ever get a reproducer for this?
Unfortunately not, I'll give it another go this week. Our test suite triggers it every time, but it's a large project so hard to isolate.
I have a smaller reproducer now, still using a large part of the local application library so I will keep iterating.
I've been unable to make a standalone reproducer for this, the small-ish script I have still pulls in a lot of modules from the application.
However, I can see where the bug is from reading through the code.
Here's the relevant piece of the stack trace from the core file, most recent frame first (so read it backwards), with some in-line annotations.
fffffc7fffdf0520 pycurl.cpython-310.so`pycurl_get_thread_state+0x187()
if (self->multi_stack != NULL && self->multi_stack->state != NULL)
{
/* inside multi_perform() */
assert(self->handle != NULL); <--- assertion fails
fffffc7fffdf0540 pycurl.cpython-310.so`pycurl_acquire_thread+0x11()
fffffc7fffdf05a0 pycurl.cpython-310.so`progress_callback+0x46()
fffffc7fffdf0880 pycurl.cpython-310.so`util_curl_xdecref+0x87()
This used to (before ddde588):
self->multi_stack = NULL;
before calling:
curl_multi_remove_handle()
but it no-longer does, causing pycurl_get_thread_sate() to enter the multi_stack cleanup block, where it did not before.
fffffc7fffdf08b0 pycurl.cpython-310.so`util_curl_close+0x78()
handle = self->handle;
self->handle = NULL;
...
util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle);
This (^) is being called by the garbage collector at process exit, see below. I can't make my simple test case call this code path, but the failing application does it every time.
fffffc7fffdf08d0 pycurl.cpython-310.so`do_curl_dealloc+0x45()
fffffc7fffdf0970 pycurl.cpython-310.so`do_multi_clear+0x9d()
fffffc7fffdf0af0 libpython3.10.so.1.0`gc_collect_main+0x9d6()
fffffc7fffdf0b10 libpython3.10.so.1.0`_PyGC_CollectNoFail+0x3e()
fffffc7fffdf0ba0 libpython3.10.so.1.0`finalize_modules+0x4ef()
fffffc7fffdf0be0 libpython3.10.so.1.0`Py_FinalizeEx+0x90()
fffffc7fffdf0c00 libpython3.10.so.1.0`Py_Exit+0x10()
Hi, we've hit this as well (apparently in the same application on Oracle Solaris). @citrus-it showed the behavior in the trace above, but here are my notes with some new info from my digging.
In our case, the issue arises in progress_callback
function (as can be seen in the trace above) because the application is setting hdl.setopt(pycurl.PROGRESSFUNCTION, ...)
; when I remove it, the test passes again. That said, I believe that other callbacks are affected as well.
Here is what is happening:
util_curl_close
is called and self->handle
is set to NULL;
→ util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle)
is then called and it gets to the following code block:
PYCURL_BEGIN_ALLOW_THREADS_EASY
(void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
PYCURL_END_ALLOW_THREADS_EASY
at this point, self->handle
is NULL
, self->multi_stack
is not yet NULL
and since threads are allowed, progress_callback
can be called inside the ALLOW_THREADS_EASY
block. This gets all the way to pycurl_get_thread_state
and:
if (self->multi_stack != NULL && self->multi_stack->state != NULL)
{
/* inside multi_perform() */
assert(self->handle != NULL);
assert(self->multi_stack->multi_handle != NULL);
assert(self->state == NULL);
return self->multi_stack->state;
}
where the condition still passes and the assertion fails.
When I move self->multi_stack = NULL;
above PYCURL_BEGIN_ALLOW_THREADS_EASY
, it still fails only in the other pycurl_get_thread_state
branch because it doesn't change anything about self->handle
being NULL
.
When I remove PYCURL_BEGIN_ALLOW_THREADS_EASY
, the issue is gone as no callbacks are allowed any longer. I believe that unless the handle is still available, util_curl_xdecref
should not allow threads if called from util_curl_close
(there should be no issue when called from do_curl_clear
, though).
As for the possible fix, I tried the following three patches:
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -469,10 +469,7 @@ util_curl_xdecref(CurlObject *self, int
from the multi object's easy_object_dict, but this
requires us to have a reference to the multi object
which right now we don't. */
self->handle
is available
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -470,9 +470,13 @@ util_curl_xdecref(CurlObject *self, int
requires us to have a reference to the multi object
which right now we don't. */
/* Allow threads because callbacks can be invoked */
self->handle
is still available when util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle)
is called
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -567,7 +567,6 @@ util_curl_close(CurlObject *self)
assert(self != NULL);
assert(PyObject_IsInstance((PyObject *) self, (PyObject *) p_Curl_Type) == 1);
handle = self->handle;
self->handle = NULL; if (handle == NULL) { /* Some paranoia assertions just to make sure the object
deallocation problem is finally really fixed... / @@ -584,6 +583,7 @@ util_curl_close(CurlObject self)
/ Decref multi stuff which uses this handle / util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle);
All three seem to have fixed the original issue we saw in the application.
I also ran PycURL test suite and the first two changes caused multi_callback_test.py::MultiCallbackTest::test_easy_close
failure. The last one doesn't seem to introduce new issues, but I don't know what other issues it might cause (and it seems like the biggest change out of those three).
All that said, I have only very limited knowledge of PycURL (or cURL), so I might be missing important details.
Thank you very much for your research @citrus-it.
Hi,
I have upgraded pycurl from 7.44.0 to 7.45.1 and am seeing the above assertion fail when running the testsuite for a large python application that uses pycurl. I am still investigating and trying to write a small reproducer, and I'll attach one once I can, but I'm opening this in case it's immediately obvious to the maintainers or if anyone else is seeing it since upgrading.
What happened?
When running the testsuite with
pycurl.verbose
, this is what I see:The stack trace from the core file is:
so it seems to be being triggered by garbage collection.
What is the PycURL version?
'PycURL/7.45.1 libcurl/7.82.0 OpenSSL/3.0.1 zlib/1.2.11 brotli/1.0.9 zstd/1.5.2 nghttp2/1.47.0'
What is your Python version?
Python 3.10.2
What is your operating system and its version?
OmniOS v11 r151041 SunOS bloody 5.11 omnios-master-57a305f45c i86pc i386 i86pc