Closed neonrust closed 1 year ago
This could've started happening after upgrading Ubuntu from 21.10 to 22.04, which I did between when I know it last it worked and now. Will try spinning up a VM with 21.10 and try it there
update: for some reason, my attempts to install Ubuntu 21.10 server in a VM has failed. Installer fails and spits out a huge report (which I'm way too lazy to read). Anyway, in 21.10, the libcurl version is 7.74 and in 22.04 it is 7.81.
Hey, thanks for reporting this issue. I think the right thing to start with would be adding Ubuntu 22.04 to the ci.yml so we can reproduce it here and it'll be tested moving forward. Edit: we can also add 21.10
I'll try and dig through the changeling from all the new libcurl versions, maybe there is something obvious that changed.
Of course, lift is awesome! I especially like the event loop thinking which seems kind of unique as far as http client libraries go, from what I've found.
I did a "manual bisect" with different versions of libcurl and it's actually only 7.81 that segfaults.
I tested these versions:
Btw, the "works" output looks like this:
Requesting http://www.example.com
Requesting http://www.google.com
Requesting http://www.reddit.com
Completed http://www.google.com ms:169
Completed http://www.example.com ms:240
Error: http://www.reddit.com : timeout
Not sure why reddit is reported as timed out. It is, however, the only one of those URLs that return 302. Anyway, that's a completely different issue!
Nice, going to take a look at the 7.81.0 release notes, good find!
7.82 has several multi-handle fixes but the behavior around recursively calling has certainly been changed/adjusted since I originally wrote the check socket actions in the timer callback function. Its clear that they don't think it should be supported any longer so that is a major issue.
https://github.com/curl/curl/issues/8282 <- curl maintainer saying he won't revert removing the protection for recursivly calling the multi* functions, although its pretty funny the examples still had them recursively calling as of this post :joy:
https://github.com/curl/curl/issues/8480 <- curl maintainer allowing 1 multi fn to be recursively called, so this probably fixed the issue which is why only 7.81.0 currently shows the probem
I don't think we'll be able to fix this in lift specifically for 7.81.0 though since that libcurl version has a bigger underlying issue. What maybe should get fixed is trying not to call socket actions recursively from the timer function?
Nice findings. Yeah, fixing it for just that version feels icky. But also, considering their general stance of recursion, lift might become broken again. I suppose, the workaround for now (for me) is just not use 7.81.. :)
I'm going to open a PR to add ubuntu 22.04 to the CI, but as you found out 7.81.0 is the default libcurl version. I'll add a note to the base readme that that version of curl has problems.
The docs on the CURLM_TIMERFUNCTION
are not entirely clear what to do when timeout_ms=0
, its like a weird limbo value that isn't supposed to be handled? The docs point out what to do for -1
and values >0
but not what to do when it is zero. If my memory serves 0
basically means the timer has expired and things need to be handled, so if I don't call the socket action when timeout_ms=0
then we'd probably see connections that hang until some other action happens... and I vaguely remember that being the case testing this years ago.
@neonrust mind taking a review over adding 22.04 to CI?
Hm... I approved it, I think, but it still says "1 approving review required". Not sure if I ever used the review stuff here, or a long time ago.
Yeah, the settings for merging seem wrong, it seems to think only code contributors can approve it. All good though, I can override it. Thanks for taking a look, its merged now, I'll make another release.
Nice.
I'm almost 100% that the exact same version of lift (it hasn't been modified in a while after all) did work earlier this year, so it might be something that has changed with libcurl. Its changelog during that time is quite extensive so it will take a while to sift through...
Running the
lift_async_simple
example now segfaults.The top of the stack trace:
curl_start_timeout
(set inlift::client
ctor) gets called with atimeout_ms
of zero, which I understand is normal.client::check_actions()
which just unconditionally callscheck_actions(CURL_SOCKET_TIMEOUT, 0)
, whereCURL_SOCKET_TIMEOUT
is -1.curl_code = curl_multi_socket_action(m_cmh, socket, event_bitmask, &running_handles)
, which segfaults.I did notice, however, that the docs for CURLMOPT_TIMERFUNCTION does state:
Which lift seems to be doing by calling
client::check_actions()
which eventually callscurl_multi_socket_action(...)
.And... is this still being maintained in some capacity?