taf2 / curb

Ruby bindings for libcurl
Other
1.29k stars 230 forks source link

curl-multi throughput significantly lower in newer versions #24

Closed igrigorik closed 14 years ago

igrigorik commented 14 years ago

Running 0.4.6.0, upgrading to 0.6.x shows almost a 2x throughput drop - all else being equal. Lower cpu usage, lower network throughput.

ex: http://skitch.com/igrigorik/nuyjw/network-report

taf2 commented 14 years ago

Please try again with 0.6.6, setting the new attribute Curl::Multi.default_timeout = 1

igrigorik commented 14 years ago

Curl version: * $Id: curl.h,v 1.379 2009-03-02 23:05:31 bagder Exp $

Default timeout appears to be set at 100ms. Applied the patch, set timeout to 1ms, running now - will take a bit of time to get a better picture. Just eyeballing the last minute of stats though, it does seem that it's going faster, but still doesn't match the 0.4.6.x branches.

Will try reverting back a few commits to see where the problem is introduced:

http://github.com/taf2/curb/commit/1be75e3dee822e5e760d21468d1d7b3b0472fcf4  -> use an internal Hash instead of static variables to store ruby objects... the thought here was it should speed up Curl::Easy.new -> yet to be proven though..
http://github.com/taf2/curb/commit/4bdc81576ae7483f130a4eec17c73373927614fa -> timeout refactored

Other commits that may have impacted performance:
http://github.com/taf2/curb/commit/35cb65b03f7b386fe6c7f7068b7ac7622c6e1820
http://github.com/taf2/curb/commit/6af516cb31909610f4e6b9f4302aeb49bfaeed49

This one maybe if you're not running the loop through a multiple handle and not running multiple threads... http://github.com/taf2/curb/commit/d0944bca9a573fa8bbe2f7691d421516738a25bb
igrigorik commented 14 years ago

Nope, must be something else - throughput stayed the same after applying the timeout patch.

igrigorik commented 14 years ago

Ok, ran bisect on it, and it's definitely the "cleanup timeout commit":

4bdc81576ae7483f130a4eec17c73373927614fa is first bad commit
commit 4bdc81576ae7483f130a4eec17c73373927614fa
Author: Todd Fisher 
Date:   Sun Jul 26 16:31:52 2009 -0400

    cleanup timeout code in multi select loop

:040000 040000 449d377da6cad320dca006253e2318ffb5e792e7 a7b14dc5524c6dfb6577df3214aabbb97ce69665 M  ext

bisect run:

git bisect start
# bad: [ac0b465392429caaaec7ca2eadf98fe2c05603cb] ready for release
git bisect bad ac0b465392429caaaec7ca2eadf98fe2c05603cb
# good: [6c133998302ea9ad0c5146880cbd7fa8585edf6e] push out new gem release
git bisect good 6c133998302ea9ad0c5146880cbd7fa8585edf6e
# bad: [1745d46fcf1e5c55276be558b64ed0644ed612a2] compat with non GNU strip binaries
git bisect bad 1745d46fcf1e5c55276be558b64ed0644ed612a2
# bad: [a46607c037576bcca3c782ee32159f9dd46f5e10] update other request types to yield once
git bisect bad a46607c037576bcca3c782ee32159f9dd46f5e10
# bad: [4bdc81576ae7483f130a4eec17c73373927614fa] cleanup timeout code in multi select loop
git bisect bad 4bdc81576ae7483f130a4eec17c73373927614fa
# good: [c351e53e30eff1eae119fbd79d54b37da64930cc] make sure to set CURLOPT_NOBODY as reported by  copiousfreetime
git bisect good c351e53e30eff1eae119fbd79d54b37da64930cc
# good: [ab433db38df1e86cfc0261796a96fe1cb3cde209] add another test case for 500 responses
git bisect good ab433db38df1e86cfc0261796a96fe1cb3cde209
# good: [4233fcc6f65f1107693014038da9320cbdee5c5e] set upload object to Qnil on cleanup
git bisect good 4233fcc6f65f1107693014038da9320cbdee5c5e
# good: [50bdc2beed1c89c903145c05cc00344b9a7ac9da] add inspect method
git bisect good 50bdc2beed1c89c903145c05cc00344b9a7ac9da

Latest patches did not fix (tried setting default_timeout to 1ms, same results).

taf2 commented 14 years ago

I know it's been a while, but wondering if you can try with this last commit. http://github.com/taf2/curb/commit/f001b0ead7d7df1f2289087e8527d4d1183a4b6d

igrigorik commented 14 years ago

Hmm, nope: http://skitch.com/igrigorik/dnxc4/ganglia-comments-keystone-i-140d0e7c-host-report

:-(

ramsingla commented 14 years ago

taf2 I have made some changes to my fork for the Curl::Multi These changes have made the running of tests significantly faster. Almost 2-3 seconds to run all the units tests, which used to take 8-9 seconds.

You can have a look at this patch.

http://github.com/ramsingla/curb/commit/d56853bd435c71f4a68a1557bf57d3a1b96a5698

There are couple of things regarding the timeout I had change that could be kept the same. I am sending you the pull request.

taf2 commented 14 years ago

Thanks! I'll review tonight

ramsingla commented 14 years ago

Also this patch provides the issue with adding more requests to the same multi handler on_complete callbacks dynamically. Often it was observed that the requests where not performed. This is important in the application when some serialization is required amongst some requests. In these cases, the perform call had to be called again for the requests which imho is not desired behavior.

taf2 commented 14 years ago

Okay, after applying the patch my tests went from taking 4.2 seconds to 5.2 seconds... not sure test speed in this case is a good measure... but what I do see is we're not relying on libcurl to tell us how long to wait... that does make the loop much simpler. I like the additional do while loop to make the calling ruby code simpler. I imagine that even remains backwards compatible with the previous behavior... I made one minor formatting change, will spin a new release with this code. Maybe igrigorik could run his tests again to see if this improves performance?

ramsingla commented 14 years ago

Yes you are right. I was experimenting first with the timeout. But the real change was the additional do while loop and making the rb_curl_multi_run not to read on every all. But regarding the performance, it might be better if the libcurl timeout can be introduced back. On Max OS X i saw 3 to 4 fold improvement but it might be because curl on my system did not supported the libcurl timeout.

igrigorik commented 14 years ago

Hmm, nope :-(

http://skitch.com/igrigorik/de9m5/photoshop

ramsingla commented 14 years ago

@igrigorik: is there any improvement? from the charts it seems the cpu usage has gone low and throughtput is slightly better than before. are you using libcurl version less than 7.19.x? @taf2: libcurl timeout could make the difference. was it there in 0.4.6.0?

igrigorik commented 14 years ago
curl 7.19.4 (x86_64-unknown-linux-gnu) libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3 c-ares/1.6.0
Protocols: tftp ftp telnet dict http file https ftps 
Features: AsynchDNS Largefile NTLM SSL libz

The new gem is the valley.. the download speed drops to about half.

taf2 commented 14 years ago

igrigorik, I created this script to identify some perf issues: http://gist.github.com/446279 On my mac, I identified that openssl is causing creation of Curl::Easy handles to take a very long time, the offending call is in curl_easy_init - running that script will identify the issue... I don't think you have that issue - but maybe see?

taf2 commented 14 years ago

Okay, still sort of shooting in the dark, but this commit has had a slight improvement in test run time for me... http://github.com/taf2/curb/commit/620ca1669ad41c3ad9e27434788e1c84740655f7 shaved a second off...

taf2 commented 14 years ago

I added some benchmarks based on the persistent connection tests brought to attention by drbrain - i should add em-http to that as well?

ramsingla commented 14 years ago

interesting thing about the chart from the new version is that the CPU utilization decreases. I will have close have close look at 0.4.6 code. My only guess is that Curl::Easy instance in newer code uses Curl::Multi Could that make the difference? @igrigorik are you using Curl::Multi or Curl::Easy?

ramsingla commented 14 years ago

@taf2: please look at the difference of multi perfom function in both versions http://gist.github.com/446415 imho, igrigorik you might be adding more to curl multi using a block with perform call. block call is called in every iteration in the old version and in newer version only if the rb_thread_select returns 0. so it might be that more easy handles are being added on the fly to same multi handler by igrigorik's script and that i can relate with the CPU usage difference. Can this be the source of performance slowdown?

igrigorik commented 14 years ago

Hey guys, sorry didn't get a chance to test this yet.. but I like your last theory, makes sense, conceptually.

ramsingla commented 14 years ago

igrigorik, if this makes sense conceptually then it might be that you need to alter the implementation of the code somehow to make it better with newer version, which is more efficient ( Ideally the block call should be made only if the select returns 0 ). how many urls are you adding to Curl::Multi in a go? Can you double the number and see if the performance with the newer version is better or worse? Also could you add new urls with on_complete callbacks?

taf2 commented 14 years ago

Let's try again with this change maybe? I remember removing the yield block believing it made sense to focus the loop more on immediate network activity - but rb_thread_select in favor of code in the perform block might be the wrong decision... if there is code in the perform block it should probably run in favor of other ruby threads... Here's the commit: http://github.com/taf2/curb/commit/ac07429c7a043f43b2e058a4b5be9f9559eaac42

igrigorik commented 14 years ago

Awesome. Ram, your theory is right on the money. With the latest patch the perf is definitely back into the same +-/10% range. I also converted the code to queue up the next easy handle in the on_success handler instead of the perform, and that did seem to help a bit as well. Everything combined, we're back on the same throughput numbers. Incidentally, the CPU usage has remained the same, no changes.

I think we can finally close this guy! Although.. I do wonder how the docs can/should be updated to reflect this learning, as I'm sure others may run into the same issue (even worse, without even knowing it).

taf2 commented 14 years ago

Alright, I updated the README files. Closing this one out. I'll work on a new release soon.

ramsingla commented 14 years ago

cheers, finally we got this thing right. igrigorik if only on_success callback is use to add more to the queue then it should be low on cpu. ( when one job finishes add another ). Otherwise, with the latest patch you can yield aggressively by providing block to the perform call ( i think this should be high on cpu ).