twitter-archive / kestrel

simple, distributed message queue system (inactive)
http://twitter.github.io/kestrel

Slowdown due to receiveBufferSize #40

Closed jeffstyr closed 13 years ago

jeffstyr commented 13 years ago

Testing locally, if I run sbt "put-many -b 2036 -n 1000 -c 1", throughput slows to a crawl after several requests. Specifically, the first several puts complete quickly, and then all subsequent puts take a bit over 5 seconds to complete. (It varies from run to run, but the slowdown starts at either 32, 128, or 255 puts.) It happens with a higher client count too; it's just easier to debug with "-c 1".

Adding some logging to the test code, I narrowed this down to the client-side blocking on reading the response back from kestrel.

Using tcpdump and dtruss, I could see that the response is being sent immediately, but the underlying read() call on the client side is blocking for about 5 seconds.

If I comment out the bootstrap.setOption("child.receiveBufferSize", 2048) line in Kestrel.scala, the problem goes away.
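
For context on what that option controls: in Netty, options prefixed with "child." are applied to each accepted connection, so this line requests a 2048-byte SO_RCVBUF on every client socket. The sketch below is only an illustration using plain java.net sockets rather than Netty; the names ReceiveBufferSketch and acceptedReceiveBuffer are hypothetical and are not part of Kestrel. It shows the requested size being inherited by an accepted socket. It does not reproduce the 5-second stall, and the OS treats the value as a hint that it may round or adjust.

```scala
import java.net.{InetAddress, InetSocketAddress, ServerSocket, Socket}

// Hypothetical helper for illustration only; not Kestrel or Netty code.
object ReceiveBufferSketch {

  /** Opens a loopback listener, optionally requests a receive buffer size,
    * accepts one connection, and returns the SO_RCVBUF value that the OS
    * reports for the accepted socket.
    */
  def acceptedReceiveBuffer(requested: Option[Int]): Int = {
    val loopback = InetAddress.getLoopbackAddress
    val server = new ServerSocket()
    try {
      // Set before bind(): the value becomes the proposed default for
      // sockets accepted from this listener.
      requested.foreach(size => server.setReceiveBufferSize(size))
      server.bind(new InetSocketAddress(loopback, 0)) // port 0 = any free port

      val client = new Socket(loopback, server.getLocalPort)
      try {
        val accepted = server.accept()
        try accepted.getReceiveBufferSize
        finally accepted.close()
      } finally client.close()
    } finally server.close()
  }

  def main(args: Array[String]): Unit = {
    // Actual numbers are platform dependent, so none are shown here.
    println("OS default:      " + acceptedReceiveBuffer(None))
    println("2048 requested:  " + acceptedReceiveBuffer(Some(2048)))
  }
}
```

Comparing the two printed values on a given machine shows how much smaller the accepted socket's buffer becomes when 2048 is requested, relative to the default that applies once the line is commented out.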

This also seems to be the cause of the "huge message" unit test failure in grabby-hands (see issue 4); commenting out that line fixes this as well.

(I'm running on Mac OS X 10.6.7)

I don't completely understand why setting a small buffer size is causing this behavior, but it seems to be.

robey commented 13 years ago

wow, thanks for the detailed info!

i don't remember anymore why i thought the receive buffer size was important -- maybe i was trying to cut down on memory use when there are tens of thousands of clients (which is how we normally run). i'll remove that line immediately.

robey commented 13 years ago

performance looks good -- committing!