request / request

🏊🏾 Simplified HTTP request client.
Apache License 2.0

Why one connection per request? #142

Closed · thejh closed this issue 12 years ago

thejh commented 12 years ago

I've looked at the network traffic of a nodejs program I'm currently working on, and it immediately closes couchdb HTTP connections after getting the response (although it sends a Connection: keep-alive header, which IMO doesn't fit together with that). Is there a way to turn on connection pooling or so? I think I didn't change any of the related settings.

[jann@Jann-PC node-forum master]$ node -v
v0.6.6
[jann@Jann-PC node-forum master]$ npm ls
forum@0.1.0 /home/jann/gitty/node-forum
├─┬ couchapp@0.9.0 
│ ├── request@2.2.9 
│ └── watch@0.5.0 
├── hat@0.0.3 
├── marked@0.1.4 
├── password-hash@1.1.3 
├─┬ ram-static@0.1.0 
│ ├── hat@0.0.3 
│ └── mime@1.2.4 
├─┬ relax@0.1.1 
│ └── request@2.2.9 
└── vacuum@0.1.1 -> /home/jann/gitty/node-vacuum
mikeal commented 12 years ago

Connection pooling is handled by node core.

The default pool size is 5 connections. If you call request enough times to exhaust the pool it will reuse the connections. If there are no pending connections it will close the one that is open.
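
One quick way to see this behavior (a sketch; assumes request 2.x and a placeholder local CouchDB URL):

// Fire six GETs at the same host: the first five each get their own socket,
// the sixth has to wait for a free one, so at least one socket gets reused.
var request = require('request');

for (var i = 0; i < 6; i++) {
  request('http://127.0.0.1:5984/', function (err, res, body) {
    // Once no requests are pending anymore, the remaining sockets are closed.
    console.log(err ? err.message : res.statusCode);
  });
}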


thejh commented 12 years ago

So it only keeps connections open as long as they're used all the time? And if I don't use a connection for one millisecond, it closes?

mikeal commented 12 years ago

it's not based on milliseconds; when the response finishes it checks for pending requests and if there aren't any it closes the socket. connections are only reused once you exhaust the pool (create more than 5 requests to one host before any have finished).


thejh commented 12 years ago

No way to change this? I'd assume that this causes requests to take longer.

http://en.wikipedia.org/wiki/Slow-start

The slow-start protocol performs badly for short-lived connections. Older web browsers would create many consecutive short-lived connections to the web server, and would open and close the connection for each file requested. This kept most connections in the slow start mode, which resulted in poor response time. To avoid this problem, modern browsers either open multiple connections simultaneously or reuse one connection for all files requested from a particular web server.

thejh commented 12 years ago

Hmm, should this be a request issue or a node core issue?

mikeal commented 12 years ago

it's a core issue, but either way it still needs to go to me since i wrote the pooling code in core :)

the reason we don't hold sockets open for an arbitrary amount of time is that it's just unreliable. we used to do things that way but it's hard because you get unexpected disconnects in between connections which cause errors that can't be handled because they don't have a request handler.

the current pooling code is actually ideal. it's reliable because we don't hold connections we don't need and we will pool the connections under load. if you aren't under enough load to exhaust 5 connections then a round trip for a new connection is not going to be a big deal.

you can adjust the pool size, making it larger or smaller, and you can also remove sockets from the pool if you want to keep one connection open indefinitely for something like the CouchDB _changes feed.

-Mikeal
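
For example (a sketch against node's core http API; the host, port, and database name are placeholders, and agent: false is shown as one way to keep a long-lived request off the shared pool):

var http = require('http');

// Grow (or shrink) the shared pool used by http.request/http.get by default.
http.globalAgent.maxSockets = 20;

// For something long-lived like a CouchDB _changes feed, opt out of the pool
// so this request gets its own socket that is never handed back for reuse.
var req = http.request({
  host: '127.0.0.1',                      // placeholder CouchDB host
  port: 5984,
  path: '/mydb/_changes?feed=continuous', // placeholder database name
  agent: false                            // bypass the pooling agent entirely
}, function (res) {
  res.on('data', function (chunk) {
    console.log(chunk.toString());
  });
});
req.end();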


thejh commented 12 years ago

I'm not concerned about load, but about latency. When I have to make three requests to the database in serial, I think it makes some difference whether I have three round-trips for TCP handshakes and slow-start connections or a bunch of "warm" connections. Mmhm... Why are disconnects in between a problem? If a connection dies in between, why should anyone care? Just drop it and make a new one when needed... I guess it's actually harder than I think, but I don't see the problem.

thejh commented 12 years ago

Ah, the problem occurs when a connection has died, but you don't know it, right?

thejh commented 12 years ago

Soooo, I made a patch and benchmarked: https://github.com/thejh/node/compare/joyent:v0.6...agent-pooling

ab -n 1000 -c 1 http://localhost:9876/thread/a5488f445cc5765be0278bf33e7e93fb/1 output without and with patch: https://gist.github.com/1523945

So, basically, ~11% less time per request to my webapp for one request at a time.

Test setup: ab and the node server on my computer, couchdb in the cloud (~122ms TCP-SYN-ping latency).

dscape commented 12 years ago

@thejh did you consider the fact that in a real-life scenario you will probably always have a queue? And if you don't, performance probably doesn't matter because you're not really pushing the limits of even a single server.

Yes, this matters, but mostly if the result holds in a case where a queue exists for the pooling. Not saying it doesn't matter elsewhere too, but that should be the main concern. Also, measuring bottlenecks normally means you have them: were you seeing any significant performance problems that actually crippled your application?

Then again, if you could fix core to make this the default with the same performance characteristics and no extra bugs, that would be grand, but silver bullets normally look better on a powerpoint slide :)

thejh commented 12 years ago

@dscape No, I think you're wrong. I won't always have a queue. If I have a queue, that means that I'm trying to send so many HTTP requests that I can't actually do it that fast. If I have a growing queue, latency gets bigger and bigger and soon I'll have to wait an hour for my requests to complete. We shouldn't optimize for the scenario that we're in that state for a long time, we should try to get out of it as soon as possible. And if we have a constant-size queue, the server is working at maximum DB connections and barely able to handle the traffic although it's nearly idle. So, long story short: If you have code that bursts out requests from time to time, a queue might be useful to limit the load on the other system and to reduce overhead, but for something like the connection to your couchdb, you don't want a queue, you want a connection pool that's large enough to do all requests immediately.

(edited)

thejh commented 12 years ago

@mikeal How does that patch look? Any obvious issues? If not, I'll do a pull request.

thejh commented 12 years ago

@dscape Also, about this:

And if you don't, performance probably doesn't matter because you're not really pushing the limits of even a single server.

Even if you only have one user and are nearly idle, latency will still matter for that user. If there are two more round trips to the DB, the page load time will be two database round trips higher.

mikeal commented 12 years ago

there are a dozen edge cases you aren't considering that i know are a problem because node used to reuse the socket objects.

what happens when you assign a socket but before sending any data it closes? at this point it's actually assigned and any disconnects in the middle of sending or parsing are considered HTTP errors.

disconnects that aren't clean (which a lot of proxies love to do) will always throw on the socket. when the socket isn't assigned this will trigger the global uncaught exception handler. handling that, but also not having an error handler in the way so that the user gets the right behavior when the socket is assigned, is almost impossible; eventually you'll swallow exceptions.

there are other cases too, mainly because when the socket object has been sitting unused and then gets assigned, you have a full round trip before you know if something was improperly disconnected. a lot of stuff can happen to the request in that time and it can be in a few different states, none of which can tell the difference between a disconnect from the server because you may have sent something bad and a disconnect just because the connection was open a little too long and the remote end killed it.

you're obsessing about the wrong problem. yes, you're absolutely correct, it's 11% slower, and in absolute time that's probably 10ms, which you can live with when you aren't under any real load. that only starts to matter when you're piling up a ton of pending requests because of the latency, which won't happen because when you're under load you'll fill up the pool.

@mranney was concerned about a lot of these same issues when moving to the new pool and the last time i talked to him he was pretty happy with the increase in reliability.

mikeal commented 12 years ago

also, you're missing nuno's point. if you're serving 5 clients in parallel you'll have exhausted the pool, so you'll actually end up reusing plenty of connections even though the code path for each user makes sequential requests. if you aren't serving more than 5 users concurrently then you really don't need to be worried about 10ms :)

mranney commented 12 years ago

We are a very heavy user of node's HTTP client. The new keepalive behavior in 0.6 is indeed slower than it was in node 0.4, but things are vastly more stable in 0.6. In 0.4, we had to float a series of incredibly nasty patches to work around a series of issues. In 0.6, we have not yet needed any custom patches. This is good.

That said, I would love to get more performance out of node's HTTP client by re-using connections. I know the issues are complicated, and I'm glad that we've made things stable first. Now that they are stable, it's time to start thinking about making them fast.

For HTTP client connections over the open Internet, this keepalive issue is probably not that important. In fact, for many applications, opening a new TCP connection for every HTTP request might actually be faster and better for everybody, depending on how the server backend is load balanced.

For HTTPS client applications that consume an API like Facebook's or Twitter's, the current keepalive performance is pretty awful. Negotiating new TLS sessions is very expensive for both sides, so avoiding that is an obvious and gigantic win.

For HTTP client connections across the datacenter to a small set of the same machines every time, like for CouchDB / Riak or for node to node operations, the performance wins are less dramatic, but at scale they still add up to a problem worth solving.

mikeal commented 12 years ago

i'm going to be -1 on keeping connections open for any amount of time longer than one cycle through the event loop

no matter how you shake it we'll have this case where a socket gets assigned and gets disconnected because it had been left open and not because of what was trying to be sent down it. it means that node, by default, will have these random HTTP errors totally unrelated to your application code that you only see in production.

that said, what we could do is find a way to keep the connection available for any new requests before nextTick(). this would handle the sequential requests case @thejh is concerned with and wouldn't keep connections open any longer than they currently are (remember that calling close() still won't take effect until after nextTick() anyway).

what do you think @mranney?

thejh commented 12 years ago

disconnects that aren't clean (which a lot of proxies love to do) will always throw on the socket. when the socket isn't assigned this will trigger the global uncaught exception handler. handling that, but also not having an error handler in the way so that the user gets the right behavior when the socket is assigned, is almost impossible; eventually you'll swallow exceptions.

What about the thing I did: register an error listener on the socket that, as long as the socket is idle, just destroys it on error? You can unbind it at exactly the moment you hand the socket to the next HTTP client.

there are other cases too, mainly because when the socket object is unused and it's assigned you have a full round trip before you know if something was improperly disconnected. a lot of stuff can happen to the request in that time and it can be in a few different states all of which can't tell the difference between a disconnect from the server because you may have sent something bad and a disconnect just because it was open a little too long and the remote end killed it.

Hmm, ok... so, for stuff that alters state, this probably means that reusing connections is a bad idea... but well, much traffic doesn't alter state (GET and HEAD). And for those two, we also buffer all the data we want to send (only a bunch of headers) in RAM anyway. So, could we just retry once when the connection is closed before we get an answer, in case we're doing a GET or HEAD request, and not use the pooling for POST and all the other stuff?

mikeal commented 12 years ago

@thejh retrying on error could only ever work for GET and HEAD, for PUT and POST you won't have the data you wrote.

also, how do you distinguish between an unclean disconnect due to the remote end actually not handling something and one that happened just because the connection had been left open? you can't :)

if you retry on socket error and there is a legitimate parser or server error on the remote end that causes a disconnection, you'll just start DoSing the server with retries.

basically, keeping a min socket pool will lead to unreliable behavior at scale. there's no way to get around it. now, if you are comfortable with this level of reliability you could write an alternate Agent implementation and pass it to http.request. it can't be the default because core needs to be reliable, but if this behavior is desirable for you then you can write an alternate agent.

i think the best solution is, as I stated above, to reuse connections after "end" by waiting until nextTick() to call "close". more accurately, i think we should setTimeout(cb, 0) so that we call close after all nextTick() callbacks have fired. that way, any kind of sequential http requests you want to do will hold on to the same connection.
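
A conceptual sketch of that deferred-close idea (hypothetical helper, not core's actual code):

// When a response ends and no request is pending, don't destroy the socket
// immediately; park it and only close it on the next timer turn, which runs
// after all nextTick() callbacks, so a sequential request can still claim it.
function parkSocket(agent, name, socket) {
  agent.idleSockets = agent.idleSockets || {};
  (agent.idleSockets[name] = agent.idleSockets[name] || []).push(socket);

  setTimeout(function () {
    var idle = agent.idleSockets[name] || [];
    var i = idle.indexOf(socket);
    if (i !== -1) {          // still unclaimed after the current tick: close it
      idle.splice(i, 1);
      socket.destroy();
    }
  }, 0);
}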

thejh commented 12 years ago

@mikeal

retrying on error could only ever work for GET and HEAD, for PUT and POST you won't have the data you wrote.

That's what I was trying to say.

also, how do you distinguish between an unclean disconnect due to the remote end actually not handling something and one that happened just because the connection had been left open? you can't :)

Right, we'll just have to retry.

if you retry on socket error and there is a legitimate parser or server error on the remote end that causes a disconnection, you'll just start DoSing the server with retries.

Well, we can just do the retry on a new socket: the old one is gone anyway (it was closed, which is why we have to retry), so we can open a new one. If it fails again, we know it's because of the data and we can pass that error back to our callback. I wouldn't call one retry DoSing.

i think the best solution is, as I stated above, to reuse connections after "end" by waiting until nextTick() to call "close". more accurately, i think we should setTimeout(cb, 0) so that we call close after all nextTick() callbacks have fired. that way, any kind of sequential http requests you want to do will hold on to the same connection.

Would IMO be nicer than the current situation although not optimal.
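
A rough sketch of that retry-once idea for idempotent requests (a hypothetical helper, not an API request had at the time; host, port, and path are placeholders):

var http = require('http');

function getWithRetry(options, callback, retried) {
  var answered = false;
  var req = http.get(options, function (res) {
    answered = true;
    callback(null, res);
  });
  req.on('error', function (err) {
    if (answered) return;
    // A reused keep-alive socket may have been closed by the remote end just
    // as we wrote to it; a GET carries no body, so retry once on a fresh socket.
    if (!retried) return getWithRetry(options, callback, true);
    callback(err);           // failed on a brand-new socket too: a real error
  });
}

getWithRetry({ host: '127.0.0.1', port: 5984, path: '/' }, function (err, res) {
  console.log(err ? err.message : res.statusCode);
});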

thejh commented 12 years ago

@mikeal To clarify: I'm suggesting to only reuse idle connections for GET and HEAD.

edit: sorry, accidentally wrote POST

mikeal commented 12 years ago

@thejh that's a huge change. the agent currently isn't even aware of anything but host:port from the http request. there are also like 30 different HTTP methods that may or may not want to be included.

my priorities are in this order:

1) Obviously, I want to handle the cases that affect the most people. I don't think most people care about much more than we have, and if they do, they probably want the perf bump primarily for sequential requests, which we can handle without keeping a min socket pool.

2) I'm -1 for any change that is really complicated. Having variant behavior for the number of retries on a request and altering behavior per method sounds too complicated to be understood easily by most node users, which will lead to very hard-to-track-down issues being reported (a huge problem with the previous agent code).

mranney commented 12 years ago

I think a library like request is the best place to implement some kind of retry on socket errors, as opposed to in node itself. If it isn't request, then some other library, somehow, should implement the kind of keepalive that people expect.

Take the Facebook API consumer case. We currently do between 60 and 120 client requests per second to Facebook. Because these requests are spread across a great many processes, we have little opportunity for TLS connection reuse. This is a lot of CPU time for us that could be saved by keeping connections open.

I totally agree that reliability is the top priority and that handling errors and retries is going to be hard. But shit man, that's why we need smart people like you to get it right, at least once, in a library so everybody else doesn't break their programs trying to handle all of this.

thejh commented 12 years ago

If nobody else wants to, I'll try it - I think I have a rough idea of how it should be done.

mikeal commented 12 years ago

@mranney @thejh if we want to experiment with a "forever" request, something like:

var request = request.forever({timeout:1000, minSockets:40})
request.get("http://www.google.com")

@thejh if you want to tackle this, it should have a similar API as request.defaults, it should return the full request API and default to a different Agent that has this minSocket code in it and whenever it gets a new host it should kick off and hold a bunch of sockets. Might want to also have an option that limits the hosts using that Agent to a list of known hosts.

This is fine for request, but i doubt we can get this in to core ever. But, the reuse of sockets before nextTick() could definitely make it in to core.

thejh commented 12 years ago

Still with a big bunch of logging output, but it seems to work. Link as before: https://github.com/thejh/node/compare/joyent:v0.6...agent-pooling

Test: https://gist.github.com/1536169
Wireshark capture file (in base64): https://gist.github.com/1536198

Basically, the client requests the second file after 5 seconds. As soon as the request comes in, the server kills the connection. The client now retries on a new connection. And it works! :)

thejh commented 12 years ago

And apart from self.canReuseConnection (which I can also get as !self.useChunkedEncodingByDefault), this is only in the agent, so it should be easy to split it out into a module. :)

thejh commented 12 years ago

@mikeal How does that ShimSocket look? Too hacky?

mikeal commented 12 years ago

yeah, ShimSocket is likely to break between node versions, possibly even point releases.

i think you should break it out in to a feature for request, which I can accept, and not deal with retries yet.

get it working and reliable w/o retries and then stick them in later.

this shim stuff seems like way too much code for something relatively simple. we have the higher order objects, we can recreate the data we previously sent, so it's probably better to tackle retries outside of the socket itself where it can have some knowledge of HTTP.

thejh commented 12 years ago

@mikeal As this is an agent patch, how would that be done in request? Make a new agent instance, replace/wrap the 'free' listener, patch/wrap addRequest, wrap removeSocket? Or copy the agent code into request? Neither seems very clean to me...

mikeal commented 12 years ago

let's do this.

create a file, forever.js, in the request tree. util.inherits http.Agent and override the methods you've changed.

then just add a feature to request for a flag that uses this new agent.
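
A skeleton of what that could look like (a sketch only; Agent internals such as addRequest and onSocket changed between node versions, so the exact overrides would need to follow whatever version is targeted):

// forever.js (sketch)
var util = require('util');
var http = require('http');

function ForeverAgent(options) {
  http.Agent.call(this, options);
  this.minSockets = (options && options.minSockets) || 5;
  this.idleSockets = {};    // idle sockets kept around, keyed by "host:port"
}
util.inherits(ForeverAgent, http.Agent);

// Hand out an idle socket when we have one; otherwise fall back to the
// stock Agent behavior, which may open a new connection.
ForeverAgent.prototype.addRequest = function (req, host, port) {
  var name = host + ':' + port;
  var idle = this.idleSockets[name];
  if (idle && idle.length) {
    req.onSocket(idle.shift());
  } else {
    http.Agent.prototype.addRequest.apply(this, arguments);
  }
};

module.exports = ForeverAgent;

request would then pass an instance of this agent as the agent option to http.request whenever the new flag is set.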

thejh commented 12 years ago

Alright, I'll try. :)

thejh commented 12 years ago

@mikeal https://github.com/thejh/request/compare/mikeal:master...retry-agent - first two commits for forever requests, third one does retry. How does it look?

thejh commented 12 years ago

@mikeal Are you going to pull this?

isaacs commented 12 years ago

I just read through this.

This is such a beautiful story. The moral is: keep core tiny, and userland will innovate. <3

mmalecki commented 12 years ago

Still a better love story than Twilight.

Sorry, I just couldn't resist.

dscape commented 12 years ago

Following what @mranney said:

For HTTPS client applications that consume an API like Facebook's 
or Twitter's, the current keepalive performance is pretty awful. Negotiating 
new TLS sessions is very expensive for both sides, so avoiding
that is an obvious and gigantic win.

Does this issue, and pull request, fix this problem at all? It looks to me like it doesn't, and this seems to me a far worse problem, especially for companies using couchdb in the cloud :)

@thejh care to comment?

thejh commented 12 years ago

@dscape Hmm, right, I only patched HTTP... but it should be fairly easy to just apply the same patch to the https agent, I guess...

thejh commented 12 years ago

@dscape, if you want to, you can also do it yourself :D - in forever.js, add an agent for https with a copy of https://github.com/joyent/node/blob/master/lib/https.js#L54 and util.inherits(SecureForeverAgent, ForeverAgent). Then put conditionals into main.js in the two places ForeverAgent is used. Should be everything.

/me feels lazy
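
Roughly, that https variant could look like this (a hypothetical sketch; the createConnection signature shown follows the node 0.6-era agent code):

// forever-ssl.js (sketch)
var util = require('util');
var tls = require('tls');
var ForeverAgent = require('./forever');   // the http agent sketched earlier

function SecureForeverAgent(options) {
  ForeverAgent.call(this, options);
}
util.inherits(SecureForeverAgent, ForeverAgent);

// Same pooling logic, but new sockets are TLS connections instead of plain TCP.
SecureForeverAgent.prototype.createConnection = function (port, host, options) {
  return tls.connect(port, host, options);
};

module.exports = SecureForeverAgent;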

dscape commented 12 years ago

@thejh would apply that if it indeed fixes the problem. However my level of expertise is not enough to determine whether it would.

my intuition tells me it's probably more complicated than that :)

mikeal commented 12 years ago

yeah, this patch needs to get a little more complicated i think

forever needs to become a boolean option which affects picking the agent in request. using request.forever() should work for http and https without user intervention.

mikeal commented 12 years ago

http://nodejs.org/api/http.html#agent.maxSockets

require('http').globalAgent.maxSockets = 100

mikeal commented 12 years ago

i think we merged this forever work already right?

dscape commented 12 years ago

Yup, but some of the discussions here seem to have resurfaced on twitter today. But hey I saw your name there ;)

mikeal commented 12 years ago

that discussion is about this, or something like it, going in to core and if it would be default.

this ticket can stay closed.

CrabDude commented 10 years ago

I'm trying to understand why connections are not reused for PUT & POST. I understand that the data can be lost due to a closed connection, but this is no more true for foreverAgent than it is for core and no more true for a new connection than for an existing connection (perhaps this is the part I am misunderstanding).

If I understand the TCP/HTTP stack correctly, expected connection closures are announced and confirmed, while unexpected closures are just that, unexpected. If I open a new connection and prepare to submit POST data on that new connection, how is it any less likely that the submission of the data will succeed than if it was sent over an existing keep-alive connection?

We're currently finding ourselves needing PUT & POST socket reuse for high write services because node's behavior is leaving too many sockets in a TIME_WAIT state.

mikeal commented 10 years ago

All connections are reused once you exhaust the maxSockets in the connection pool. The reason the current agent was written to close connections when pending requests are not available is:

1) the agent was written two major releases before we had socket.unref(). before socket.unref() it was impossible to preserve node's "run to completion" semantics when you had open connections.

2) there are cases, and you will see them once you can do this again in production, where a socket has been kept around for a while and gets closed by the remote end (do not expect this to be "clean", exhausting the timeout for a remote server tends to cause all kinds of aggressive destroy behavior) which is fine if the socket is just idle waiting for someone to reuse it in node, but, for a window of approximately the RTT between client and server if you try and use that connection you will write data to it and get a hangup which will cause an error. From a high level, you've done nothing wrong, and you should just retry, which is what foreverAgent does for GET/HEAD (not for PUT/POST because that is very dangerous, you could end up writing twice).

This is all going to be a lot better in 0.12. @isaacs is writing some "minSocket" like behavior to keep connections open for reuse and unref() means we don't have to worry about run to completion going away. On point #2 we still have a problem and request will probably take on a new module that just does the retry semantics we had in forever agent.

thomas-riccardi commented 10 years ago

I concur with @CrabDude's remark: a non-clean failure while sending the HTTP request may happen in both cases, on a freshly opened connection as well as on a reused keep-alive one.

The scenario "the http client called has done nothing wrong but still received an error" applies in both cases, so I don't understand why @mikeal uses it to dismiss the socket keep-alive feature.

The important thing is that it will always be a network error, not an HTTP error; it should be returned as such to the http client user so they can respond to it appropriately.

There is still the question of what to do in case of a network error; PUT/POST methods may indeed not be idempotent, as that entirely depends on the HTTP server application. But it's a question all http request clients should ask themselves anyway (even without socket keep-alive), and answer according to their context, server, etc.

(I think the library shouldn't rely on the HTTP method to decide whether to retry, since not all HTTP APIs use the standard method semantics correctly; or at least it should be configurable.)

From the docs for the upcoming node v0.12 it seems it will now be possible to optionally keep the socket open even when it's not in use, so that should finally answer the initial issue!
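
For reference, the node >= 0.12 keep-alive agent looks roughly like this (host, port, and path are placeholders):

var http = require('http');

// A keep-alive agent holds idle sockets open for reuse instead of closing them.
var keepAliveAgent = new http.Agent({
  keepAlive: true,       // keep idle sockets around
  maxSockets: 10,        // concurrent sockets per host
  maxFreeSockets: 5      // idle sockets retained per host
});

http.get({ host: '127.0.0.1', port: 5984, path: '/', agent: keepAliveAgent },
  function (res) {
    console.log(res.statusCode);
    res.resume();        // drain the body so the socket can return to the free list
  });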

mikeal commented 10 years ago

In the 3.0 branch I intend to take the 0.12 agent implementation rather than use whatever is in core, which should resolve this for everyone.