http client: Reasonable KeepAlive behavior

isaacs commented 11 years ago

Everyone wants this. People expect it: #1958

The current system is build so that a socket will only be reused if there is a pending request. However, this is not optimal for many many use cases. It was designed that way due to a limitation that any connected socket would keep node open, even if we were not using it. But now we have Socket.unref() and Socket.ref().

Also, 5 is always the wrong number, so it's a bad default.

The http client should keep a pool of sockets around for each server we connect to. This should be a a configurable max (8 by default) per protocol/host/port.

When a socket is in use, we socket.ref(). Then when it goes back into the pool, call socket.unref(). If a socket gets ECONNRESET while waiting in the pool, just ignore the error, and discard that connection.

It's a bit of added bookkeeping, but this will make the http client behave much more expectably, and more like a web browser.

/cc @mranney @mcavage @bnoordhuis @piscisaureus @mikeal

mikeal commented 11 years ago

i'm actually of the opinion that we should remove pooling from core. there are several approaches to this problem and all change the expected API behavior.

socket.unref() and socket.ref() solve the "node doesn't exit" problem but it doesn't solve the more annoying problem of a socket being attached to an http request and closed on the remote end unrelated to the new request, which will cause an error to be emitted on the request object that is actually unrelated to that request. this is what we solve with retries in request.forever but i'm very hesitant to add retries to core.

this problem is hard, there's more than one way to solve it, i think we should push it back in to userland.

mikeal commented 11 years ago

also, if i don't win this argument and get pooling ripped out of core, here's some things we'll need to remember.

the "agentRemove" event will need to stay intact as it's used a decent amount in userland. the semantics should be that the socket is removed from the pool but is not unref'd.
move all event handlers for socket and httpRequest events in the agent to pool.onEventName methods. this would make customizing significantly easier than it is now.

isaacs commented 11 years ago

i think we should push it back in to userland.

In this day and age, an http client that doesn't support keepalives or retries is not usable. If we "push it back into userland", then we may as well push the whole http client back to userland.

Yes, it's a hard problem. But it's far from impossible, and it's something that belongs in core. What we have now is wrong in every direction, and needs to be replaced either with something that isn't broken, or with nothing.

isaacs commented 11 years ago

Just to be clear: By "retries" I mean, "I tried to use a socket out of the pool, got an immediate ECONNRESET, so now I need to reconnect." Not "I got an error halfway through downloading the thing."

The second case is an error. The first is a socket that was dead, but we don't know until we poke it.

mikeal commented 11 years ago

@isaacs totally clear on the ECONNRESET not being a thing after the response is written but you're missing the real problem. It's not "I tried to use a socket out of the pool, got an immediate ECONNRESET", its, I got assigned a socket and in the same tick i constructed all the request data and wrote it to the socket and I get a ECONNRESET in the future before I get a response.

you'll pretty much never get an ECONNERESET before writing the request out because you'll always write the headers right when it gets assigned. you have to retry, know that you retried, and let the error emit the second time. also, for fear of mutating data more than intended, you can't retry anything but GET and HEAD and that's assuming some idiot's PHP site isn't mutating the backend data on GET based on querystring params.

this kind of pooling isn't handled by any of our contemporaries. yes, they do "keep-alive" but they're using blocking IO and get out of most of these inflight issues, or they just pretend they don't exist.

the problem with these bugs is that you don't see them unless you're under load which makes them hard to test and debug and it means it'll be months after release until we know if there are any side effects.

now, if we do decide this must go in to core, we could construct it as a general client socket pool which would make it much more useful, general, and help out people building socket clients with pooling like redis. this was how i wanted it done originally but ryan went off and wrote it in to the http library and shipped it while it was still kinda broken and slow. my point is, if socket pooling is so important that not having it renders the http client useless then it's probably a big enough problem for people writing non-http socket clients that we should give them a solution.

isaacs commented 11 years ago

Re: abstracting out "pool some sockets" into a standalone thing, yes, absolutely, that is the correct abstraction. Implementation detail, we'll work that out when we're actually building it, post-v0.10. I'd like to see something like this:

var pool = net.pool(configs for connect as well as pool size, etc.);
var conn = pool.connect(function onConnect() { ... });

And then we can just use that instead of net.connect. We could also do something fancy like net.connect({args, plus pool:true}) but that feels a bit too magical. Maybe it's easy. I don't know. We'll see. But the point is that then other libs could use it directly for TCP/TLS pooling, and we could have it neatly abstracted outside of the http logic.

It's not "I tried to use a socket out of the pool, got an immediate ECONNRESET", its, I got assigned a socket and in the same tick i constructed all the request data and wrote it to the socket and I get a ECONNRESET in the future before I get a response.

If you write the first packet of an http request to a socket, and get a RST packet back, then the correct behavior is to tear it down and try again.

When I say "Immediate ECONNRESET", I mean, in the first socket._handle.ondata() call, after writing the first chunk of the request.

The subtlety of the API needs to be worked out, obviously. Any socket pool needs to be able to swallow RSTs on sockets that haven't been used in a while, because the other side will often decide that they're dead, and start culling them. And that's kind of hard when the socket object has already been handed off to a user. So, we'll have to be smart enough to say, "This socket is actually from a pool, so let's just make a new connection, and swap out its _handle object" or something, under the hood. (Perhaps what we're pooling is not JS net.Socket objects, but just the tcp_wrap handles. TBD.)

You don't have to worry about servers having interpreted the request at that point and idempotency (which you obviously DO have to worry about if you're retrying on any random error that occurs mid-stream). It never made it to the server, because there was never an ACK; instead, we got a RST. "That connection isn't here any more, try again." The server told us explicitly to reconnect, because that's what "reset" means.

this kind of pooling isn't handled by any of our contemporaries.

It's handled by curl and by web browsers. That's all of our contemporaries.

This is actually a pretty normal part of the TCP/HTTP dance that we just don't do.

mcavage commented 11 years ago

My .02:

I would very much like to see the http client make connection management more transparent to me. The current semantics of the API require convoluted usage upstack to make it do what you want with regards to timeouts/backoff/etc. Also, any magic number of 5 or 8 is still a magic number - at the top of any client file I have the same http(s).globalAgent.maxSockets = 65535; to get it out of my way, and then manage scheduling myself.

I very much want a "http handle" that has a .request() or .get()/.put() etc. on it, where it's very clear that the handle is 1:1 with a single connection. That connection really does need to support full keep-alive, because it's just too debilitating to any sort of latency-sensitive system to not have keep-alive on; by "full" I mean not the semantics that connections are reused IFF there are queued requests. I want it alive until either end explicitly closes it. Servers on the big bad internet can set connection: close if they don't want to clients to stay connected. This is "guidance" at best anyway; if a server system really cares they're going to have to reap idle/max conns no matter what.

As to retry/backoff, this is not specific to HTTP, but any TCP-based system - I think node needs to do that for connection establishment (but notably not user requests), along with connection establishment timeout. It's required for any real system, and I know at least in all of our TCP/LDAP/HTTP systems, we have the same boilerplate setTimer/setup retry+backoff logic/wait for connection, loops. I suspect there's lots of wrong code out there doing this, including mine/ours because it's error prone and it needs to be replicated all over. I'm pretty sure what's described above would be fine - i.e., before there's data that's actually been ACK'd, just manage retry for me. After that, emit a RST, or whatever to consumer that my message got dropped.

Everything above has nothing to do with a pool, that's just for a single socket. Now the question is around pooling - I suspect that a general TCP pool wouldn't necessarily work for HTTP or vice-versa; plenty of protocols allow full async communication over a single socket (i.e., LDAP/SPDY/...), in which case "checking out a connection and queuing" is completely artificial. On the flip side, you obviously need to checkout+queue for HTTP and most every DB protocol that I know of. I do agree with @isaacs that it's essentially standard these days for every user agent (don't forget java, where jakarta-commons still powers a significant percentage of clients that aren't web browsers out there) to do this, so I don't see how node can offer up HTTP as a supported "core" protocol and not do it for most users, but it wouldn't be the end of the world either to just say it's explicitly an upstack problem, IMO.

So in tl;dr form:

You have to do full keep-alive in core
HTTP connection management should be explicit like every other require('net') based connection as opposed to what it is now. If I don't like whatever node core comes up with for pooling, I can use this much more easily upstack than today.
retry/backoff/connectTimeouts should really be part of net core for connection establishment.
HTTP pooling on top of the above is, in my opinion, important if you want to promote node over toolkit/language X, but it's not an irrational argument if it's not in core.

mikeal commented 11 years ago

It's handled by curl and by web browsers. That's all of our contemporaries.

the browser works more or less like what we have now. it does not keep connections alive that it's not using. the use case is quite different, almost all outgoing requests are known pretty quickly and a shorter life cycle so they don't have the same problems we do.

full async communication over a single socket (i.e., LDAP/SPDY/...),

SDPY requires that you only use a single connection. HTTP 1.1 has keep-alive and pipelineing but the pipelining doesn't support multiplexing so almost nobody uses it because it ends up being slower than pooling.

HTTP connection management should be explicit like every other require('net') based connection as opposed to what it is now. If I don't like whatever node core comes up with for pooling, I can use this much more easily upstack than today.

The problem is, this logic must exist in the http response handling because HTTP allows for the response to dictate keep-alive and close so this isn't something you can be 100% consistent with given the input configuration like you're suggesting. The current API (agent) is terrible for writing your own handling of these cases but it's important to establish upfront that both connection:close and connection:keep-alive handling have to be defined for every outgoing http request.

mikeal commented 11 years ago

would also like some of @dannycoates' input on pooling.

mcavage commented 11 years ago

SDPY requires that you only use a single connection. HTTP 1.1 has keep-alive and pipelineing but the pipelining doesn't support multiplexing so almost nobody uses it because it ends up being slower than pooling.

Yes, I understand; whatever SPDY "requires" w.r.t. to a single connection is sort of meaningless - clients can always make more. I was only saying that there are drastically different models so I don't see how a generic pool for all things sockets would work out.

The problem is, this logic must exist in the http response handling because HTTP allows for the response to dictate keep-alive and close so this isn't something you can be 100% consistent with given the input configuration like you're suggesting. The current API (agent) is terrible for writing your own handling of these cases but it's important to establish upfront that both connection:close and connection:keep-alive handling have to be defined for every outgoing http request.

Sorry, I meant by this that you want the "handle" object to either reuse the existing connection or create a new one for each request as dictated by the current state. That model would work, and be 1:1 of "thing I hold to socket".

mikeal commented 11 years ago

last time i looked at the SPDY spec it stated that you can't use more than one connection. sure you could make more but there's no incentive other than load testing.

SPDY support will actually look a lot like websockets, the HTTPS connection get's "upgraded" and then it gets taken out of the pool.

maybe the solution to these handlers is something like ClientRequest.getConnection = pool.getConnection so that the request can be customized for getting a connection and it's default behavior is to get one from this.pool, which can also be customized at the pool level.

mranney commented 11 years ago

the browser works more or less like what we have now. it does not keep connections alive that it's not using. the use case is quite different, almost all outgoing requests are known pretty quickly and a shorter life cycle so they don't have the same problems we do.

Are you sure about that? I'm pretty sure that browsers keep the underlying TCP connections open for a while in case you want to make more HTTP requests on top of them. Especially for sites that use HTTPS, this is a gigantic win.

mranney commented 11 years ago

I think it's fair to say that non-browser implementations of HTTP clients do not typically have this kind of "linger keepalive" support. Node has better support for HTTP than just about everything else, by design, and this keepalive thing is one area that's still lacking.

Let's not get distracted by the issue of SPDY and pipelining multiple requests over the same TCP. If those kinds of solutions were optimal, we wouldn't need HTTP. SPDY may indeed by the future, but right now in the present we need real HTTP keepalive, either in node or in userland. At Voxer we've done a version of this in our application code, and it's OK, but everyone should not have to implement their own keepalive logic to get the behavior they expect.

isaacs commented 11 years ago

SPDY and pipelining are separate concerns entirely. We don't have first-class SPDY support in core, and aren't likely to any time soon. But the node-spdy module is maintained by @indutny who is always deep in core development and will keep us from breaking anything it depends on. Our client doesn't do pipelining, and likely never will.

Our client does do keepalive, just badly -- it'll only reuse connections that already have pending requests, but won't keep the socket alive if there are no requests pending, when it can do that perfectly easily. We don't handle ECONNRESET very nicely in the client at all, either, and using longer-lived keepalives will force that issue, I'm sure.

But, there is clearly user demand for this, it's a reasonable user expectation, and hard or not, it's something we have to do. v0.12 is the version to land this in.

mikeal commented 11 years ago

We don't handle ECONNRESET very nicely in the client at all,

i thought this was cleaned up? should make it's way to the ClientRequest object and emit error.

isaacs commented 11 years ago

i thought this was cleaned up? should make it's way to the ClientRequest object and emit error.

It's significantly cleaner. But not still not optimum. With a keepalive connection, you want to be able to do handle ECONNRESET from the first handle.ondata() after the first write, and if you get that, make a new connection, and try again. That logic will go into the socketpool thingamajig, I imagine.

natevw commented 11 years ago

Are +1, "me too!" comments unwelcome here? Because this is one.

I need to implement custom agent/pooling behavior for my middleware, partly to more gracefully handle a connection limit my database provider has but mostly to deal with the horrible latency caused by lack of Keep-Alive.

Personally I'd be happy with the built-in library letting me initialize an explicit HTTP connection rather than request, but in lieu of that a better default keep-alive strategy (and less weird ECONNRESET behaviour when maxSockets is exceeded) and some non-UTSL documentation for custom agents would do.

mikeal commented 11 years ago

i find +1 comments that have no explanation entirely useless. yours is quite detailed and, I for one, welcome it :)

isaacs commented 11 years ago

This was landed in master some time ago. Closing this issue. If there are other issues with it, please post new issues. Thanks for the feedback everyone :)

ravi commented 10 years ago

@isaacs since the documentation on http.Agent does not make this clear (to me :-)), can you explain what was landed in master? Ideally, it would be great if the caller could configure a connection keep alive time/timeout globally, or even per request/connection, but is that the implementation, or is the implementation that http.Agent just uses a "reasonable" wait period (for a new request) before closing a connection?

natevw commented 10 years ago

@ravi Would love to hear a bit more too, but I just wrapped up work implementing node-compatible (v0.11) 'http' module and so maybe I can give a few pointers meanwhile:

the Agent is better, but still not entirely like e.g. a browser. If you allow free sockets to persist, they will hang around indefinitely unless you manually deal with them
I don't know if they're 100% up to date (seemed to match the code pretty well as far as I saw), but if you haven't already, you might find more useful info in the latest docs at http://nodejs.org/dist/v0.11.13/docs/api/http.html#http_class_http_agent

ravi commented 10 years ago

@natevw Greatly appreciate your response. The docs link you posted is indeed relevant, but unfortunately it seems to make matters murkier (at least to me):

This section remains from the old doc:

The HTTP Agent also defaults client requests to using Connection:keep-alive. If no pending HTTP requests are waiting on a socket to become free the socket is closed. This means that Node's pool has the benefit of keep-alive when under load but still does not require developers to manually close the HTTP clients using KeepAlive.

This is essentially, IIUC, the behaviour in 0.10 (and earlier?) i.e., connections are opened with keep-alive but closed if no requests are pending (no grace period).

However immediately after that, we read:

If you opt into using HTTP KeepAlive, you can create an Agent object with that flag set to true. (See the constructor options below.) Then, the Agent will keep unused sockets in a pool for later use.

So, now we have a boolean option keepAlive whose behaviour is a bit ambiguous to me. If it is true then you opt into using HTTP keep-alive. Which is, I would guess, the HTTP KeepAlive describe in the second quoted section above wherein the socket is not closed automatically. Which raises the question, when does the logic outlined in the first quoted section apply? When the option keepAlive is false?

I am guessing the right way to read the doc is:

If you do not pass an agent, then the default one will be used and its behaviour, as before, is to keep the connection alive only as long as there are pending requests.
If you do pass in an agent, it will default to not have connections kept alive i.e., connections will close at the end of the request (which perhaps is the same behaviour as agent : false). If, while creating this agent, you set keepAlive to true then you get infinite/indefinite persistence of that socket independent of pending requests, etc.

I suspect however that my reading may be flawed.

Also, the use of the word socket is a bit unfortunate, since in classical networking terminology there is an important distinction between a socket and a connection (or connected socket), which unluckily is quite relevant here.

ravi commented 10 years ago

Time to read the code! :-)

natevw commented 10 years ago

@ravi I think your understanding is mostly correct. Here's mine:

With an Agent configured for {keepAlive:false}, it will reuse connections only if there is a backlog (so it may actually use "Connection: keep-alive" under load; probably best to think of {keepAlive:false} more like forcing {maxFreeSockets:0} than anything else).
The http.globalAgent is configured as above.
Creating and then always passing an agent with {keepAlive:true} will then keep up to maxFreeSockets hanging open waiting for new requests to that scheme/host/port/localAddress combo.
You can pass in an agent created as above to all your requests and you will get the keep-alive behavior you want; you may even be able to hack around with .setTimeout on the sockets to close them after inactivity but YMMV.
In the opposite case, if you request with {agent:false} your request gets a new single-use agent instance w/defaults; i.e. it has the {keepAlive:false} behavior of the first bullet point above but is also guaranteed not to be reused since only the one request will ever be in its backlog.

Hope this is helpful (and correct :-P)

nodejs / node-v0.x-archive

http client: Reasonable KeepAlive behavior #4769