silas / node-consul

Consul client
https://www.npmjs.com/package/consul
MIT License

Consider switching http agent to agentkeepalive and setting timeouts, to close idle connections #122

Open TysonAndre opened 2 years ago

TysonAndre commented 2 years ago

https://www.npmjs.com/package/agentkeepalive provides several features which https://nodejs.org/api/http.html#new-agentoptions still doesn't have.

Creating the agent with timeout and freeSocketTimeout set could help prevent socket leaks in certain networking edge cases (I haven't confirmed this).

What's different from original http.Agent?

  • keepAlive=true by default
  • Disable Nagle's algorithm: socket.setNoDelay(true)
  • Add free socket timeout: avoid long time inactivity socket leak in the free-sockets queue. The default seems fine; if raising it, it would be useful to stay under 60000 ms for AWS.
  • Add active socket timeout: avoid long time inactivity socket leak in the active-sockets queue.
  • TTL for active socket.
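
A minimal sketch of how the two timeouts above might be chosen. keepAlive, freeSocketTimeout, and timeout are agentkeepalive options; the helper names (buildAgentOptions, createAgent) and all numeric values are hypothetical placeholders, not recommendations from either library. The agent class is passed in as a parameter so the sketch does not assume how the package is installed.

```javascript
// Idle timeout of the load balancer in front of the service, in ms.
// 60000 is the Elastic Load Balancing default quoted in this issue.
const LB_IDLE_TIMEOUT_MS = 60000;

// Hypothetical helper: build agent options whose free-socket timeout stays
// well under the load balancer's idle timeout.
function buildAgentOptions(lbIdleTimeoutMs = LB_IDLE_TIMEOUT_MS) {
  return {
    keepAlive: true,
    // Close sockets that sit unused in the free pool for this long.
    // Placeholder policy: at most 15 s, never more than half the LB timeout.
    freeSocketTimeout: Math.min(15000, Math.floor(lbIdleTimeoutMs / 2)),
    // Inactivity timeout for sockets that are serving a request (placeholder).
    timeout: 30000,
  };
}

// Hypothetical helper: construct an agent from whichever class is supplied.
function createAgent(AgentClass, lbIdleTimeoutMs) {
  return new AgentClass(buildAgentOptions(lbIdleTimeoutMs));
}
```

With agentkeepalive installed, this would presumably be used as createAgent(require('agentkeepalive')).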

Motivation:

  1. Load balancers can silently disconnect connections, causing requests on those connections to silently time out (I think). For example, https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html

    By default, Elastic Load Balancing sets the idle timeout for your load balancer to 60 seconds. Use the following procedure to set a different value for the idle timeout.

    Raising the ALB idle timeout above the 60-second default may make this less common, but would not prevent it.

  2. Avoid keeping around idle connections when no longer needed (e.g. after spikes in uses of connections)

Related to #113

silas commented 2 years ago

I've generally wanted to avoid adding new dependencies to the project and prefer to just let users configure what they need.

Is there any reason that just adding agentkeepalive to your project and passing it in via the agent option doesn't work?

TysonAndre commented 2 years ago

Is there any reason that just adding agentkeepalive to your project and passing it in via the agent option doesn't work?

I can do that and planned to. I'm already overriding the agent and was in the process of adding agentkeepalive. I'm also leaving a note here in case others run into similar slow leaks of inactive sockets, though I still haven't confirmed it's the case.
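
For others who land here, overriding the agent might look roughly like the sketch below. It assumes the client is constructed with an options object that accepts host, port, and the agent option mentioned above; check the README of the installed version. createConsulClient is a hypothetical helper name, and the constructor is passed in as a parameter so the sketch does not hard-code how the package is imported.

```javascript
// Hypothetical helper: build a client that sends all requests through the
// supplied agent (a built-in http.Agent or an agentkeepalive instance).
function createConsulClient(Consul, agent) {
  return new Consul({
    host: '127.0.0.1', // local consul agent, as in this thread
    port: 8500,
    agent, // custom agent instead of the global default
  });
}

// Usage, assuming the `consul` and `agentkeepalive` packages are installed:
//   const Consul = require('consul');
//   const KeepAliveAgent = require('agentkeepalive');
//   const consul = createConsulClient(Consul, new KeepAliveAgent());
```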

Separately from that, even without adding agentkeepalive, I'd still recommend setting the agent's timeout to something larger than the expected papi timeout, since by default no socket timeout is set (effectively infinite). From https://nodejs.org/api/http.html#new-agentoptions:

    timeout (number): Socket timeout in milliseconds. This will set the timeout when the socket is created.
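
A minimal sketch of that recommendation using only the built-in http.Agent. The helper name createTimeoutAgent, the 30-second request timeout, and the 5-second margin are hypothetical placeholders; the only option taken from the Node.js docs quoted above is timeout (plus keepAlive).

```javascript
const http = require('http');

// Hypothetical helper: build a keep-alive agent whose socket timeout is a bit
// larger than the longest request timeout the caller expects to use.
function createTimeoutAgent(expectedRequestTimeoutMs, marginMs = 5000) {
  return new http.Agent({
    keepAlive: true,
    // Socket timeout in milliseconds, applied when the socket is created.
    timeout: expectedRequestTimeoutMs + marginMs,
  });
}

// Placeholder: assume requests are configured to time out after 30 s.
const timeoutAgent = createTimeoutAgent(30000);
```

The resulting agent can then be passed to the client through its agent option.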

TysonAndre commented 2 years ago

It looks like the issues I'm seeing may be unrelated to idle connections. The consul agent was running locally on the affected applications, so the cause is probably not networking, though timeouts remain remotely possible. At the time, nobody ran ss --tcp --numeric to see how many consul connections (or other connections) there were to localhost:8500.

Unrelated: The consul agent deliberately disables HTTP keep-alive for all outgoing requests, so the Go agent isn't affected by idle timeouts either.