Closed: andrsnn closed this issue 6 years ago
Closing, as it seems the race condition would exist even with more granular control. The only real solution appears to be ensuring the requesting client is aware of the timeout, so that idle connection teardown is initiated on the client side.
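A minimal sketch of that approach, assuming the load balancer's idle timeout is known (60 seconds here, purely as an example value): keep Node's keep-alive timeout above the load balancer's, so the load balancer (the client on this hop) always tears down idle connections first and never forwards a request on a connection Node has already decided to close.

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Assumed example value: the load balancer's idle timeout is 60 seconds.
// Keeping Node's keep-alive timeout above it means the load balancer closes
// idle connections first, avoiding the race described below.
server.keepAliveTimeout = 65 * 1000; // default is 5000 ms

// On Node versions that expose it, headersTimeout should exceed
// keepAliveTimeout so header parsing on a reused connection is not cut short.
server.headersTimeout = 66 * 1000;

server.listen(3000);
```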
I'm currently having an issue where a Node service under heavy load behind a load balancer intermittently sends a RST, ACK during a Connection: keep-alive request, resulting in a 502 on the client (the load balancer passes the 502 back).
Upon further inspection there is a 5-second window in which no requests are received from the load balancer. It appears the default 5-second keepAliveTimeout is hit, and Node then closes the connection and destroys the socket. In most cases (99% of requests that time out) Node tears the connection down with a proper TCP teardown handshake between the load balancer and Node. However, there appears to be an edge case where the load balancer sends a request immediately before the TCP teardown occurs, which results in a RST being sent back to the load balancer (and hence the 502). This appears to happen because Node issues a close syscall, destroying the socket, immediately before initiating the teardown handshake.
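For reference, a minimal sketch of how the teardown can also be observed from inside the process, to correlate the 5-second idle gap in the captures with application-level events (the logging here is illustrative only; port 3000 matches the setup described below):

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Log per-socket lifecycle events so the idle gap and the teardown seen in
// the packet captures can be correlated from inside the application.
server.on('connection', (socket) => {
  const peer = `${socket.remoteAddress}:${socket.remotePort}`;

  // 'end' fires when the peer (the load balancer) sends its FIN.
  socket.on('end', () => console.log(`${peer} sent FIN`));

  // 'close' fires once the socket is fully torn down, whether by the peer
  // or by Node's own keep-alive timer destroying an idle connection.
  socket.on('close', (hadError) => {
    console.log(`${peer} closed, hadError=${hadError}`);
  });
});

server.listen(3000);
```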
Normal teardown Wireshark capture:
RST Wireshark capture:
Sysdig capture taken at the same time as the RST capture. The application is dockerized and the container exposes port 3000, although the results are reproducible outside of Docker.
Ordered by sequence of events:
As you can see above, it seems the timeout fires, a request is received, Node closes the socket, and then it responds to the pending request with a RST.
It seems this behavior is probably expected, as this is simply a race condition. Are there any lifecycle hooks or lower-level controls exposed that would allow waiting for a FIN, ACK from the load balancer?