nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Prevent socket destruction when keepAliveTimeout fires #17749

Closed andrsnn closed 6 years ago

andrsnn commented 6 years ago

I'm currently seeing an issue where a Node service under heavy load behind a load balancer intermittently sends a RST, ACK during a `Connection: keep-alive` request, resulting in a 502 on the client (the load balancer passes back the 502).

Upon further inspection there is a 5-second window in which no requests are received from the load balancer. It appears that the default 5-second `keepAliveTimeout` is hit, and Node then closes the connection and destroys the socket. In most cases (99% of requests that time out) Node successfully tears down the connection with a proper TCP teardown handshake between the load balancer and Node. However, there appears to be an edge case where the load balancer sends a request just before the TCP teardown occurs, which results in a RST being sent back to the load balancer (and hence the 502). This appears to happen because Node issues a `close` syscall, destroying the socket, directly before initiating the teardown handshake.

Normal teardown wireshark capture:

236056  2017-12-07 19:51:31.961253  10.23.34.195    10.23.67.39 HTTP    1067    POST /api HTTP/1.1  (application/json)
236057  2017-12-07 19:51:31.961267  10.23.67.39 10.23.34.195    TCP 66  80→52024 [ACK] Seq=12801 Ack=33052 Win=95232 Len=0 TSval=242257 TSecr=1567689
236680  2017-12-07 19:51:32.061090  10.23.67.39 10.23.34.195    HTTP    466 HTTP/1.1 200 OK  (application/json)
236681  2017-12-07 19:51:32.061368  10.23.34.195    10.23.67.39 TCP 66  52024→80 [ACK] Seq=33052 Ack=13201 Win=62464 Len=0 TSval=1567714 TSecr=242282
248801  2017-12-07 19:51:37.067175  10.23.67.39 10.23.34.195    TCP 66  80→52024 [FIN, ACK] Seq=13201 Ack=33052 Win=95232 Len=0 TSval=243533 TSecr=1567714
248807  2017-12-07 19:51:37.067480  10.23.34.195    10.23.67.39 TCP 66  52024→80 [FIN, ACK] Seq=33052 Ack=13202 Win=62464 Len=0 TSval=1568966 TSecr=243533
248808  2017-12-07 19:51:37.067504  10.23.67.39 10.23.34.195    TCP 66  80→52024 [ACK] Seq=13202 Ack=33053 Win=95232 Len=0 TSval=243533 TSecr=1568966

RST Wireshark capture:

221188  2017-12-07 19:51:26.881806  10.23.34.195    10.23.67.39 HTTP    1068    POST /api HTTP/1.1  (application/json)
221189  2017-12-07 19:51:26.881838  10.23.67.39 10.23.34.195    TCP 66  80→52026 [ACK] Seq=7201 Ack=19033 Win=67072 Len=0 TSval=240987 TSecr=1566420
221814  2017-12-07 19:51:27.018326  10.23.67.39 10.23.34.195    HTTP    466 HTTP/1.1 200 OK  (application/json)
221815  2017-12-07 19:51:27.018601  10.23.34.195    10.23.67.39 TCP 66  52026→80 [ACK] Seq=19033 Ack=7601 Win=47360 Len=0 TSval=1566454 TSecr=241021
236528  2017-12-07 19:51:32.042525  10.23.34.195    10.23.67.39 HTTP    1066    POST /api HTTP/1.1  (application/json)
236529  2017-12-07 19:51:32.042538  10.23.67.39 10.23.34.195    TCP 66  80→52026 [ACK] Seq=7601 Ack=20033 Win=69120 Len=0 TSval=242277 TSecr=1567710
236697  2017-12-07 19:51:32.066487  10.23.67.39 10.23.34.195    TCP 66  80→52026 [RST, ACK] Seq=7601 Ack=20033 Win=69120 Len=0 TSval=242283 TSecr=1567710

Sysdig capture taken at the same time as the RST capture. The application is dockerized and the container exposes port 3000, although the results are reproducible outside of Docker.

832662 19:51:27.018305861 0 node (2989) > writev fd=120(<4t>10.23.34.195:52026->172.17.0.2:3000) size=400
832663 19:51:27.018341806 0 node (2989) < writev res=400 data=HTTP/1.1 200 OK..Content-Type: application/json; charset=
1004716 19:51:32.066469320 0 node (2989) > close fd=120(<4t>10.23.34.195:52026->172.17.0.2:3000)
1004717 19:51:32.066470002 0 node (2989) < close res=0

Ordered by sequence of events:

19:51:26.881806  10.23.34.195    10.23.67.39 HTTP    1068    POST /api HTTP/1.1  (application/json)
19:51:26.881838  10.23.67.39 10.23.34.195    TCP 66  80→52026 [ACK] Seq=7201 Ack=19033 Win=67072 Len=0 TSval=240987 TSecr=1566420
19:51:27.018305861 0 node (2989) > writev fd=120(<4t>10.23.34.195:52026->172.17.0.2:3000) size=400
19:51:27.018326  10.23.67.39 10.23.34.195    HTTP    466 HTTP/1.1 200 OK  (application/json)
19:51:27.018341806 0 node (2989) < writev res=400 data=HTTP/1.1 200 OK..Content-Type: application/json; charset=
19:51:27.018601  10.23.34.195    10.23.67.39 TCP 66  52026→80 [ACK] Seq=19033 Ack=7601 Win=47360 Len=0 TSval=1566454 TSecr=241021
19:51:32.042525  10.23.34.195    10.23.67.39 HTTP    1066    POST /api HTTP/1.1  (application/json)
19:51:32.042538  10.23.67.39 10.23.34.195    TCP 66  80→52026 [ACK] Seq=7601 Ack=20033 Win=69120 Len=0 TSval=242277 TSecr=1567710
19:51:32.066469320 0 node (2989) > close fd=120(<4t>10.23.34.195:52026->172.17.0.2:3000)
19:51:32.066470002 0 node (2989) < close res=0
19:51:32.066487  10.23.67.39 10.23.34.195    TCP 66  80→52026 [RST, ACK] Seq=7601 Ack=20033 Win=69120 Len=0 TSval=242283 TSecr=1567710

As you can see above, the timeout fires, a request is received, Node closes the socket, and the pending request is then answered with a RST.

It seems this behavior is probably expected, as it is simply a race condition. Are there any lifecycle hooks or lower-level controls exposed that would allow waiting for a FIN, ACK from the load balancer?

andrsnn commented 6 years ago

Closing, as it seems the race condition would exist even with more granular control. The only real solution seems to be to ensure the requesting client is aware of the timeout, so that idle-connection teardown starts on the client side.