Description
The AWS Network Load Balancer (NLB) supports “Connection termination on deregistration”. We explored this option because we thought it might sever the connections between the Teleport Node and the Teleport Proxy when the proxy is deregistered. If the connections are closed properly, Teleport Nodes immediately begin their discovery process.
In our testing, “Connection termination on deregistration” does not appear to close the connections properly. As a result, Teleport Nodes keep trying to send heartbeats over these connections until the keepalive interval and keepalive count thresholds are crossed.
Because “Connection termination on deregistration” did not close the connections, we tried reducing the keepalive interval to 5 seconds. This seems to shorten the time Teleport takes to detect a bad connection. However, it may not be a valid solution: more frequent heartbeats increase network traffic and raise the probability of healthy connections being dropped by mistake.
What happened:
When a Teleport Proxy is removed from a Load Balancer, its connections are not terminated and linger. The Teleport Node does not detect this quickly, so it does not attempt to reconnect to a proxy until the keepalives expire.
What you expected to happen:
Teleport Node connections to the Proxy terminate gracefully, and the node attempts to reconnect to the proxy with no delay.
How to reproduce it (as minimally and precisely as possible):
Rotate a node in and out of an AWS NLB and watch the connections using netstat.
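If it helps when reproducing, the lingering connections can be counted from captured `netstat -tn` output with a small helper like the one below. This is only a sketch: the function name `countEstablished`, the sample addresses, and port 3024 are assumptions for illustration, and the parser expects the Linux column layout (protocol, queues, local address, foreign address, state).

```go
package main

import (
	"fmt"
	"strings"
)

// countEstablished counts TCP connections in netstat-style output whose
// foreign address equals remote and whose state is ESTABLISHED.
// It assumes the Linux `netstat -tn` column layout.
func countEstablished(netstatOutput, remote string) int {
	count := 0
	for _, line := range strings.Split(netstatOutput, "\n") {
		fields := strings.Fields(line)
		// Skip headers and anything that is not a TCP connection row.
		if len(fields) < 6 || !strings.HasPrefix(fields[0], "tcp") {
			continue
		}
		if fields[4] == remote && fields[5] == "ESTABLISHED" {
			count++
		}
	}
	return count
}

func main() {
	// Hypothetical sample; in practice, paste the output of `netstat -tn`
	// taken on the node before and after the proxy is deregistered.
	sample := `Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.5:41234 10.0.1.7:3024 ESTABLISHED
tcp 0 0 10.0.0.5:41236 10.0.1.7:3024 TIME_WAIT
tcp 0 0 10.0.0.5:41238 10.0.1.8:3024 ESTABLISHED`
	fmt.Println(countEstablished(sample, "10.0.1.7:3024"))
}
```

If the count for the deregistered proxy's address stays above zero well after deregistration, the connections are lingering as described above.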