Open nthallen opened 4 years ago
The problem may be associated with an unclean shutdown of some sort. bfr and Bootstrapsrvr are both listening and using their ports, but only bfr's socket gets stuck in TIME_WAIT. Need to take better care to ensure that both sides do an orderly shutdown of socket connections.
No, you always get a TIME_WAIT when ending a connection. If both processes are on the same node, then essentially both ports are blocked for the duration. If they are on different nodes, it appears that the process that closes first is the one that ends up in TIME_WAIT. So if we adjust our protocols so that the clients close first, we should avoid having servers blocked from listening on their established ports. Since the clients choose essentially random port numbers, they are less likely to be bothered by the TIME_WAIT problem.
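A minimal sketch of the "client closes first" idea, in Python for brevity rather than in the project's own language. The names `serve_once` and `run_demo` are hypothetical and are not part of bfr or Bootstrapsrvr. The server side does not close its connection until it reads end-of-file, so the client performs the active close and the TIME_WAIT state should land on the client's ephemeral port.

```python
import socket
import threading


def serve_once(listener, received):
    """Accept one connection and close it only after the client has closed."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                # recv() returning b"" means the client sent its FIN first.
                break
            received.append(data)
    # Leaving the with-block closes the server side (the passive close).


def run_demo():
    """Hypothetical demo: one client sends a message, then closes first."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks one
    port = listener.getsockname()[1]
    received = []
    worker = threading.Thread(target=serve_once, args=(listener, received))
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
    # Leaving the with-block closes the client socket: the active close.
    worker.join(timeout=5)
    listener.close()
    return b"".join(received)
```

In a real protocol the server would also need some way to ask the client to disconnect (for example a "quit" message), since it can no longer simply drop the connection itself.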
When TCP daemons terminate, their ports are not immediately available for a restart. They need to wait until TIME_WAIT elapses, which is fairly long. This could make rapid restarting difficult. At the very least, it requires some thought, since it currently interferes with startup.
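One common mitigation for the restart case is to set `SO_REUSEADDR` on the listening socket before calling `bind()`. On most POSIX systems this lets a restarted daemon bind its port even while connections from the previous instance are still in TIME_WAIT. The sketch below is in Python for brevity, and `make_listener` is a hypothetical helper. Whether the daemons here already set this option is not stated in this issue.

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket that tolerates lingering TIME_WAIT entries."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); it has no effect on an already-bound socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

The same option is available from C and C++ through `setsockopt()` with `SOL_SOCKET` and `SO_REUSEADDR`. It does not remove the TIME_WAIT state; it only allows a new listener to bind alongside it.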
Symptom is:
and