weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0
6.62k stars 670 forks

Feature: Auto-forget for "retrying" connections after specific elapsed time. #3418

Open bklau opened 6 years ago

bklau commented 6 years ago

PROBLEM:

If I run `weave connect <some node ip>` and that node later crashes, `weave status connections` shows that connection as perpetually retrying, even when the node is gone for good.
Accumulating these zombie retries will consume CPU and network resources.

Sample output from weave 2.4.0:

    [ec2-user@ip-10-0-43-121 ~]$ weave.sh status connections
    <- 10.0.15.153:34945 established fastdp e6:91:92:86:a6:b6(ip-10-0-15-153) mtu=8916
    -> 10.0.42.69:6783 established fastdp 0e:ec:b1:fd:8d:44(ip-10-0-42-69) mtu=8916
    -> 10.0.19.228:6783 retrying dial tcp4 :0->10.0.19.228:6783: connect: connection timed out

PROPOSAL:

Allow `weave connect` and `weave launch` to take an additional connection "timeout" flag, in seconds, like so:

`weave connect <some node ip>`  ==> retry forever
`weave connect -t -1 <some node ip>` ==> retry forever
`weave connect -t 300 <some node ip>` ==> retry the connection for up to 300 seconds (5 minutes), then remove it (equivalent to `weave forget <some node ip>`)
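
The proposed semantics can be sketched as a small decision function. This is only an illustration of the rule described above, not weave code: the `-t` flag does not exist in weave, and `RETRY_FOREVER` and `should_forget` are hypothetical names. It also assumes the connection is dropped once the elapsed retry time reaches the timeout, which the proposal does not spell out.

```python
RETRY_FOREVER = -1  # hypothetical sentinel mirroring the proposed `-t -1`


def should_forget(first_failure: float, now: float,
                  timeout: float = RETRY_FOREVER) -> bool:
    """Decide whether a retrying connection should be auto-forgotten.

    first_failure and now are timestamps in seconds. A negative timeout
    (the default) means "retry forever", matching a plain `weave connect`.
    """
    if timeout < 0:
        return False
    # Assumption: forget as soon as the elapsed time reaches the timeout.
    return now - first_failure >= timeout
```

With `timeout=300`, a peer that started failing at `t=0` is kept at `t=299` and forgotten from `t=300` onwards; with the default it is never forgotten.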

brb commented 6 years ago

Just noting that you can remove such a peer manually with `weave forget`.

bklau commented 6 years ago

@brb We need an auto-forget feature because our weave nodes come and go. It's impractical to run a manual forget for a large number of weave nodes.
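
Until something like this exists in weave itself, an external polling loop might approximate it. The sketch below is an assumption-laden workaround, not a weave feature: it assumes `weave status connections` prints lines shaped like the sample output above, that the `weave` script is on `PATH` (the sample uses `weave.sh`), and that `weave forget <ip>` is enough to stop the retries. `retrying_peers`, `sweep`, and `main` are hypothetical names.

```python
import subprocess
import time
from typing import Dict, List


def retrying_peers(status_output: str) -> List[str]:
    """Return the IPs of outbound connections reported as `retrying`."""
    peers = []
    for line in status_output.splitlines():
        fields = line.split()
        # Assumed line shape: "-> <ip>:<port> retrying <reason...>"
        if len(fields) >= 3 and fields[0] == "->" and fields[2] == "retrying":
            peers.append(fields[1].rsplit(":", 1)[0])
    return peers


def sweep(first_seen: Dict[str, float], timeout: float, now: float,
          status_output: str) -> List[str]:
    """Update first_seen in place; return peers retrying for >= timeout."""
    current = set(retrying_peers(status_output))
    for peer in list(first_seen):
        if peer not in current:
            del first_seen[peer]  # no longer retrying, reset its clock
    expired = []
    for peer in current:
        started = first_seen.setdefault(peer, now)
        if now - started >= timeout:
            expired.append(peer)
    return sorted(expired)


def main(timeout: float = 300, interval: float = 30) -> None:
    """Poll weave and forget peers stuck in `retrying` for too long."""
    first_seen: Dict[str, float] = {}
    while True:
        out = subprocess.run(["weave", "status", "connections"],
                             capture_output=True, text=True,
                             check=True).stdout
        for peer in sweep(first_seen, timeout, time.monotonic(), out):
            subprocess.run(["weave", "forget", peer], check=True)
            first_seen.pop(peer, None)
        time.sleep(interval)
```

Calling `main(timeout=300)` on each node would forget any peer that has shown as retrying for five minutes across consecutive polls. The timing is only as precise as the polling interval, and a peer that recovers between polls has its clock reset.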

79283 commented 4 years ago
