Weave Net integration should remove dead peers

bboreham commented 7 years ago

E.g. when AWS takes away a VM because the auto-scale group is shrinking, or because it was a spot instance.

bboreham commented 6 years ago

If the time at which a peer was last seen were tracked as part of the peers list, dead peers could be identified by adding a heartbeat threshold (e.g. the peer has not reported for 30-60 seconds, and is also either known dead in the EC2 response or not present in the ASG response), and they could then be safely removed.

[the trick here is to reclaim the addresses on just one host]
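To make the proposed check concrete, here is a minimal sketch in Python. It is not how Weave Net works today: the `Peer` record, its `last_seen` field, and the `find_dead_peers` helper are all hypothetical, and the two instance-ID sets are assumed to have been fetched from the EC2 and ASG APIs beforehand. It only decides which peers look dead; it does not address reclaiming addresses on just one host.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the upper end of the 30-60 second range above.
HEARTBEAT_THRESHOLD_SECONDS = 60.0


@dataclass(frozen=True)
class Peer:
    """Hypothetical peers-list entry extended with a last-seen timestamp."""
    name: str
    instance_id: str
    last_seen: float  # Unix time of the most recent report from this peer


def find_dead_peers(peers, asg_instance_ids, terminated_instance_ids, now,
                    threshold=HEARTBEAT_THRESHOLD_SECONDS):
    """Return the sorted names of peers that satisfy BOTH conditions:

    1. silent for longer than `threshold` seconds, and
    2. reported terminated by EC2, or missing from the ASG membership.
    """
    dead = []
    for peer in peers:
        silent = (now - peer.last_seen) > threshold
        gone = (peer.instance_id in terminated_instance_ids
                or peer.instance_id not in asg_instance_ids)
        if silent and gone:
            dead.append(peer.name)
    return sorted(dead)


if __name__ == "__main__":
    now = 1000.0
    peers = [
        Peer("a", "i-1", last_seen=1000.0),  # still reporting
        Peer("b", "i-2", last_seen=900.0),   # silent, but still in the ASG
        Peer("c", "i-3", last_seen=900.0),   # silent and no longer in the ASG
        Peer("d", "i-4", last_seen=900.0),   # silent and terminated in EC2
    ]
    print(find_dead_peers(peers,
                          asg_instance_ids={"i-1", "i-2", "i-4"},
                          terminated_instance_ids={"i-4"},
                          now=now))
```

Requiring both conditions means a peer that is merely partitioned from the network, or an instance that has just left the ASG but is still talking, is not removed on one signal alone.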

In addition, I am not sure how this could work efficiently, but would it be possible to run a script that automatically removes the peer in question when the host is shut down in an orderly fashion? :thinking_face:

[the trick here is to distinguish "shut down forever" from "shut down temporarily"]

bboreham commented 5 years ago

We did this for the Kubernetes integration; it was rather a lot of work to coordinate across nodes. See https://github.com/weaveworks/weave/issues/2797 and subsequent fixes.

weaveworks / integrations

Weave Net integration should remove dead peers #124