Open ephur opened 5 years ago
@ephur Have the nodes that are showing as unreachable, and still holding the IPs, been removed from the Kubernetes cluster? In the logs you shared I cannot find any activity related to those nodes.
When I first looked yesterday I thought two of them had not been removed, but I must have overlooked something. Yes, all of the nodes reported as unreachable have been removed from the cluster and were replaced with other nodes during an upgrade.
What you expected to happen?
IPAM assignments from hosts to be removed correctly, and IPAM to report properly for all known good peers.
What happened?
When implementing more monitoring for our Weave deployment, we discovered that weave_ipam_unreachable_count was non-zero for all of our Weave pods. It appears
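For anyone wanting to check the same metric by hand, here is a minimal sketch of pulling `weave_ipam_unreachable_count` out of Prometheus-style metrics text. The function name and the sample text are made up for illustration, and the sketch assumes the metric is exposed without labels. It is not taken from our monitoring setup.

```python
from typing import Optional


def unreachable_count(metrics_text: str) -> Optional[float]:
    """Return the value of weave_ipam_unreachable_count, or None if absent.

    Hypothetical helper: assumes plain Prometheus text exposition format
    and that the metric line carries no labels containing spaces.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(" ")
        # Drop a label set such as {foo="bar"} from the metric name, if any.
        name = name.split("{", 1)[0]
        if name == "weave_ipam_unreachable_count" and rest.strip():
            # The first field after the name is the sample value.
            return float(rest.split()[0])
    return None


# Illustrative sample only, not real output from our cluster.
SAMPLE = """\
# TYPE weave_ipam_unreachable_count gauge
weave_ipam_unreachable_count 3
"""

count = unreachable_count(SAMPLE)
if count:
    print("unreachable peers reported:", count)
```

A non-zero result on every pod is the symptom described above. The text to parse would come from each Weave pod's metrics endpoint.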
How to reproduce it?
We have not been able to reproduce yet in a cluster that does not experience this issue.
Anything else we need to know?
Our deployments at this time are on AWS. Our k8s clusters run 1.11.2 and are built by our own automation tooling. Weave is deployed as a Helm chart based on the upstream YAML provided by Weaveworks.
Versions:
Logs:
The logs are large and can be found at: https://gist.github.com/ephur/86d63c041ba5977eed259d5a87f34c0a
Network: