Closed Levi080513 closed 4 months ago
2024-04-08 16:48:04.700 [ERROR][75] monitor-addresses/autodetection_methods.go 185: Unable to parse CIDR 10.255.2.214 : invalid CIDR address: 10.255.2.214
This seems to be a separate issue, where we're failing to parse the IP from the node since it's not in CIDR notation.
That said, I agree with your analysis of the issue here - it appears that, specifically for the k8s internal IP method of autodetection, we continue to use the stale node queried when calico/node first started rather than re-querying the node on each loop.
We probably want to move the API call to query the Node inside of the loop so that we're working with updated information on each iteration.
> This seems to be a separate issue, where we're failing to parse the IP from the node since it's not in CIDR notation.

https://github.com/projectcalico/calico/blob/5741d7df6dfe2453c41be46f4d990dd7b56b1d4c/node/pkg/lifecycle/startup/autodetection/autodetection_methods.go#L199-L238
Calico will match the network interface on the node by IP and return the CIDR of the network interface. If the match fails, the IP is returned directly. So I understand this should be the same issue.
@caseydavenport If so, can I try to fix this?
Right, yeah I guess because it's using the old IP it's failing to find a match. I would be happy to review a PR to fix this :+1:
Thanks for the good investigation!
/close
https://github.com/projectcalico/calico/pull/8728 was merged.
In some scenarios, the IP of the machine where the k8s node is located may change. Kubelet detects the change and updates the node CR, but calico does not seem to handle this scenario properly. The pods on this node are not accessible via pod IP and never recover.
To add, the network mode of calico is IPIP.
Expected Behavior
The pods on this node should be accessible via pod IP.
Current Behavior
The pods on this node are not accessible via pod ip.
Possible Solution
Restart the calico-node pod on this node.
Steps to Reproduce (for bugs)
Context
While analyzing this problem, I found that we support automatic updates of the BGP IP, but the problem seems to be here: https://github.com/projectcalico/calico/blob/5741d7df6dfe2453c41be46f4d990dd7b56b1d4c/node/pkg/lifecycle/startup/startup.go#L315-L366
We only fetch the k8s node once during startup and never fetch the latest k8s node information after that. Therefore, when the k8s node IP is updated, we keep using the old IP to match against the IPs on the network interfaces. The match never succeeds, and the monitor-addresses log verifies this. The old IP of the machine is 10.255.2.214 and the new IP is 10.255.2.215. The monitor-addresses log reports `Unable to parse CIDR 10.255.2.214`, which is the old IP.
Your Environment