projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

calico vxlan address no report in the IPAMblock and iphandler #7192

Closed JornShen closed 1 year ago

JornShen commented 1 year ago

We run an IPAM check against our cluster to validate the IPAM state, and it reports the following:

Scanning for IPs that are allocated but not actually in use...
Found 0 IPs that are allocated in IPAM but not actually in use.
Scanning for IPs that are in use by a workload or node but not allocated in IPAM...
  10.67.24.70 in use by Node(10.0.158.1) and in active IPAM pool but has no IPAM allocation.
  10.66.195.67 in use by Node(10.0.20.1) and in active IPAM pool but has no IPAM allocation.
  10.65.2.70 in use by Node(10.3.87.253) and in active IPAM pool but has no IPAM allocation.
  10.67.140.5 in use by Node(10.2.47.255) and in active IPAM pool but has no IPAM allocation.
  10.64.161.197 in use by Node(10.1.119.255) and in active IPAM pool but has no IPAM allocation.
  10.64.148.130 in use by Node(10.2.211.254) and in active IPAM pool but has no IPAM allocation.
  10.65.90.4 in use by Node(10.1.154.0) and in active IPAM pool but has no IPAM allocation.
  10.66.6.66 in use by Node(10.0.52.1) and in active IPAM pool but has no IPAM allocation.
  10.66.75.130 in use by Node(10.5.184.0) and in active IPAM pool but has no IPAM allocation.
  10.64.78.128 in use by Node(10.0.234.1) and in active IPAM pool but has no IPAM allocation.
  10.67.167.66 in use by Node(10.1.156.0) and in active IPAM pool but has no IPAM allocation.
  10.65.75.0 in use by Node(10.0.208.1) and in active IPAM pool but has no IPAM allocation.
Found 0 in-use IPs that are not in active IP pools.
Found 12 in-use IPs that are in active IP pools but have no corresponding IPAM allocation.
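(For reference, the output above is what we get from Calico's IPAM check; a minimal sketch of how we run it, assuming `calicoctl` is configured against the cluster datastore. The report path is just an example.)

```bash
# Run Calico's IPAM consistency check against the cluster datastore.
calicoctl ipam check

# Optionally write a report file for later inspection
# (the path here is only an example).
calicoctl ipam check -o /tmp/ipam-report.json
```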

We found that some of these IPs are VXLAN tunnel IPs:

ip r | grep 10.65.90.4
10.65.90.0/26 via 10.65.90.4 dev vxlan.calico onlink
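That address matches the VXLAN tunnel IP of node 10.1.154.0. A hedged way to confirm this from the node resource (the spec field and annotation names below are the ones we believe Calico uses):

```bash
# The Calico Node resource records the node's VXLAN tunnel address in its spec.
calicoctl get node 10.1.154.0 -o yaml | grep -i vxlan

# calico-node also mirrors it into a Kubernetes node annotation.
kubectl get node 10.1.154.0 \
  -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4VXLANTunnelAddr}'
```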

Looking at the calico-kube-controllers log, we found that calico-kube-controllers garbage-collected these IPs:

2023-01-08T18:20:27.736799178+08:00 stderr F 2023-01-08 10:20:27.736 [INFO][1] ipam.go 726: Can't cleanup node yet - IPs still in use on this node handle="k8s-pod-network.0a1298cf53154c6912540e09ed7f94eddb9113a8a5dc1450783a8b1e21599523" ip="10.65.90.1" node="10.1.154.0" pod="logstack/filebeat-filebeat-h79qh"
2023-01-08T18:21:03.737947998+08:00 stderr F 2023-01-08 10:21:03.737 [INFO][1] ipam.go 983: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(10.1.154.0) with error: nodes "10.1.154.0" not found
2023-01-08T18:21:03.737983198+08:00 stderr F 2023-01-08 10:21:03.737 [INFO][1] ipam.go 726: Can't cleanup node yet - IPs still in use on this node handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.0" node="10.1.154.0"
2023-01-08T18:23:00.338245266+08:00 stderr F 2023-01-08 10:23:00.338 [INFO][1] ipam.go 983: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(10.1.154.0) with error: nodes "10.1.154.0" not found
2023-01-08T18:23:00.338299551+08:00 stderr F 2023-01-08 10:23:00.338 [INFO][1] ipam.go 726: Can't cleanup node yet - IPs still in use on this node handle="k8s-pod-network.0a1298cf53154c6912540e09ed7f94eddb9113a8a5dc1450783a8b1e21599523" ip="10.65.90.1" node="10.1.154.0" pod="logstack/filebeat-filebeat-h79qh"
2023-01-08T18:32:22.338152238+08:00 stderr F 2023-01-08 10:32:22.337 [WARNING][1] ipam_allocation.go 137: Confirmed IP leak handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:33:18.33768338+08:00 stderr F 2023-01-08 10:33:18.337 [INFO][1] ipam.go 907: Leaked IP has been resurrected after querying latest state handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:33:18.814514449+08:00 stderr F 2023-01-08 10:33:18.814 [INFO][1] ipam.go 882: Cleaning up IPAM resources for deleted node node="10.1.154.0"
2023-01-08T18:33:18.955316049+08:00 stderr F 2023-01-08 10:33:18.955 [INFO][1] ipam_block_reader_writer.go 300: Block must be empty but is not empty, refusing to remove affinity. host="10.1.154.0" subnet="10.65.90.0/26"
2023-01-08T18:33:19.090893378+08:00 stderr F 2023-01-08 10:33:19.090 [ERROR][1] ipam.go 947: Failed to release block affinities for node calicoNode="10.1.154.0" error=block '10.65.90.0/26' is not empty
2023-01-08T18:33:19.090924+08:00 stderr F 2023-01-08 10:33:19.090 [WARNING][1] ipam.go 885: Error cleaning up node error=block '10.65.90.0/26' is not empty node="10.1.154.0"
2023-01-08T18:34:14.755304859+08:00 stderr F 2023-01-08 10:34:14.755 [WARNING][1] ipam_allocation.go 137: Confirmed IP leak handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:35:19.358390912+08:00 stderr F 2023-01-08 10:35:19.358 [INFO][1] ipam.go 907: Leaked IP has been resurrected after querying latest state handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:35:23.566739752+08:00 stderr F 2023-01-08 10:35:23.566 [INFO][1] ipam.go 882: Cleaning up IPAM resources for deleted node node="10.1.154.0"
2023-01-08T18:35:23.706165366+08:00 stderr F 2023-01-08 10:35:23.706 [INFO][1] ipam_block_reader_writer.go 300: Block must be empty but is not empty, refusing to remove affinity. host="10.1.154.0" subnet="10.65.90.0/26"
2023-01-08T18:35:23.845522966+08:00 stderr F 2023-01-08 10:35:23.845 [ERROR][1] ipam.go 947: Failed to release block affinities for node calicoNode="10.1.154.0" error=block '10.65.90.0/26' is not empty
2023-01-08T18:35:23.845552773+08:00 stderr F 2023-01-08 10:35:23.845 [WARNING][1] ipam.go 885: Error cleaning up node error=block '10.65.90.0/26' is not empty node="10.1.154.0"
2023-01-08T18:36:50.401508908+08:00 stderr F 2023-01-08 10:36:50.401 [WARNING][1] ipam_allocation.go 137: Confirmed IP leak handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:37:26.004393008+08:00 stderr F 2023-01-08 10:37:26.004 [INFO][1] ipam.go 907: Leaked IP has been resurrected after querying latest state handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:37:27.424442643+08:00 stderr F 2023-01-08 10:37:27.424 [INFO][1] ipam.go 882: Cleaning up IPAM resources for deleted node node="10.1.154.0"
2023-01-08T18:37:27.688623399+08:00 stderr F 2023-01-08 10:37:27.688 [INFO][1] ipam_block_reader_writer.go 300: Block must be empty but is not empty, refusing to remove affinity. host="10.1.154.0" subnet="10.65.90.0/26"
2023-01-08T18:37:28.134007174+08:00 stderr F 2023-01-08 10:37:28.133 [ERROR][1] ipam.go 947: Failed to release block affinities for node calicoNode="10.1.154.0" error=block '10.65.90.0/26' is not empty
2023-01-08T18:37:28.134036868+08:00 stderr F 2023-01-08 10:37:28.133 [WARNING][1] ipam.go 885: Error cleaning up node error=block '10.65.90.0/26' is not empty node="10.1.154.0"
2023-01-08T18:38:13.510617799+08:00 stderr F 2023-01-08 10:38:13.510 [INFO][1] ipam.go 983: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(10.1.154.0) with error: nodes "10.1.154.0" not found
2023-01-08T18:38:13.707898733+08:00 stderr F 2023-01-08 10:38:13.707 [WARNING][1] ipam_allocation.go 137: Confirmed IP leak handle="k8s-pod-network.4199ba87362fa09e39ab6ccc20503d68c432a4a31391f868adcee1b34e5cbed0" ip="10.65.90.3" node="10.1.154.0" pod="logstack/filebeat-filebeat-9t9ww"
2023-01-08T18:38:13.707933402+08:00 stderr F 2023-01-08 10:38:13.707 [WARNING][1] ipam_allocation.go 137: Confirmed IP leak handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:39:35.110817801+08:00 stderr F 2023-01-08 10:39:35.110 [INFO][1] ipam.go 919: Garbage collecting leaked IP address, and any IPs with its handle handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"
2023-01-08T18:39:35.307488358+08:00 stderr F 2023-01-08 10:39:35.307 [INFO][1] ipam.go 919: Garbage collecting leaked IP address, and any IPs with its handle handle="k8s-pod-network.4199ba87362fa09e39ab6ccc20503d68c432a4a31391f868adcee1b34e5cbed0" ip="10.65.90.3" node="10.1.154.0" pod="logstack/filebeat-filebeat-9t9ww"
2023-01-08T18:39:37.391299344+08:00 stderr F 2023-01-08 10:39:37.391 [INFO][1] ipam.go 882: Cleaning up IPAM resources for deleted node node="10.1.154.0"
2023-01-08T18:39:37.528017712+08:00 stderr F 2023-01-08 10:39:37.527 [INFO][1] ipam_block_reader_writer.go 300: Block must be empty but is not empty, refusing to remove affinity. host="10.1.154.0" subnet="10.65.90.0/26"
2023-01-08T18:39:37.658933132+08:00 stderr F 2023-01-08 10:39:37.658 [ERROR][1] ipam.go 947: Failed to release block affinities for node calicoNode="10.1.154.0" error=block '10.65.90.0/26' is not empty
2023-01-08T18:39:37.658957205+08:00 stderr F 2023-01-08 10:39:37.658 [WARNING][1] ipam.go 885: Error cleaning up node error=block '10.65.90.0/26' is not empty node="10.1.154.0"
2023-01-09T03:19:15.310613154+08:00 stderr F 2023-01-08 19:19:15.310 [INFO][1] ipam.go 919: Garbage collecting leaked IP address, and any IPs with its handle handle="k8s-pod-network.7f7e94c885ae6b949e01166e12a2fdf345edea6ebd2bf13cd899d5aec5a12011" ip="10.65.60.131" node="10.2.87.254" pod="logstack/filebeat-filebeat-p4dk2"

At the end, we can see calico-kube-controllers garbage-collect the IP:

2023-01-08T18:39:35.110817801+08:00 stderr F 2023-01-08 10:39:35.110 [INFO][1] ipam.go 919: Garbage collecting leaked IP address, and any IPs with its handle handle="vxlan-tunnel-addr-10.1.154.0" ip="10.65.90.2" node="10.1.154.0"

Our cluster periodically removes nodes and later re-adds them under the same name, e.g. node 10.1.154.0.
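Roughly, the churn looks like this (a sketch of our workflow, not an exact script; timing varies):

```bash
# A node is drained and removed from the cluster...
kubectl drain 10.1.154.0 --ignore-daemonsets --delete-emptydir-data
kubectl delete node 10.1.154.0

# ...and some time later the same machine rejoins under the same name;
# calico-node then starts again and allocates a fresh VXLAN tunnel IP.
```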

Here is the calico-node log from node 10.1.154.0; the most recent time it started was at about 2023-01-08 10:39:07:

2023-01-08T18:39:07.767593673+08:00 stdout F 2023-01-08 10:39:07.767 [INFO][9] startup/utils.go 190: Setting NetworkUnavailable to false
2023-01-08T18:39:07.875384684+08:00 stdout F 2023-01-08 10:39:07.875 [INFO][18] tunnel-ip-allocator/allocateip.go 281: Assign a new tunnel address type="vxlanTunnelAddress"
2023-01-08T18:39:07.875417887+08:00 stdout F 2023-01-08 10:39:07.875 [INFO][18] tunnel-ip-allocator/allocateip.go 357: Release any old tunnel addresses IP="" type="vxlanTunnelAddress"
2023-01-08T18:39:07.914326459+08:00 stdout F 2023-01-08 10:39:07.914 [INFO][18] tunnel-ip-allocator/allocateip.go 368: Assign new tunnel address IP="" type="vxlanTunnelAddress"
2023-01-08T18:39:07.914466714+08:00 stdout F 2023-01-08 10:39:07.914 [INFO][18] tunnel-ip-allocator/ipam.go 97: Auto-assign 1 ipv4, 0 ipv6 addrs for host '10.1.154.0'
2023-01-08T18:39:07.914523349+08:00 stdout F 2023-01-08 10:39:07.914 [INFO][18] tunnel-ip-allocator/ipam.go 596: Looking up existing affinities for host handle="vxlan-tunnel-addr-10.1.154.0" host="10.1.154.0"
2023-01-08T18:39:07.920513774+08:00 stdout F 2023-01-08 10:39:07.920 [INFO][18] tunnel-ip-allocator/ipam.go 358: Looking up existing affinities for host host="10.1.154.0"
2023-01-08T18:39:08.071825262+08:00 stdout F 2023-01-08 10:39:08.071 [INFO][18] tunnel-ip-allocator/ipam.go 441: Trying affinity for 10.65.90.0/26 host="10.1.154.0"
2023-01-08T18:39:08.075287269+08:00 stdout F 2023-01-08 10:39:08.075 [INFO][18] tunnel-ip-allocator/ipam.go 145: Attempting to load block cidr=10.65.90.0/26 host="10.1.154.0"
2023-01-08T18:39:08.078792865+08:00 stdout F 2023-01-08 10:39:08.078 [INFO][18] tunnel-ip-allocator/ipam.go 222: Affinity is confirmed and block has been loaded cidr=10.65.90.0/26 host="10.1.154.0"
2023-01-08T18:39:08.078814138+08:00 stdout F 2023-01-08 10:39:08.078 [INFO][18] tunnel-ip-allocator/ipam.go 1087: Attempting to assign 1 addresses from block block=10.65.90.0/26 handle="vxlan-tunnel-addr-10.1.154.0" host="10.1.154.0"
2023-01-08T18:39:08.081677569+08:00 stdout F 2023-01-08 10:39:08.081 [INFO][18] tunnel-ip-allocator/ipam.go 1564: Creating new handle: vxlan-tunnel-addr-10.1.154.0
2023-01-08T18:39:08.086568505+08:00 stdout F 2023-01-08 10:39:08.086 [INFO][18] tunnel-ip-allocator/ipam.go 1110: Writing block in order to claim IPs block=10.65.90.0/26 handle="vxlan-tunnel-addr-10.1.154.0" host="10.1.154.0"
2023-01-08T18:39:08.092263767+08:00 stdout F 2023-01-08 10:39:08.092 [INFO][18] tunnel-ip-allocator/ipam.go 1123: Successfully claimed IPs: [10.65.90.4/26] block=10.65.90.0/26 handle="vxlan-tunnel-addr-10.1.154.0" host="10.1.154.0"
2023-01-08T18:39:08.092277312+08:00 stdout F 2023-01-08 10:39:08.092 [INFO][18] tunnel-ip-allocator/ipam.go 777: Auto-assigned 1 out of 1 IPv4s: [10.65.90.4/26] handle="vxlan-tunnel-addr-10.1.154.0" host="10.1.154.0"
2023-01-08T18:39:08.106923972+08:00 stdout F 2023-01-08 10:39:08.106 [INFO][18] tunnel-ip-allocator/allocateip.go 455: Assigned tunnel address to node IP="10.65.90.4" type="vxlanTunnelAddress"
2023-01-08T18:39:08.116932772+08:00 stdout F CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
2023-01-08T18:39:08.128877097+08:00 stdout F Calico node started successfully
2023-01-08T18:39:09.368362017+08:00 stdout F 2023-01-08 10:39:09.367 [INFO][48] felix/logutils.go 82: Early screen log level set to info
2023-01-08T18:39:09.381234029+08:00 stdout F 2023-01-08 10:39:09.367 [INFO][48] felix/daemon.go 144: Felix starting up GOMAXPROCS=4 builddate="2022-02-03T20:04:10+0000" gitcommit="449e8195a47d032af49946708b1a240108ca84df" version="v3.20.4"
2023-01-08T18:39:09.381252494+08:00 stdout F 2023-01-08 10:39:09.368 [INFO][48] felix/daemon.go 163: Kubernetes server override env vars. KUBERNETES_SERVICE_HOST="10.68.0.1" KUBERNETES_SERVICE_PORT="443"
2023-01-08T18:39:09.381284376+08:00 stdout F 2023-01-08 10:39:09.369 [INFO][48] felix/daemon.go 168: Loading configuration...
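Putting the two logs side by side: the node re-created handle vxlan-tunnel-addr-10.1.154.0 and claimed 10.65.90.4 at 10:39:08, and calico-kube-controllers garbage-collected that same handle ("and any IPs with its handle") at 10:39:35, which would explain why the new tunnel IP ends up in use with no IPAM allocation. A quick way to line the events up (pod name and namespace are placeholders for our install):

```bash
HANDLE=vxlan-tunnel-addr-10.1.154.0

# Garbage-collection events in the IPAM controller.
kubectl logs -n kube-system deploy/calico-kube-controllers | grep "$HANDLE"

# Tunnel-address (re)allocation events on the re-added node
# (<calico-node-pod> is whichever calico-node pod runs on 10.1.154.0).
kubectl logs -n kube-system <calico-node-pod> -c calico-node | grep "$HANDLE"
```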

I also noticed something strange at the beginning of the log shown above: we can see calico-kube-controllers confirm the IP leak but then refuse to collect it, and it is hard to deduce from the code how that can happen.

/cc @caseydavenport

caseydavenport commented 1 year ago

@JornShen what version of Calico are you running on this cluster?

JornShen commented 1 year ago

@JornShen what version of Calico are you running on this cluster?

@caseydavenport we are using version v3.20.4.
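(For reference, we confirm the running version with something like the following; the namespace and daemonset name are placeholders for our install.)

```bash
# Client and cluster Calico versions.
calicoctl version

# Or inspect the calico-node image tag directly.
kubectl -n kube-system get daemonset calico-node \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```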

caseydavenport commented 1 year ago

We made quite a few IPAM garbage collector improvements in Calico v3.23, including this one, which might address your problem: https://github.com/projectcalico/calico/pull/5613

tomastigera commented 1 year ago

@JornShen has upgrading to a newer Calico fixed your problem? Closing for now; we can reopen if the issue persists.