projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

calico assigns pod same ip as tunl interface #3589

Closed juliantaylor closed 2 years ago

juliantaylor commented 4 years ago

Current Behavior

On nodes with many pods, Calico assigns a pod the same IP as the tunl interface, resulting in a pod that does not work because its probes end up being refused at the interface.

ip a
13: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 100.66.141.153/32 brd 100.66.141.153 scope global tunl0
       valid_lft forever preferred_lft forever

pod:
namespace    podname 6998749cf7-lj6jt       0/1     CrashLoopBackOff   287        17h     100.66.141.153 hostname   <none>           <none>

ipamblocks of the host:
100-66-141-128-26   21d
100-66-141-192-26   21d

The nodes are reinstalled and deleted from the apiserver regularly.

The nodes on which this happens have > 90 pods on them, so the default ipamblock size of /26 (64 addresses) is not enough and two blocks are allocated for the node. We are not 100% sure that this is related; it just happened on those types of nodes several times.

The Calico version is 3.14.0 and has been upgraded over time since roughly Calico 2.6, so maybe this is related to something missing in the upgrade fix in https://github.com/projectcalico/node/pull/430? The issue also occurred in 3.13.3.

Steps to Reproduce (for bugs)

probably have a node with more pods on it than fit into the default /26 blocksize

Your Environment

calico

caseydavenport commented 4 years ago

@juliantaylor can you please provide a little bit more information as to how you are installing Calico?

What manifest are you using? How are you installing k8s itself?

juliantaylor commented 4 years ago

We are using the charts from this repository in ./_includes/charts/calico/templates and are using the Kubernetes datastore.

The Kubernetes cluster is version 1.17.3 on bare metal, self-managed.

typha is enabled

the ippool spec is:

  spec:
    cidr: 100.64.0.0/13
    ipip:
      enabled: true
      mode: always
    nat-outgoing: true

The cluster is ipv4 only, ipv6 is disabled in calico (FELIX_IPV6SUPPORT=false)

We have in the meantime increased the blocksize to /24 to see if the problem disappears when each host only gets one block (of course with a larger block it may also just become less likely).

caseydavenport commented 4 years ago

Tangential, but please don't use those charts directly. They aren't intended to be used as helm charts directly by users, and are just a part of our documentation templating engine. We don't design, test, or support using those charts directly via helm, so it's "at your own risk".

Hard to see how that would result in the behavior you're seeing, but it could theoretically be due to a misconfiguration in the chart. So, you should install using one of the manifests hosted at docs.projectcalico.org.

As for actually diagnosing this issue, I think I will need:

juliantaylor commented 4 years ago

I forgot to mention we are also using istio and istio-cni (though the affected pod was not istio-enabled).

The tunl IP has no handle object on some machines in an affected cluster:

# tunnel handles
kubectl get ipamhandles.crd.projectcalico.org -o yaml | grep name: | grep ipip-tunnel -c
7
# number of machines
$ kubectl get nodes --no-headers | wc -l
10

calico-node logs on a node with an affected pod; the logs do say a handle was created, but the handle does not exist:

2020-06-02 12:26:32.675 [INFO][10] startup/startup.go 299: Early log level set to info
2020-06-02 12:26:32.675 [INFO][10] startup/startup.go 315: Using NODENAME environment for node name
2020-06-02 12:26:32.675 [INFO][10] startup/startup.go 327: Determined node name: kworker-fe-intg-iz2-bap002
2020-06-02 12:26:32.677 [INFO][10] startup/startup.go 359: Checking datastore connection
2020-06-02 12:27:02.677 [INFO][10] startup/startup.go 374: Hit error connecting to datastore - retry error=Get https://100.72.0.1:443/api/v1/nodes/foo: dial tcp 100.72.0.1:443: i/o timeout
2020-06-02 12:27:03.686 [INFO][10] startup/startup.go 383: Datastore connection verified
2020-06-02 12:27:03.687 [INFO][10] startup/startup.go 104: Datastore is ready
2020-06-02 12:27:03.702 [INFO][10] startup/startup.go 425: Initialize BGP data
2020-06-02 12:27:03.702 [INFO][10] startup/startup.go 664: Using autodetected IPv4 address on interface eno1: 10.238.193.3/24
2020-06-02 12:27:03.702 [INFO][10] startup/startup.go 495: Node IPv4 changed, will check for conflicts
2020-06-02 12:27:03.710 [INFO][10] startup/startup.go 727: No AS number configured on node resource, using global value
2020-06-02 12:27:03.710 [INFO][10] startup/startup.go 201: Setting NetworkUnavailable to False
2020-06-02 12:27:03.724 [INFO][10] startup/startup.go 610: FELIX_IPV6SUPPORT is false through environment variable
2020-06-02 12:27:03.731 [INFO][10] startup/startup.go 234: Using node name: kworker-fe-intg-iz2-bap002
2020-06-02 12:27:03.773 [INFO][134] tunnel-ip-allocator/allocateip.go 109: Assign a new tunnel address type="ipipTunnelAddress"
2020-06-02 12:27:03.773 [INFO][134] tunnel-ip-allocator/allocateip.go 185: Release any old tunnel addresses IP="" type="ipipTunnelAddress"
2020-06-02 12:27:03.774 [INFO][134] tunnel-ip-allocator/ipam.go 1172: Releasing all IPs with handle 'ipip-tunnel-addr-kworker-fe-intg-iz2-bap002'
2020-06-02 12:27:03.802 [INFO][134] tunnel-ip-allocator/allocateip.go 196: Assign new tunnel address IP="" type="ipipTunnelAddress"
2020-06-02 12:27:03.802 [INFO][134] tunnel-ip-allocator/ipam.go 87: Auto-assign 1 ipv4, 0 ipv6 addrs for host 'kworker-fe-intg-iz2-bap002'
2020-06-02 12:27:03.806 [INFO][134] tunnel-ip-allocator/ipam.go 313: Looking up existing affinities for host handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.810 [INFO][134] tunnel-ip-allocator/ipam.go 381: Trying affinity for 100.65.42.0/26 handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.813 [INFO][134] tunnel-ip-allocator/ipam.go 135: Attempting to load block cidr=100.65.42.0/26 host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.816 [INFO][134] tunnel-ip-allocator/ipam.go 212: Affinity is confirmed and block has been loaded cidr=100.65.42.0/26 host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.816 [INFO][134] tunnel-ip-allocator/ipam.go 810: Attempting to assign 1 addresses from block block=100.65.42.0/26 handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.818 [INFO][134] tunnel-ip-allocator/ipam.go 1271: Creating new handle: ipip-tunnel-addr-kworker-fe-intg-iz2-bap002
2020-06-02 12:27:03.821 [INFO][134] tunnel-ip-allocator/ipam.go 833: Writing block in order to claim IPs block=100.65.42.0/26 handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.825 [INFO][134] tunnel-ip-allocator/ipam.go 846: Successfully claimed IPs: [100.65.42.42/26] block=100.65.42.0/26 handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.825 [INFO][134] tunnel-ip-allocator/ipam.go 413: Block '100.65.42.0/26' provided addresses: [100.65.42.42/26] handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.828 [INFO][134] tunnel-ip-allocator/ipam.go 569: Auto-assigned 1 out of 1 IPv4s: [100.65.42.42/26] handle="ipip-tunnel-addr-kworker-fe-intg-iz2-bap002" host="kworker-fe-intg-iz2-bap002"
2020-06-02 12:27:03.836 [INFO][134] tunnel-ip-allocator/allocateip.go 281: Assigned tunnel address to node IP="100.65.42.42" type="ipipTunnelAddress"
Calico node started successfully
bird: Unable to open configuration file /etc/calico/confd/config/bird6.cfg: No such file or directory
bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
2020-06-02 12:27:04.893 [INFO][172] confd/config.go 103: Skipping confd config file.
2020-06-02 12:27:04.893 [INFO][172] confd/config.go 181: Found FELIX_TYPHAK8SSERVICENAME=calico-typha
2020-06-02 12:27:04.893 [INFO][172] confd/run.go 17: Starting calico-confd
2020-06-02 12:27:04.906 [INFO][172] confd/client.go 247: Found Typha ClusterIP. clusterIP="100.72.174.39"
2020-06-02 12:27:04.906 [INFO][172] confd/client.go 254: Found Typha service port. port=v1.ServicePort{Name:"calico-typha", Protocol:"TCP", Port:5473, TargetPort:intstr.IntOrString{Type:1, IntVal:0, StrVal:"calico-typha"}, NodePort:0}
2020-06-02 12:27:04.906 [INFO][172] confd/client.go 145: Connecting to Typha. addr="100.72.174.39:5473"
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 71:  requiringTLS=false
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 200: Starting Typha client
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 71:  requiringTLS=false
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 251: Connecting to Typha. address="100.72.174.39:5473" connID=0x0 type="bgp"
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 266: Connected to Typha. address="100.72.174.39:5473" connID=0x0 type="bgp"
2020-06-02 12:27:04.906 [INFO][172] confd/client.go 363: RouteGenerator has indicated it is in sync
2020-06-02 12:27:04.906 [INFO][172] confd/sync_client.go 301: Started Typha client main loop address="100.72.174.39:5473" connID=0x0 type="bgp"
2020-06-02 12:27:04.907 [INFO][172] confd/sync_client.go 358: Server hello message received address="100.72.174.39:5473" connID=0x0 serverVersion="v3.14.0" type="bgp"
2020-06-02 12:27:04.932 [INFO][172] confd/client.go 800: Recompute BGP peerings: HostBGPConfig(node=kworker-fe-intg-iz2-bap008; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap008; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap008; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap008; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap008 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap002; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap002; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap002; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap002; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap002 updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap003; name=ip_addr_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap003; name=ip_addr_v6) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap003; name=network_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap003; name=rr_cluster_id) updated; kmaster-fe-intg-iz2-bap003 updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap001; name=ip_addr_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap001; name=ip_addr_v6) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap001; name=network_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap001; name=rr_cluster_id) updated; kmaster-fe-intg-iz2-bap001 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap009; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap009; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap009; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap009; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap009 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap006; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap006; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap006; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap006; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap006 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap003; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap003; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap003; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap003; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap003 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap004; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap004; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap004; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap004; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap004 updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap001; name=ip_addr_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap001; name=ip_addr_v6) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap001; name=network_v4) updated; HostBGPConfig(node=kworker-fe-intg-iz2-bap001; name=rr_cluster_id) updated; kworker-fe-intg-iz2-bap001 updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap002; name=ip_addr_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap002; name=ip_addr_v6) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap002; name=network_v4) updated; HostBGPConfig(node=kmaster-fe-intg-iz2-bap002; name=rr_cluster_id) updated; kmaster-fe-intg-iz2-bap002 updated; Global AS number changed
2020-06-02 12:27:04.932 [INFO][172] confd/sync_client.go 329: Status update from Typha. address="100.72.174.39:5473" connID=0x0 newStatus=in-sync type="bgp"
2020-06-02 12:27:04.932 [INFO][172] confd/client.go 353: Calico Syncer has indicated it is in sync
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird_aggr.cfg has been updated
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird6_aggr.cfg has been updated
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird6.cfg has been updated
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird6_ipam.cfg has been updated
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird.cfg has been updated
2020-06-02 12:27:05.031 [INFO][172] confd/resource.go 264: Target config /etc/calico/confd/config/bird_ipam.cfg has been updated
2020-06-02 12:27:05.130 [INFO][172] confd/resource.go 264: Target config /tmp/tunl-ip has been updated
2020-06-02 12:27:05.238 [WARNING][171] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2020-06-02 12:27:05.238 [WARNING][171] felix/health.go 188: Reporter is not ready. name="async_calc_graph"
2020-06-02 12:27:05.329 [WARNING][171] felix/ipip_mgr.go 112: Failed to add IPIP tunnel device error=exit status 1
2020-06-02 12:27:05.329 [WARNING][171] felix/ipip_mgr.go 89: Failed configure IPIP tunnel device, retrying... error=exit status 1
bird: device1: Initializing
bird: direct1: Initializing
bird: device1: Starting
bird: device1: Connected to table master
bird: device1: State changed to feed
bird: direct1: Starting
bird: direct1: Connected to table master
bird: direct1: State changed to feed
bird: Graceful restart started
bird: Graceful restart done
bird: Started
bird: device1: State changed to up
bird: direct1: State changed to up
bird: device1: Initializing
bird: direct1: Initializing
bird: Mesh_10_238_193_5: Initializing
bird: Mesh_10_238_194_5: Initializing
bird: Mesh_10_238_195_3: Initializing
bird: Mesh_10_238_193_4: Initializing
bird: Mesh_10_238_193_2: Initializing
bird: Mesh_10_238_194_3: Initializing
bird: Mesh_10_238_194_4: Initializing
bird: Mesh_10_238_195_4: Initializing
bird: Mesh_10_238_195_5: Initializing
bird: device1: Starting
bird: device1: Connected to table master
bird: device1: State changed to feed
bird: direct1: Starting
bird: direct1: Connected to table master
bird: direct1: State changed to feed
bird: Mesh_10_238_193_5: Starting
bird: Mesh_10_238_193_5: State changed to start
bird: Mesh_10_238_194_5: Starting
bird: Mesh_10_238_194_5: State changed to start
bird: Mesh_10_238_195_3: Starting
bird: Mesh_10_238_195_3: State changed to start
bird: Mesh_10_238_193_4: Starting
bird: Mesh_10_238_193_4: State changed to start
bird: Mesh_10_238_193_2: Starting
bird: Mesh_10_238_193_2: State changed to start
bird: Mesh_10_238_194_3: Starting
bird: Mesh_10_238_194_3: State changed to start
bird: Mesh_10_238_194_4: Starting
bird: Mesh_10_238_194_4: State changed to start
bird: Mesh_10_238_195_4: Starting
bird: Mesh_10_238_195_4: State changed to start
bird: Mesh_10_238_195_5: Starting
bird: Mesh_10_238_195_5: State changed to start
bird: Graceful restart started
bird: Started
bird: device1: State changed to up
bird: direct1: State changed to up
bird: Mesh_10_238_194_3: Connected to table master
bird: Mesh_10_238_194_3: State changed to wait
bird: Mesh_10_238_195_4: Connected to table master
bird: Mesh_10_238_195_4: State changed to wait
bird: Mesh_10_238_193_2: Connected to table master
bird: Mesh_10_238_193_2: State changed to wait
bird: Mesh_10_238_195_3: Connected to table master
bird: Mesh_10_238_195_3: State changed to wait
bird: Mesh_10_238_194_5: Connected to table master
bird: Mesh_10_238_194_5: State changed to wait
bird: Mesh_10_238_193_4: Connected to table master
bird: Mesh_10_238_193_4: State changed to wait
bird: Mesh_10_238_193_5: Connected to table master
bird: Mesh_10_238_193_5: State changed to wait
bird: Mesh_10_238_195_5: Connected to table master
bird: Mesh_10_238_195_5: State changed to wait
bird: Mesh_10_238_194_4: Connected to table master
bird: Mesh_10_238_194_4: State changed to wait
bird: Graceful restart done
bird: Mesh_10_238_193_5: State changed to feed
bird: Mesh_10_238_194_5: State changed to feed
bird: Mesh_10_238_195_3: State changed to feed
bird: Mesh_10_238_193_4: State changed to feed
bird: Mesh_10_238_193_2: State changed to feed
bird: Mesh_10_238_194_3: State changed to feed
bird: Mesh_10_238_194_4: State changed to feed
bird: Mesh_10_238_195_4: State changed to feed
bird: Mesh_10_238_195_5: State changed to feed
bird: Mesh_10_238_193_5: State changed to up
bird: Mesh_10_238_194_5: State changed to up
bird: Mesh_10_238_195_3: State changed to up
bird: Mesh_10_238_193_4: State changed to up
bird: Mesh_10_238_193_2: State changed to up
bird: Mesh_10_238_194_3: State changed to up
bird: Mesh_10_238_194_4: State changed to up
bird: Mesh_10_238_195_4: State changed to up
bird: Mesh_10_238_195_5: State changed to up

There are no logs during the time the broken pod was created.

kubelet logs during pod creation:

2020-06-03 11:48:33.648 [INFO][18722] main.go 202: Found containers [traco] ContainerID="5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67" InterceptType="iptables" Namespace="namespace-qa" Pod="int-qa-5665777b4f-qwh8b"
June 3rd 2020, 13:48:33.6482020-06-03 11:48:33.648 [INFO][18722] kubernetes.go 60: pod info &Pod{ObjectMeta:{int-qa-5665777b4f-qwh8b int-qa-5665777b4f- namespace-qa /api/v1/namespaces/namespace-qa/pods/int-qa-5665777b4f-qwh8b 3dd8706b-e3be-4b36-b01a-9350e673e9ce 342952310 0 2020-06-03 11:48:32 +0000 UTC <nil> <nil> map[app:traco artifactId:traco groupId:com.unitedinternet.portal.mobile mam_brand:gmxint mam_stage:qa pod-template-hash:5665777b4f release:int-qa version:0.0.1] map[cni.projectcalico.org/podIP:100.65.42.42/32 cni.projectcalico.org/podIPs:100.65.42.42/32 kubernetes.io/limit-ranger:LimitRanger plugin set: ephemeral-storage request for container traco; ephemeral-storage limit for container traco kubernetes.io/psp:restricted prometheus.io/path:/-system/prometheus prometheus.io/port:8081 prometheus.io/scrape:true] [{apps/v1 ReplicaSet int-qa-5665777b4f 17bce235-8a1c-470a-80fd-8dc96232a2cb 0xc0003f19fa 0xc0003f19fb}] []  []},Spec:PodSpec,Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-06-03 11:48:32 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Burstable,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}
June 3rd 2020, 13:48:33.6322020-06-03 11:48:33.632 [INFO][18722] kubernetes.go 49: Set up kubernetes client with kubeconfig /etc/cni/net.d/ZZZ-istio-cni-kubeconfig ContainerID="5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67" InterceptType="iptables" Namespace="namespace-qa" Pod="int-qa-5665777b4f-qwh8b"
June 3rd 2020, 13:48:33.6322020-06-03 11:48:33.632 [INFO][18722] kubernetes.go 50: Kubernetes config &rest.Config{Host:"https://[100.72.0.1]:443", APIPath:"", ContentConfig:rest.ContentConfig{AcceptContentTypes:"", ContentType:"", GroupVersion:(*schema.GroupVersion)(nil), NegotiatedSerializer:runtime.NegotiatedSerializer(nil)}, Username:"", Password:"", BearerToken:"--- REDACTED ---", BearerTokenFile:"", Impersonate:rest.ImpersonationConfig{UserName:"", Groups:[]string(nil), Extra:map[string][]string(nil)}, AuthProvider:<nil>, AuthConfigPersister:rest.AuthProviderConfigPersister(nil), ExecProvider:<nil>, TLSClientConfig:rest.sanitizedTLSClientConfig{Insecure:false, ServerName:"", CertFile:"", KeyFile:"", CAFile:"", CertData:[]uint8(nil), KeyData:[]uint8(nil), CAData:[]uint8, NextProtos:[]string(nil)}, UserAgent:"", DisableCompression:false, Transport:http.RoundTripper(nil), WrapTransport:(transport.WrapperFunc)(nil), QPS:0, Burst:0, RateLimiter:flowcontrol.RateLimiter(nil), Timeout:0, Dial:(func(context.Context, string, string) (net.Conn, error))(nil)} ContainerID="5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67" InterceptType="iptables" Namespace="namespace-qa" Pod="int-qa-5665777b4f-qwh8b"
June 3rd 2020, 13:48:33.6312020-06-03 11:48:33.631 [INFO][18722] main.go 158: Loaded k8s arguments: {{true} <nil> int-qa-5665777b4f-qwh8b namespace-qa 5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67}
June 3rd 2020, 13:48:33.6312020-06-03 11:48:33.631 [INFO][18722] main.go 150: cmdAdd config parsed prevResult=Interfaces:[{Name:calia680b268f1d Mac: Sandbox:}], IP:[{Version:4 Interface:<nil> Address:{IP:100.65.42.42 Mask:ffffffff} Gateway:<nil>}], DNS:{Nameservers:[] Domain: Search:[] Options:[]} version="0.3.1"
June 3rd 2020, 13:48:33.6312020-06-03 11:48:33.631 [INFO][18722] main.go 157: Getting identifiers with arguments: IgnoreUnknown=1;K8S_POD_NAMESPACE=namespace-qa;K8S_POD_NAME=int-qa-5665777b4f-qwh8b;K8S_POD_INFRA_CONTAINER_ID=5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67
June 3rd 2020, 13:48:33.6302020-06-03 11:48:33.630 [INFO][18722] main.go 135: istio-cni cmdAdd parsing config

ipam handle of the pod:

apiVersion: crd.projectcalico.org/v1
kind: IPAMHandle
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":null}'
  creationTimestamp: "2020-06-03T11:48:33Z"
  generation: 1
  name: k8s-pod-network.5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67
  resourceVersion: "342952305"
  selfLink: /apis/crd.projectcalico.org/v1/ipamhandles/k8s-pod-network.5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67
  uid: d21d33d7-3cc9-4ea2-a582-d5d7fd255878
spec:
  block:
    100.65.42.0/26: 1
  handleID: k8s-pod-network.5c4992991a8f3eb288e8dbf3aa558576d44333a648ad29fbd5d19e44d262fc67

ipam block of the pod:

apiVersion: crd.projectcalico.org/v1
kind: IPAMBlock
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":null}'
  creationTimestamp: "2020-05-26T12:43:53Z"
  generation: 465
  name: 100-65-42-0-26
  resourceVersion: "343319567"
  selfLink: /apis/crd.projectcalico.org/v1/ipamblocks/100-65-42-0-26
  uid: 9f913616-4240-4fa8-a956-bf97828eb6dc
spec:
  affinity: host:kworker-fe-intg-iz2-bap002
  allocations:
  - 42
  - 57
  - 43
  - 44
  - 45
  - 46
  - 47
  - 48
  - 49
  - null
  - 50
  - null
  - 51
  - 52
  - 53
  - 54
  - 7
  - 20
  - 11
  - 29
  - 30
  - 55
  - 24
  - 56
  - 23
  - 5
  - 2
  - 59
  - 1
  - 34
  - 13
  - 32
  - 9
  - 14
  - 16
  - 33
  - 39
  - 22
  - 19
  - 35
  - 38
  - 37
  - 58
  - 28
  - 3
  - 6
  - 0
  - 8
  - 41
  - 36
  - 26
  - 21
  - 15
  - 12
  - 27
  - 18
  - 60
  - 4
  - 17
  - 10
  - 61
  - 31
  - 40
  - 25
  attributes:
# 62 handles, only pods no tunl ip
  cidr: 100.65.42.0/26
  deleted: false
  strictAffinity: false
  unallocated:
  - 9
  - 11

We have not yet managed to reproduce it easily, but we are trying to create a test case.

juliantaylor commented 4 years ago

We checked our 15 clusters: the number of tunnel interface handles diverges from the number of nodes only in the clusters which have more than one ipamblock per node.

Draining and reinstalling one of the nodes with a missing tunnel ipamhandle did create the tunnel handle; the node now has 3 ipamblocks, two of which have zero allocations.

(note we reinstall nodes weekly, so this is not an issue of outdated nodes)

The only difference in the calico-node logs is the following two extra lines during startup before the reinstallation, which appear to just be health-check timing related:

2020-06-02 12:27:05.238 [WARNING][171] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2020-06-02 12:27:05.238 [WARNING][171] felix/health.go 188: Reporter is not ready. name="async_calc_graph"
juliantaylor commented 4 years ago

This might be a race between the controller cleaning up ipamhandles and calico-node creating them:

controller logs

2020-06-02 12:14:06.328 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap002) with error: nodes "kworker-fe-intg-iz2-bap002" not found
2020-06-02 12:14:06.328 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap002" k8sNode=""
2020-06-02 12:30:46.328 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap002" k8sNode=""
2020-06-02 12:30:46.328 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'ipip-tunnel-addr-kworker-fe-intg-iz2-bap002'
2020-06-02 12:30:46.352 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.39bfceebef2de92cc0f491d08df0a64c47640f8e64e6c278975948328f24a04c'
2020-06-02 12:30:46.375 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.1c1e961eb7f00426a455ec29d14bab995fd7ca2edfcdcf28b5c94c2cb1e01147'
2020-06-02 12:30:46.398 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.551f79f0f338ec9b9fda2fc167ca189bb24465c72c858f5b2bca8c9dab9c82da'
2020-06-02 12:30:46.428 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.d44426f4b374b44c93bd0e88030b47463a01c076c775664f923b08d82f2c94da'
2020-06-02 12:30:46.452 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.117ae9f5fdf3cd40ff6f8aa968b8b62b3aa7a67590d05066e7c0cbd03fb70319'
2020-06-02 12:30:46.476 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.b2ed9ffa87f2aa9bbcd2ab09a24a8eb146f2d137371fab70cec8a5e1cfb1771e'
2020-06-02 12:30:46.499 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a5d9688765c97f1869f561e8c11a5f04d5c31b4c6d7b16482e091f5d662af319'
2020-06-02 12:30:46.522 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.28f4bd436b8fd310351de61a16952618f63e4b4b478a4be5acd76e16df0ca815'
2020-06-02 12:30:46.545 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.9a7903058e16026e93c40f5ac568ef8b2f4778a03472ecde785eeb20eed7981a'
2020-06-02 12:30:46.567 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.b0f5cac2a9a655d1ef0f70f66c9351c2af9abbc5ea592aef4353ebfa85fc2d8c'
2020-06-02 12:30:46.589 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.e383ee91889e6abaf2b7a7b2c77405877c336ca5a8e121acffee5a0621e6be4c'
2020-06-02 12:30:46.612 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.90fd1abca130c4370eadbd8d9cbdf69da29c3cffb5aab249ccfc0cc21a00c1c8'
2020-06-02 12:30:47.131 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.115b714b7c11487a6c043a65b4044159e414a2066ad82b787935ce0aa52a5f62'
2020-06-02 12:30:48.730 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.1efd58795f4212d385ea0dd1526d4f947c8dee6ffc4ecda47a37ef1b81d48818'
2020-06-02 12:30:50.330 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.782d1b2e3c72b6d2ecf9f14148451e22f76ad4da617ca25f13878ce6aba4365c'
2020-06-02 12:30:52.130 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.d40daa67f5a608464c644b0252c5dea4250e823215e70689d191a09be5dbb922'
2020-06-02 12:30:53.730 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.0816a4b1540689aca730734d55041936c408a3bb2d95bde5a2b547520bbdd937'
2020-06-02 12:30:56.131 [INFO][1] ipam_block_reader_writer.go 292: Block must be empty but is not empty, refusing to remove affinity. host="kworker-fe-intg-iz2-bap002" subnet="100.65.42.0/26"
2020-06-02 12:30:56.131 [ERROR][1] ipam.go 968: Error releasing affinity cidr=100.65.42.0/26 error=block '100.65.42.0/26' is not empty
2020-06-02 12:30:56.131 [ERROR][1] ipam.go 223: Failed to release block affinities for node calicoNode="kworker-fe-intg-iz2-bap002" error=block '100.65.42.0/26' is not empty
2020-06-02 12:30:56.131 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap002" error=block '100.65.42.0/26' is not empty k8sNode=""

The node was deleted from the API at 12:14:06 and returned to the cluster at 12:26:40; the node created its handle and the controller deleted it again afterwards:

audit logs of the tunnel ipamhandle

June 2nd 2020, 12:30:46.344 delete system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.344 delete system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.342 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.342 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.339 update system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.339 update system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.338 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.338 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.328 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:30:46.328 get system:serviceaccount:kube-system:calico-kube-controllers
June 2nd 2020, 12:27:03.819 create system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.817 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.817 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.795 delete system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.795 delete system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.793 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.793 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.788 update system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.788 update system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.785 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.785 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.775 get system:serviceaccount:kube-system:calico-node
June 2nd 2020, 12:27:03.775 get system:serviceaccount:kube-system:calico-node
caseydavenport commented 4 years ago

@juliantaylor wow, awesome sleuthing job. I am reading through this now, but I think this should give me a good starting place to look for the bug. Thanks for providing all of that information.

caseydavenport commented 4 years ago

2020-06-02 12:14:06.328 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap002) with error: nodes "kworker-fe-intg-iz2-bap002" not found

Looks like the controller spots that the node has been deleted at 12:14:06, like you say. However, it doesn't take any action just yet.

2020-06-02 12:27:03.818 [INFO][134] tunnel-ip-allocator/ipam.go 1271: Creating new handle: ipip-tunnel-addr-kworker-fe-intg-iz2-bap002

13 minutes later, calico/node starts up again and creates a new allocation + handle for the tunnel address on that node.

2020-06-02 12:30:46.328 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap002" k8sNode=""

The controller finally decides to clean up the resources for the node it determined was deleted 15 minutes ago. However, at this point the node has re-joined the cluster and its resources are still valid.

My best guess for why it's taking so long is that we're iterating this loop a lot and getting rate-limited on pod API queries, slowing us down: https://github.com/projectcalico/kube-controllers/blob/master/pkg/controllers/node/ipam.go#L144-L167

Not sure why else it would take so long. I'll see if I can reproduce.

I suspect to fix the race condition we will need to actually pull the handle object out, including revision information, and then write the handle when we go to release the IPs to make sure it hasn't been modified in between, and to invalidate any other clients that might be operating on the handle as well.
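
For illustration, here is a minimal sketch (not Calico's actual code, and using the client-go dynamic client rather than libcalico-go) of the optimistic-concurrency pattern described above, assuming client-go v0.18+: read the IPAMHandle, remember its resourceVersion, and make the delete conditional on that revision, so a handle re-created by calico/node in between causes a conflict instead of being silently removed.

package main

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
)

// GVR of the IPAMHandle custom resource shown earlier in this issue.
var ipamHandleGVR = schema.GroupVersionResource{
    Group:    "crd.projectcalico.org",
    Version:  "v1",
    Resource: "ipamhandles",
}

// deleteHandleIfUnchanged deletes an IPAMHandle only if it still has the
// resourceVersion we read; a handle modified or re-created after the read
// makes the apiserver reject the delete with a conflict.
func deleteHandleIfUnchanged(ctx context.Context, dc dynamic.Interface, name string) error {
    h, err := dc.Resource(ipamHandleGVR).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return err // includes "not found", i.e. already released
    }
    rv := h.GetResourceVersion()
    return dc.Resource(ipamHandleGVR).Delete(ctx, name, metav1.DeleteOptions{
        Preconditions: &metav1.Preconditions{ResourceVersion: &rv},
    })
}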

juliantaylor commented 4 years ago

Yes, checking the revision and aborting/rechecking if it is different will probably properly fix this.

Our nodes are configured to allow 200 pods per node and our apiserver rate limiting is at the default setting (I don't recall right now what that is); this shouldn't take 15 minutes to check.

Though the podExistsOnNode function seems very inefficient: if I interpret it correctly, it gets all pods in the namespace and checks their node name attribute. It should be more efficient to just get all pods with a field selector on the node name and reuse that information for the whole loop, as sketched below.
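
As a sketch of what I mean (assuming client-go v0.18+, where List takes a context; the helper name is made up), fetching all pods for a node in a single request with a field selector looks roughly like this:

package main

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// listPodsOnNode lists every pod scheduled to nodeName across all namespaces
// in one call, instead of listing pods per namespace and filtering by node
// name on the client side.
func listPodsOnNode(ctx context.Context, cs kubernetes.Interface, nodeName string) ([]string, error) {
    pods, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
        FieldSelector: "spec.nodeName=" + nodeName,
    })
    if err != nil {
        return nil, err
    }
    names := make([]string, 0, len(pods.Items))
    for _, p := range pods.Items {
        names = append(names, p.Namespace+"/"+p.Name)
    }
    return names, nil
}

The result could be cached once per node and reused for every handle in the cleanup loop.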

I can probably check next week with debug logging enabled what is taking so long.

juliantaylor commented 4 years ago

Though it only gets one pod, so the cost of the check should be OK. Maybe it is Calico's rate limiter? We do delete many nodes in parallel during updates, which could cause heavy rate limiting. Adding a debug log of how long it is rate limited might be good.

juliantaylor commented 4 years ago

I checked the logs of the last redeployment again; it is likely the rate limiting in Calico: a node wants to get cleaned up, fails, and gets rechecked with increasing rate limiting.

This loops for a while:

2020-06-16 11:14:51.268 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:51.588 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:51.589 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:14:51.591 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:51.591 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""
2020-06-16 11:14:52.266 [INFO][1] ipam.go 45: Synchronizing IPAM data

The time between these messages increases over the loops, up to 5 minutes in this case:

2020-06-16 11:20:25.110 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:25:52.790 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
caseydavenport commented 4 years ago

Ok, great. That's good information.

Maybe it is calicos rate limiter? we do delete many nodes in parallel during updates which could cause high limiting.

Yes, there is default client-side rate limiting for Kubernetes client requests, and Calico introduces another rate-limiting mechanism in kube-controllers (but I think only for errors).

Though the podExistsOnNode function seems very inefficient

Yeah, this is where we would hit the client-side rate limiting I was thinking about, which is 5 req/second. At 200 pods/node, it could take up to 40 s to handle each node.
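
For reference, a minimal sketch of where that 5 req/second comes from: when rest.Config leaves QPS and Burst at zero, client-go falls back to defaults of 5 QPS with a burst of 10. Raising them on the config is one way to loosen the client-side throttle (the numbers below are illustrative, not a recommendation):

package main

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func newLessThrottledClient() (kubernetes.Interface, error) {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        return nil, err
    }
    // With QPS/Burst left at 0, client-go uses 5 QPS / burst 10, which is
    // the ceiling discussed above.
    cfg.QPS = 50
    cfg.Burst = 100
    return kubernetes.NewForConfig(cfg)
}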

However, the logs you're linking above also seem to suggest we might be hitting Calico's error-retry rate limiting.

It's interesting that we're seeing this log:

2020-06-16 11:14:51.591 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""

However, we're not seeing any other warnings or errors indicating that it is having trouble releasing IP addresses.

It makes me think you're probably hitting this, which isn't actually an error per se, but we schedule a retry anyway: https://github.com/projectcalico/kube-controllers/blob/master/pkg/controllers/node/ipam.go#L206

I think we do a retry defensively there, since that line means "the allocation has changed between when we queried it and when we tried to free it", but it might be overly aggressive.

juliantaylor commented 4 years ago

Here is the full log of this node deletion. Note the handle k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb that it loops on; it is the only handle on the node for which the following log line is not "Node doesn't exist, ...".

The handle no longer appears in the log after this and is currently not in the cluster. The loop does stop around the time it takes the node to rejoin the cluster, but I still have to verify this in other logs.

2020-06-16 10:54:09.658 [INFO][1] ipam.go 190: Node and IPAM data is in sync
2020-06-16 11:14:39.649 [INFO][1] ipam.go 45: Synchronizing IPAM data
2020-06-16 11:14:39.683 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap003) with error: nodes "kworker-fe-intg-iz2-bap003" not found
2020-06-16 11:14:39.683 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:39.688 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:39.688 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.48984c294bd52f78da8473d27804a72515f27eadc8e9d45c53e3837c0f1ae4e5'
2020-06-16 11:14:39.712 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.712 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.b140fb83c8a7d4e0479727d06a69a1708aec0cd5ab7b2560d406aa263c5a81eb'
2020-06-16 11:14:39.733 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.733 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a1112a6b2df952765fa60754841ef6fd43e282847e33342c270bbf19ae54406b'
2020-06-16 11:14:39.754 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.754 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.8dc72db74db6878642bf316ef1f759b5f1037a0d7482264d0a373b41976b4e99'
2020-06-16 11:14:39.774 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.774 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.fc9fbf27f7d34f29103b099f90550b1a0315dff9bb36f6a97074ecd7716ad45c'
2020-06-16 11:14:39.794 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.794 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.44eb60b207a27a31828288d06a911ce083b7fff066caed4a37c674d64c377570'
2020-06-16 11:14:39.817 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.817 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.c659189452281742eeae2187d6b72a17bfd1af0eda61fde603188c03731c39ad'
2020-06-16 11:14:39.839 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.839 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:14:39.840 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.300fd73a898920666f54417e463b836719a1e6a419da9740ec93f6bc54885f50'
2020-06-16 11:14:39.862 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.862 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.8cf97d97311cb7197dbdfa7ba89a1c4f41a4cd3b627772b808005e002da73b95'
2020-06-16 11:14:39.883 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.883 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.59f309f37f47512a45163af256c7294c11c58b9c20527696d55c6e3676a6fa51'
2020-06-16 11:14:39.904 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.904 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'ipip-tunnel-addr-kworker-fe-intg-iz2-bap003'
2020-06-16 11:14:39.925 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.925 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.4140b489a0ba69f081285ab28548a9be27dcff3307a8f8cf252c56604228fdff'
2020-06-16 11:14:39.947 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.947 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.f74d11294128aeffacc2a6dd3b1b7104a4b292a03bb639d907fa234d3540950e'
2020-06-16 11:14:39.969 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.969 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.2e57e9ec08b25cb6786190925521d90da543f6901c56c11aa386f88b9090fc3b'
2020-06-16 11:14:39.990 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:39.990 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.20bd91e571cbb6e0d3fa4be2a72ba674207ec020a549007b5db8a1d4af6b2119'
2020-06-16 11:14:41.061 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:41.061 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.eb016d9ced15e062bc79d68b31fd686ada9a994faa09d3a1c51aa7dec2b74c58'
2020-06-16 11:14:42.463 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=100.71.132.192/26 host="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:42.463 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:42.463 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""
2020-06-16 11:14:42.477 [INFO][1] ipam.go 45: Synchronizing IPAM data
2020-06-16 11:14:42.878 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap003) with error: nodes "kworker-fe-intg-iz2-bap003" not found
2020-06-16 11:14:42.879 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:42.889 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:14:42.889 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:14:43.057 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:14:43.057 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""

<<< lots of the exact same logs, no other node deletions or handles >>>

2020-06-16 11:17:41.236 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:20:25.076 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:20:25.076 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:20:25.078 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:20:25.078 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""
2020-06-16 11:20:25.078 [INFO][1] ipam.go 45: Synchronizing IPAM data
2020-06-16 11:20:25.110 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap003) with error: nodes "kworker-fe-intg-iz2-bap003" not found
2020-06-16 11:20:25.110 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:25:52.790 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:25:52.790 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:25:52.793 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:25:52.793 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""
2020-06-16 11:25:52.806 [INFO][1] ipam.go 45: Synchronizing IPAM data
2020-06-16 11:25:52.841 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(kworker-fe-intg-iz2-bap003) with error: nodes "kworker-fe-intg-iz2-bap003" not found
2020-06-16 11:25:52.841 [INFO][1] ipam.go 137: Checking node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:26:03.795 [INFO][1] resources.go 371: Terminating main client watcher loop
2020-06-16 11:26:03.803 [INFO][1] resources.go 343: Main client watcher loop
2020-06-16 11:36:48.201 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="kworker-fe-intg-iz2-bap003" k8sNode=""
2020-06-16 11:36:48.201 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-06-16 11:36:48.204 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="kworker-fe-intg-iz2-bap003"
2020-06-16 11:36:48.204 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="kworker-fe-intg-iz2-bap003" error=Couldn't release all IPs k8sNode=""
2020-06-16 11:36:48.213 [INFO][1] ipam.go 45: Synchronizing IPAM data
2020-06-16 11:36:49.007 [INFO][1] ipam.go 190: Node and IPAM data is in sync
juliantaylor commented 4 years ago

We'll set the controller to debug mode for the next update planned tomorrow; hopefully that gives some more clues about the root cause. The problem does not always happen, though (and it has not recurred yet in the cluster we switched to /24 blocks a few weeks ago, so maybe there is still a relation to having more than one block per node).

Can you conveniently (i.e. via an environment variable or command-line option) set only ipam.go to debug logging and leave the rest at info?

juliantaylor commented 4 years ago

it did not reoccur this time with debug logging enabled :(

caseydavenport commented 4 years ago

FYI I have a test which seems to reproduce this scenario in this PR: https://github.com/projectcalico/kube-controllers/pull/489

Next step is to make the test pass :)

juliantaylor commented 4 years ago

The rate limiting is caused by cleanupNode trying to release an already released IP, causing a retry loop which eventually gets rate limited. The problem appears to be that the retry does not actually refresh the handles properly, so it retries on the same nonexistent handle for a while (see the sketch after the log excerpt):

2020-08-11 10:43:58.225 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-08-11 10:43:58.225 [DEBUG][1] k8s.go 568: Performing 'Get' for IPAMHandleKey(id=k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) 
2020-08-11 10:43:58.225 [DEBUG][1] customresource.go 208: Get custom Kubernetes resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Resource="IPAMHandles" Revision=""
2020-08-11 10:43:58.225 [DEBUG][1] customresource.go 219: Get custom Kubernetes resource by name Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision=""
2020-08-11 10:43:58.227 [DEBUG][1] customresource.go 228: Error getting resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision="" error=ipamhandles.crd.projectcalico.org "k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" not found
2020-08-11 10:43:58.227 [DEBUG][1] ipam.go 206: IP already released calicoNode="kworker-fe-intg-iz2-bap003" handle="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb"
2020-08-11 10:44:01.263 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-08-11 10:44:01.263 [DEBUG][1] k8s.go 568: Performing 'Get' for IPAMHandleKey(id=k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) 
2020-08-11 10:44:01.263 [DEBUG][1] customresource.go 208: Get custom Kubernetes resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Resource="IPAMHandles" Revision=""
2020-08-11 10:44:01.263 [DEBUG][1] customresource.go 219: Get custom Kubernetes resource by name Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision=""
2020-08-11 10:44:01.424 [DEBUG][1] customresource.go 228: Error getting resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision="" error=ipamhandles.crd.projectcalico.org "k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" not found
2020-08-11 10:44:01.424 [DEBUG][1] ipam.go 206: IP already released calicoNode="kworker-fe-intg-iz2-bap003" handle="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb"
2020-08-11 10:44:01.662 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-08-11 10:44:01.662 [DEBUG][1] k8s.go 568: Performing 'Get' for IPAMHandleKey(id=k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) 
2020-08-11 10:44:01.662 [DEBUG][1] customresource.go 208: Get custom Kubernetes resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Resource="IPAMHandles" Revision=""
2020-08-11 10:44:01.662 [DEBUG][1] customresource.go 219: Get custom Kubernetes resource by name Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision=""
2020-08-11 10:44:01.824 [DEBUG][1] customresource.go 228: Error getting resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision="" error=ipamhandles.crd.projectcalico.org "k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" not found
2020-08-11 10:44:01.824 [DEBUG][1] ipam.go 206: IP already released calicoNode="kworker-fe-intg-iz2-bap003" handle="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb"
2020-08-11 10:44:03.099 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb'
2020-08-11 10:44:03.099 [DEBUG][1] k8s.go 568: Performing 'Get' for IPAMHandleKey(id=k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) 
2020-08-11 10:44:03.099 [DEBUG][1] customresource.go 208: Get custom Kubernetes resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Resource="IPAMHandles" Revision=""
2020-08-11 10:44:03.099 [DEBUG][1] customresource.go 219: Get custom Kubernetes resource by name Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision=""
2020-08-11 10:44:03.101 [DEBUG][1] customresource.go 228: Error getting resource Key=IPAMHandle(k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb) Name="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" Namespace="" Resource="IPAMHandles" Revision="" error=ipamhandles.crd.projectcalico.org "k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb" not found
2020-08-11 10:44:03.101 [DEBUG][1] ipam.go 206: IP already released calicoNode="kworker-fe-intg-iz2-bap003" handle="k8s-pod-network.a6a66db430a3f6996e9de7119305a01f600e6cf952b3168b888f98bda5d834fb"
...
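
A generic sketch (not Calico's code) of the refresh-on-retry pattern these logs suggest is missing: re-list the node's handles on every attempt and treat "not found" as already released, so the loop cannot keep retrying a handle that no longer exists. listHandlesForNode and releaseByHandle are hypothetical stand-ins for the real IPAM calls.

package main

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// Hypothetical helpers standing in for the real IPAM operations.
var (
    listHandlesForNode func(ctx context.Context, node string) ([]string, error)
    releaseByHandle    func(ctx context.Context, handle string) error
)

func cleanupNode(ctx context.Context, node string) error {
    for attempt := 0; attempt < 5; attempt++ {
        // Refresh the handle list on every attempt instead of reusing the
        // snapshot taken before the first try.
        handles, err := listHandlesForNode(ctx, node)
        if err != nil {
            return err
        }
        if len(handles) == 0 {
            return nil // nothing left to release
        }
        allReleased := true
        for _, h := range handles {
            err := releaseByHandle(ctx, h)
            if err != nil && !apierrors.IsNotFound(err) {
                // A real error: remember it and retry this node.
                allReleased = false
            }
            // A "not found" error means another actor already released this
            // handle; it is not a reason to retry the whole node.
        }
        if allReleased {
            return nil
        }
    }
    return fmt.Errorf("could not release all IPs for node %s", node)
}
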
juliantaylor commented 4 years ago

Any updates on the issue?

caseydavenport commented 3 years ago

@juliantaylor sorry for the delay, I was working on a fix for this (and another similar issue) but it ended up falling through due to issues w/ the Kubernetes API.

I'll be picking this up again once Calico v3.17 is out, which should be soon.

thechristschn commented 3 years ago

Hello @caseydavenport, do you have any updates? We seem to have the exact same problem in one of our clusters.

juliantaylor commented 3 years ago

Don't know if it is related, but Calico also seems to randomly leave old blockaffinities lying around when nodes are deleted; we have blockaffinities that are months to a year old in clusters where all nodes are deleted every week: kworker-be-intg-iz1-022-100-70-178-64-26 84d

But this does not always happen, some get cleaned up, others don't.

Also, it is very likely that the controller getting stuck for 10-20 minutes at a time before deleting the ipamhandles is related to nodes having more than one ipamblock: the problem only occurs on nodes with more than one ipamblock.

juliantaylor commented 3 years ago

We have increased the IPAM blocksize from /26 to /24 (which required manually creating non-overlapping /24 blocks via status: pending blockaffinities, as Calico will happily overlap new ranges with old ranges when creating blocks of a different size). So now each of our nodes only has one block.

This has greatly reduced the frequency of the problem (we only see it occasionally in one of our 16 clusters), so the cause of the controller hang is likely related to the number of blocks. (The main issue is still that the controller can work on outdated data; even without the hang this could still be racy.)

caseydavenport commented 3 years ago

Sorry for the (very) long delay on this one. There have been a number of improvements to the IPAM garbage collection logic in Calico over the past couple of releases that may or may not narrow the window for this issue as well.

I'm still working on closing this race down completely when I can, but it's slow going. Just wanted to say it hasn't dropped off my radar.

sfudeus commented 3 years ago

Thx @caseydavenport, much appreciated.

drgarcia1986 commented 3 years ago

Seems like we are hitting the same problem, but we're running our cluster on AWS (set up with kops):

2021-09-08 13:02:13.709 [INFO][1] ipam.go 137: Checking node calicoNode="ip-XXXX.ec2.internal" k8sNode=""
2021-09-08 13:06:32.656 [INFO][1] resources.go 371: Terminating main client watcher loop
2021-09-08 13:06:32.668 [INFO][1] resources.go 343: Main client watcher loop
2021-09-08 13:18:53.709 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="ip-XXXX.ec2.internal" k8sNode=""
2021-09-08 13:18:53.827 [INFO][1] ipam_block_reader_writer.go 292: Block must be empty but is not empty, refusing to remove affinity. host="ip-XXXX.ec2.internal" subnet="172.20.217.64/26"
2021-09-08 13:18:53.828 [ERROR][1] ipam.go 968: Error releasing affinity cidr=172.20.217.64/26 error=block '172.20.217.64/26' is not empty
2021-09-08 13:18:53.828 [ERROR][1] ipam.go 223: Failed to release block affinities for node calicoNode="ip-XXXX.ec2.internal" error=block '172.20.217.64/26' is not empty
2021-09-08 13:18:53.828 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="ip-XXXX.ec2.internal" error=block '172.20.217.64/26' is not empty k8sNode=""
2021-09-08 13:18:53.880 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(ip-XXXX.ec2.internal) with error: nodes "ip-XXXX.ec2.internal" not found
2021-09-08 13:18:53.880 [INFO][1] ipam.go 137: Checking node calicoNode="ip-XXXX.ec2.internal" k8sNode=""
2021-09-08 13:35:33.880 [INFO][1] ipam.go 177: Cleaning up IPAM resources for deleted node calicoNode="ip-XXXX.ec2.internal" k8sNode=""
2021-09-08 13:35:33.880 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'ipip-tunnel-addr-ip-XXXX.ec2.internal'
2021-09-08 13:35:33.920 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.153.192/26 host="ip-XXXX.ec2.internal"
2021-09-08 13:35:33.920 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.652d431974d6c9c6f565a70c87cbde49d4cb3a1631e7185853a756fc14a3d4f8'
2021-09-08 13:35:33.963 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.153.192/26 host="ip-XXXX.ec2.internal"
2021-09-08 13:35:33.963 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.8ebb4aaf4132e486109b4b1750829a73c91ac5e117b586f1c2c8f4cea5327bfb'
2021-09-08 13:35:33.967 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.220e178fed9915381fc77a209a049480ce93cd8474649a2c5cd935d3f96ce09e'
2021-09-08 13:35:34.011 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.199.64/26 host="ip-YYYY.ec2.internal"
2021-09-08 13:35:34.012 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.06bc3aad10798eb71246269e2465aa6ffb8b1fdd82b3e213abb628e1ab7acead'
2021-09-08 13:35:34.055 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.199.64/26 host="ip-YYYY.ec2.internal"
2021-09-08 13:35:34.055 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.ee2d1b49b135905591b7b94a00a205047a101d8708eee6e7fc8428429997e54c'
2021-09-08 13:35:34.100 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.199.64/26 host="ip-YYYY.ec2.internal"
2021-09-08 13:35:34.100 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.c5b0f2645208ae33ec8ef8e7c592b1747351cf261632a10c26b5e86159c79b00'
2021-09-08 13:35:34.138 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.199.64/26 host="ip-YYYY.ec2.internal"
2021-09-08 13:35:34.138 [INFO][1] ipam.go 1172: Releasing all IPs with handle 'k8s-pod-network.4f491186d1901e141d8f1c4e94213ff41f2f120333d447a0a032ba08e17d2aac'
2021-09-08 13:35:34.180 [INFO][1] ipam.go 1491: Node doesn't exist, no need to release affinity cidr=172.20.199.64/26 host="ip-YYYY.ec2.internal"
2021-09-08 13:35:34.180 [INFO][1] ipam.go 217: Couldn't release all IPs for stale node, schedule retry calicoNode="ip-XXXX.ec2.internal"
2021-09-08 13:35:34.180 [WARNING][1] ipam.go 180: Error cleaning up node calicoNode="ip-XXXX.ec2.internal" error=Couldn't release all IPs k8sNode=""
2021-09-08 13:35:34.187 [INFO][1] ipam.go 281: Calico Node referenced in IPAM data does not exist error=resource does not exist: Node(ip-ZZZZ.ec2.internal) with error: nodes "ip-ZZZZ.ec2.internal" not found

Eventually, some calico-node pods start to enter a CrashLoopBackOff state. To mitigate, we drain and terminate some nodes (not necessarily all nodes with a crashed calico pod); after that almost everything looks fine (except for the calico-kube-controllers logs).

kube-controller version: v3.14.1
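For errors like block '172.20.217.64/26' is not empty, one way to see what is still allocated in that block is calicoctl's IPAM commands. A sketch, assuming calicoctl is configured against the same datastore; the --ip address below is just an example inside that block, not a real allocation from these logs:

# per-block allocation counts, including blocks whose node is gone
calicoctl ipam show --show-blocks
# report whether a specific address is still in use and its recorded attributes
calicoctl ipam show --ip=172.20.217.67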

zhanggbj commented 3 years ago

Hi @caseydavenport,

We're using 3.11.2 and experiencing a similar issue

  1. Calico tried to Cleaning up IPAM resources for deleted node and Released all affinities for node; however, judging by the node names, these are existing nodes
  2. The Calico controller keeps retrying Cleaning up IPAM resources for deleted node for some genuinely deleted nodes; based on the log timestamps, it never ends
    2021-09-02 03:02:30.789 [INFO][1] kdd.go 154: Cleaning up IPAM resources for deleted node node="xxxxx-75lwp"
    2021-09-02 03:23:15.328 [INFO][1] kdd.go 154: Cleaning up IPAM resources for deleted node node="xxxxx-75lwp"
    2021-09-06 00:52:58.821 [INFO][1] kdd.go 154: Cleaning up IPAM resources for deleted node node="xxxxx-75lwp"
    2021-09-06 00:53:26.225 [INFO][1] kdd.go 154: Cleaning up IPAM resources for deleted node node="xxxxx-75lwp"
  3. There are unexpected behaviors, like Calico repeatedly creating IPAMHandles for one pod, and creating IPAMHandles and IPAMBlocks until it hits a maximum of around 30K IPAMHandles and 500 IPAMBlocks (we're using /26 blocks now), even when there is no pod creation event.

I'm thinking this is also due to the IPAM garbage collection issue mentioned above: after the node and IPAMBlock got mismatched because of the race, the calico controller keeps retrying to make it right. What do you think?

You also mentioned there will be fixes; could you give an expected release number? I'm asking because this issue happened in our customer's environment.
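A crude way to watch for that runaway growth is simply counting the IPAM objects over time. A sketch, assuming the Kubernetes datastore, where IPAM state is stored as CRDs:

# steadily climbing counts with no pod churn would match the behavior described above
kubectl get ipamhandles.crd.projectcalico.org --no-headers | wc -l
kubectl get ipamblocks.crd.projectcalico.org --no-headers | wc -l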

caseydavenport commented 3 years ago

@zhanggbj we've made a number of fixes in this area since v3.11 - I'd suggest upgrading to the latest version - v3.20.2 - and see if it solves your issue.

If not, then we can investigate to see if you're encountering something different.

isugimpy commented 3 years ago

@caseydavenport I can confirm that this issue is still occurring on 3.20.2. We've been seeing this for months, several times a week, and have been on 3.20.2 for 12 days on a cluster where it is occurring as I'm typing this message. Would be glad to provide diagnostics the next time it comes up if there's specific data you need.

alienth commented 3 years ago

Elaborating on @isugimpy's comment:

Here are the relevant logs where the tunnel address is getting assigned:

2021-10-24 13:43:33.972 [INFO][45] tunnel-ip-allocator/ipam.go 1123: Successfully claimed IPs: [10.102.140.219/26] block=10.102.140.192/26 handle="ipip-tunnel-addr-ip-10-8-170-135.ec2.internal" host="ip-10-8-170-135.ec2.internal"
2021-10-24 13:43:33.972 [INFO][45] tunnel-ip-allocator/ipam.go 777: Auto-assigned 1 out of 1 IPv4s: [10.102.140.219/26] handle="ipip-tunnel-addr-ip-10-8-170-135.ec2.internal" host="ip-10-8-170-135.ec2.internal"
2021-10-24 13:43:33.983 [INFO][45] tunnel-ip-allocator/allocateip.go 455: Assigned tunnel address to node IP="10.102.140.219" type="ipipTunnelAddress"
2021-10-24 13:43:35.144 [INFO][101] tunnel-ip-allocator/allocateip.go 305: Current address is still valid, do nothing currentAddr="10.102.140.219" type="ipipTunnelAddress"

Reviewing birdcl, we confirm that the address is getting associated with the tunl interface:

10.102.140.219/32  dev tunl0 [direct1 2021-10-24] * (240)

We also confirm that calico is assigning that same IP to a calico interface, which gets used by a pod:

4: eth0@if479: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8981 qdisc noqueue state UP group default 
    link/ether 62:a6:de:65:b4:5d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.102.140.219/32 brd 10.102.140.219 scope global eth0
       valid_lft forever preferred_lft forever

Something of note: when this problem happens to us, it follows the node. If we delete the afflicted pod, another pod scheduled to the node will at some point encounter the same failure. We only learned of the tunl interface involvement today, so we have yet to determine whether the problem follows a specific IP or not.

Extra note: Restarting calico on the node seems to resolve the problem, and allows the busted pod to start routing.
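For anyone needing the same workaround, restarting calico/node on just the affected node amounts to deleting its calico-node pod and letting the DaemonSet recreate it. A sketch, assuming a manifest-based install where calico-node runs in kube-system with the k8s-app=calico-node label; <affected-node> and the resulting pod name are placeholders:

# find the calico-node pod running on the affected node
kubectl -n kube-system get pods -o wide -l k8s-app=calico-node | grep <affected-node>
# delete it; the DaemonSet recreates it and tunnel IP allocation re-runs on startup
kubectl -n kube-system delete pod <calico-node-pod-found-above>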

caseydavenport commented 3 years ago

The one known race condition that can trigger this scenario involves deleting and then recreating a node in the Kubernetes API. However, there may be others as well - are you seeing this happen on node scaling events or similar sorts of operations?

Extra note: Restarting calico on the node seems to resolve the problem, and allows the busted pod to start routing.

As an aside, this probably triggers calico/node to pick a new IP, which resolves the duplication. That likely means an ordering like this, which is aligned with the known race scenario:

  1. calico/node picks an IP for the tunnel
  2. some time down the road it gets garbage collected (e.g., due to a node deletion event)
  3. a new pod is launched, sees the IP as available, and uses it.

When you see this happening again, the best diagnostics to grab would be as many as possible of the following:

That should give me a good idea of what state the IPAM data is in, as well as a series of events leading up to it.
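One extra cross-check that could go alongside those diagnostics: compare each node's claimed tunnel address against the pod IPs in the cluster. A sketch only, assuming calicoctl is configured for the cluster and IPIP mode, where the tunnel address appears on the Calico Node resource as spec.bgp.ipv4IPIPTunnelAddr:

for n in $(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
  # pull the tunnel IP claimed by this node, if any
  tunl=$(calicoctl get node "$n" -o yaml | awk '/ipv4IPIPTunnelAddr:/ {print $2}')
  # any pod row carrying the same address indicates the duplicate-IP condition from this issue
  [ -n "$tunl" ] && kubectl get pods -A -o wide --no-headers | grep -F " $tunl " \
    && echo "WARNING: a pod shares tunnel IP $tunl of node $n"
done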