projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

kube-controllers should clean up IPAM handles even if no allocation exists #8548

Open bradbehle opened 6 months ago

bradbehle commented 6 months ago

Expected Behavior

calico-kube-controllers should clean up leaked pod IPs / IPAMHandles

Current Behavior

In at least the cluster we are looking at, calico-kube-controllers is not cleaning these up.

Possible Solution

Figure out why calico-kube-controllers doesn't seem to notice these 30,000 leaked IPAM handles, but calicoctl ipam check does (see details below).

Steps to Reproduce (for bugs)

  1. Create an Openshift 4.6 cluster with Calico v3.17.2 (deployed via Operator v1.13.4) about 2.5 years ago
  2. Use that cluster for those 2.5 years, upgrading Openshift and Calico
  3. Get to today, running Openshift 4.12 and Calico 3.26.4, and notice 30,000+ leaked IPAM Handles
  4. Wonder why calico-kube-controller isn't cleaning them up

Context

We don't think this is causing any noticeable problems with the cluster at the moment, but etcd performance can't be helped by all of these leaked CRDs. We could probably run calicoctl ipam release and clean these up ourselves (see the sketch below). The concern is that this is probably also a problem on a number of older clusters we maintain and could eventually become a real problem, so we are hoping that, if we provide information about this cluster, someone can determine why calico-kube-controllers isn't cleaning these up and fix it in an upcoming Calico release.
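
A rough sketch of the bulk cleanup we have in mind, based on the report-driven flow calicoctl provides (flag names, and whether kube-controllers/CNI activity needs to be quiesced first, should be double-checked against the docs for the calicoctl version in use):

# Write an IPAM report that includes the leaked handles/addresses
calicoctl ipam check -o ipam-report.json

# After reviewing the report, release everything it flags as leaked
calicoctl ipam release --from-report=ipam-report.json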

Your Environment

Here's the information that shows all the leaked IPAM handles (with IP addresses obscured):

+----------+----------------+-----------+------------+-------------+
| GROUPING |       CIDR     | IPS TOTAL | IPS IN USE |  IPS FREE   |
+----------+----------------+-----------+------------+-------------+
| IP Pool  | X.X.0.0/16     |     65536 | 537 (1%)   | 64999 (99%) |
| Block    | X.X.128.128/26 |        64 | 26 (41%)   | 38 (59%)    |
| Block    | X.X.128.64/26  |        64 | 1 (2%)     | 63 (98%)    |
| Block    | X.X.162.192/26 |        64 | 63 (98%)   | 1 (2%)      |
| Block    | X.X.163.0/26   |        64 | 25 (39%)   | 39 (61%)    |
| Block    | X.X.18.192/26  |        64 | 62 (97%)   | 2 (3%)      |
| Block    | X.X.19.0/26    |        64 | 63 (98%)   | 1 (2%)      |
| Block    | X.X.19.64/26   |        64 | 41 (64%)   | 23 (36%)    |
| Block    | X.X.228.192/26 |        64 | 64 (100%)  | 0 (0%)      |
| Block    | X.X.229.0/26   |        64 | 63 (98%)   | 1 (2%)      |
| Block    | X.X.229.64/26  |        64 | 33 (52%)   | 31 (48%)    |
| Block    | X.X.83.0/26    |        64 | 8 (12%)    | 56 (88%)    |
| Block    | X.X.91.0/26    |        64 | 64 (100%)  | 0 (0%)      |
| Block    | X.X.91.64/26   |        64 | 24 (38%)   | 40 (62%)    |
+----------+----------------+-----------+------------+-------------+

Checking IPAM for inconsistencies...

Loading all IPAM blocks...
Found 13 IPAM blocks.
 IPAM block X.X.128.128/26 affinity=host:X.X.X.139:
 IPAM block X.X.128.64/26 affinity=host:X.X.X.139:
 IPAM block X.X.162.192/26 affinity=host:X.X.X.160:
 IPAM block X.X.163.0/26 affinity=host:X.X.X.160:
 IPAM block X.X.18.192/26 affinity=host:X.X.X.174:
 IPAM block X.X.19.0/26 affinity=host:X.X.X.174:
 IPAM block X.X.19.64/26 affinity=host:X.X.X.174:
 IPAM block X.X.228.192/26 affinity=host:X.X.X.178:
 IPAM block X.X.229.0/26 affinity=host:X.X.X.178:
 IPAM block X.X.229.64/26 affinity=host:X.X.X.178:
 IPAM block X.X.83.0/26 affinity=host:X.X.X.146:
 IPAM block X.X.91.0/26 affinity=host:X.X.X.169:
 IPAM block X.X.91.64/26 affinity=host:X.X.X.169:
IPAM blocks record 537 allocations.

Loading all IPAM pools...
  X.X.0.0/16
Found 1 active IP pools.

Loading all nodes.
Found 6 node tunnel IPs.

Loading all workload endpoints.
Found 531 workload IPs.
Workloads and nodes are using 537 IPs.

Loading all handles
Looking for top (up to 20) nodes by allocations...
  X.X.X.174 has 166 allocations
  X.X.X.178 has 160 allocations
  X.X.X.169 has 88 allocations
  X.X.X.160 has 88 allocations
  X.X.X.139 has 27 allocations
  X.X.X.146 has 8 allocations
Node with most allocations has 166; median is 88

Scanning for IPs that are allocated but not actually in use...
Found 0 IPs that are allocated in IPAM but not actually in use.
Scanning for IPs that are in use by a workload or node but not allocated in IPAM...
Found 0 in-use IPs that are not in active IP pools.
Found 0 in-use IPs that are in active IP pools but have no corresponding IPAM allocation.

Scanning for IPAM handles with no matching IPs...
Found 30555 handles with no matching IPs (and 537 handles with matches).
Scanning for IPs with missing handle...
Found 0 handles mentioned in blocks with no matching handle resource.
Check complete; found 30555 problems.
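
(For reference, the table and check output above look like what the following commands produce; the exact invocation wasn't included, so this is an assumption based on the output format.)

calicoctl ipam show --show-blocks
calicoctl ipam check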

Here are the object counts in etcd, which confirm the large number:

>> ETCD TOP 5 OBJECT RESOURCE COUNTS:
apiserver_storage_objects{resource="ipamhandles.crd.projectcalico.org"} 31092
apiserver_storage_objects{resource="secrets"} 3230
apiserver_storage_objects{resource="replicasets.apps"} 2582
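
(These counts come from the kube-apiserver's apiserver_storage_objects metric; one way to pull them, assuming direct access to the apiserver /metrics endpoint, is sketched below. The numbers above may well have come from a different monitoring source.)

kubectl get --raw /metrics | grep '^apiserver_storage_objects' | sort -t' ' -k2 -rn | head -5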

I've also attached the output of get ipamhandles.crd.projectcalico.org -o wide and get ipamhandles.crd.projectcalico.org -o custom-columns=.:.spec, which show that almost all of the IPAMHandle objects are over 2 years old. I also included the calico-kube-controllers pod log, which shows that not much cleanup is happening. Please let me know if you would like more information about this cluster.
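
For completeness, the commands behind those attachments would be roughly the following (the kubectl prefix and the calico-kube-controllers namespace are assumptions; with an operator-managed install the pod is usually in calico-system):

kubectl get ipamhandles.crd.projectcalico.org -o wide
kubectl get ipamhandles.crd.projectcalico.org -o custom-columns=.:.spec
kubectl logs -n calico-system calico-kube-controllers-964d68fc-x982n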

bradbehle commented 6 months ago

kubectl get ipamhandle output:

get-ipamhandle-output.txt

calico-kube-controllers pod log:

calico-kube-controllers-964d68fc-x982n.txt

caseydavenport commented 6 months ago

Thanks @bradbehle - from what I can see in those files, the vast majority of the IPAM handles are from over 2 years ago. I suspect what has happened is that those IPAM handles were leaked before we had implemented many of the garbage collection improvements in more recent releases.

The kube-controllers code itself doesn't collect stray handles if the IP addresses associated with them don't exist, which is exactly the state this cluster is in. This is obviously a limitation, but in most normal operation the handle and IP are released as close to atomically as the k8s API allows (and the handle is deleted first before the allocation) so you wouldn't see this state.

If you can confirm that the leaked handles are all old, and that there are no new leaks occurring, then I think it would be safe to clean the old handles up using calicoctl and chalk this up to older versions of Calico leaving cruft behind.
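
A quick way to confirm both points (a sketch; it assumes kubectl access to the cluster and that the handle objects are readable via the crd.projectcalico.org API group, as shown earlier in this issue):

# List the handles oldest-first; recent creation timestamps would indicate new leaks
kubectl get ipamhandles.crd.projectcalico.org --sort-by=.metadata.creationTimestamp

# Track the total count over a few days; a stable count suggests no ongoing leak
kubectl get ipamhandles.crd.projectcalico.org --no-headers | wc -l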

caseydavenport commented 6 months ago

There is probably a pretty good case to be made for kube-controllers cleaning up handles with no allocation the same way that it cleans up allocations themselves.

dzacball commented 6 months ago

Thanks for looking into this, that explains it - we can confirm that all leaked handles are old.

rptaylor commented 1 month ago

I think the details about the Calico fix for this are here: https://github.com/projectcalico/calico/issues/6988#issuecomment-1331016577

With Calico v3.24.5 I did the cleanup procedure, and it removed the stale handles with no matching IPs, so this can probably be closed.

caseydavenport commented 3 weeks ago

@rptaylor that fix improved the calicoctl IPAM cleanup code to release the handles, but this issue is more about automated GC that doesn't require intervention via calicoctl.