projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

calico-kube-controller list/watch pod frequent OOM #8155

Open gongzixiangyuan opened 11 months ago

gongzixiangyuan commented 11 months ago

Expected Behavior

calico-kube-controller memory usage should not grow significantly with the number of pods in the cluster

Current Behavior

  1. calico-kube-controller lists/watches pods with a 400M memory limit; once the cluster exceeds 5000 pods, it OOMs frequently
  2. GOGC is set to 50, so the garbage collector already runs more aggressively than the Go default of 100

Possible Solution

  1. Is it possible not to cache the entire pod object?
  2. Could only the fields actually used by the NodeController (such as pod name, IP, etc.) be cached?
  3. Could pods be queried on demand after a node is deleted, instead of being held in the cache?

Steps to Reproduce (for bugs)

  1. Set the calico-kube-controller memory limit to 400M
  2. Increase the number of pods from 1000 to 5000 (100 nodes)

Context

Your Environment

Calico version 3.25

gongzixiangyuan commented 10 months ago

@coutinhop Is there any room for optimization in this scenario? Looking forward to your reply, thank you

fasaxc commented 10 months ago

@gongzixiangyuan Please upgrade to the latest Calico and see if this reproduces; I think we've fixed a memory leak or two. It's not easy to make kube-controllers scan the datastore without loading it all; the Kubernetes client library makes that quite difficult. However, it's also possible that you've discovered a memory leak bug.

300M does sound like a lot of memory for just 5k pods; it's possible that your datastore has leaked IPAM blocks (and the OOMs are killing kube-controllers, which prevents it from cleaning them up, so the problem gets worse). You may be able to clean some up with calicoctl ipam check/release: https://docs.tigera.io/calico/latest/reference/calicoctl/ipam/check#examples

gongzixiangyuan commented 10 months ago

@fasaxc thank you for your reply

I checked, and there should be no leaked ipamhandles. Because my environment is constantly creating and deleting pods, the check will always show one or two transient problems.

https://github.com/projectcalico/calico/pull/7433 (the fix for this memory leak has also been applied to my environment)

Below is the result of my check:

[root@master1 ]# calicoctl ipam check
Checking IPAM for inconsistencies...

Loading all IPAM blocks... Found 107 IPAM blocks. IPAM block 172.18.11.192/26 affinity=host:work15: IPAM block 172.18.115.128/26 affinity=host:work13: IPAM block 172.18.115.192/26 affinity=host:work13: IPAM block 172.18.115.64/26 affinity=host:work13: IPAM block 172.18.116.0/26 affinity=host:work13: IPAM block 172.18.116.64/26 affinity=host:work13: IPAM block 172.18.12.0/26 affinity=host:work15: IPAM block 172.18.12.128/26 affinity=host:work15: IPAM block 172.18.12.192/26 affinity=host:work20: IPAM block 172.18.12.64/26 affinity=host:work15: IPAM block 172.18.122.128/26 affinity=host:work12: IPAM block 172.18.122.192/26 affinity=host:work12: IPAM block 172.18.123.0/26 affinity=host:work2: IPAM block 172.18.123.128/26 affinity=host:work12: IPAM block 172.18.123.192/26 affinity=host:work2: IPAM block 172.18.123.64/26 affinity=host:work2: IPAM block 172.18.124.0/26 affinity=host:work12: IPAM block 172.18.124.64/26 affinity=host:work12: IPAM block 172.18.126.0/26 affinity=host:work8: IPAM block 172.18.126.128/26 affinity=host:work8: IPAM block 172.18.126.192/26 affinity=host:work8: IPAM block 172.18.126.64/26 affinity=host:work8: IPAM block 172.18.127.0/26 affinity=host:work8: IPAM block 172.18.13.0/26 affinity=host:work20: IPAM block 172.18.13.128/26 affinity=host:work20: IPAM block 172.18.13.192/26 affinity=host:work20: IPAM block 172.18.13.64/26 affinity=host:work20: IPAM block 172.18.130.128/26 affinity=host:work5: IPAM block 172.18.130.192/26 affinity=host:work5: IPAM block 172.18.131.0/26 affinity=host:work5: IPAM block 172.18.131.128/26 affinity=host:work5: IPAM block 172.18.131.64/26 affinity=host:work5: IPAM block 172.18.132.192/26 affinity=host:work10: IPAM block 172.18.133.0/26 affinity=host:work10: IPAM block 172.18.133.64/26 affinity=host:work10: IPAM block 172.18.136.0/26 affinity=host:master3: IPAM block 172.18.136.128/26 affinity=host:master3: IPAM block 172.18.136.64/26 affinity=host:master3: IPAM block 172.18.137.128/26 affinity=host:master1: IPAM block 172.18.137.192/26 affinity=host:master1: IPAM block 172.18.137.64/26 affinity=host:master1: IPAM block 172.18.14.0/26 affinity=host:work15: IPAM block 172.18.140.192/26 affinity=host:work18: IPAM block 172.18.141.0/26 affinity=host:work18: IPAM block 172.18.141.128/26 affinity=host:work18: IPAM block 172.18.141.192/26 affinity=host:work18: IPAM block 172.18.141.64/26 affinity=host:work18: IPAM block 172.18.180.0/26 affinity=host:master2: IPAM block 172.18.180.128/26 affinity=host:master2: IPAM block 172.18.180.64/26 affinity=host:master2: IPAM block 172.18.198.128/26 affinity=host:work17: IPAM block 172.18.198.192/26 affinity=host:work17: IPAM block 172.18.198.64/26 affinity=host:work17: IPAM block 172.18.199.0/26 affinity=host:work17: IPAM block 172.18.199.64/26 affinity=host:work17: IPAM block 172.18.204.0/26 affinity=host:work11: IPAM block 172.18.204.128/26 affinity=host:work21: IPAM block 172.18.204.192/26 affinity=host:work11: IPAM block 172.18.204.64/26 affinity=host:work21: IPAM block 172.18.205.0/26 affinity=host:work21: IPAM block 172.18.205.128/26 affinity=host:work11: IPAM block 172.18.205.192/26 affinity=host:work11: IPAM block 172.18.205.64/26 affinity=host:work11: IPAM block 172.18.215.0/26 affinity=host:work1: IPAM block 172.18.215.128/26 affinity=host:work1: IPAM block 172.18.215.64/26 affinity=host:work1: IPAM block 172.18.228.192/26 affinity=host:work22: IPAM block 172.18.229.0/26 affinity=host:work22: IPAM block 172.18.229.128/26 affinity=host:work22: IPAM block 172.18.229.192/26 affinity=host:work22: IPAM 
block 172.18.229.64/26 affinity=host:work22: IPAM block 172.18.243.128/26 affinity=host:work6: IPAM block 172.18.243.192/26 affinity=host:work6: IPAM block 172.18.244.0/26 affinity=host:work6: IPAM block 172.18.244.128/26 affinity=host:work6: IPAM block 172.18.244.64/26 affinity=host:work6: IPAM block 172.18.252.192/26 affinity=host:work4: IPAM block 172.18.253.0/26 affinity=host:work19: IPAM block 172.18.253.128/26 affinity=host:work19: IPAM block 172.18.253.192/26 affinity=host:work4: IPAM block 172.18.253.64/26 affinity=host:work4: IPAM block 172.18.254.0/26 affinity=host:work19: IPAM block 172.18.254.128/26 affinity=host:work4: IPAM block 172.18.254.64/26 affinity=host:work4: IPAM block 172.18.27.0/26 affinity=host:work7: IPAM block 172.18.27.128/26 affinity=host:work7: IPAM block 172.18.27.192/26 affinity=host:work7: IPAM block 172.18.27.64/26 affinity=host:work7: IPAM block 172.18.28.0/26 affinity=host:work7: IPAM block 172.18.33.192/26 affinity=host:work3: IPAM block 172.18.34.0/26 affinity=host:work3: IPAM block 172.18.34.64/26 affinity=host:work3: IPAM block 172.18.79.192/26 affinity=host:work14: IPAM block 172.18.80.0/26 affinity=host:work16: IPAM block 172.18.80.128/26 affinity=host:work16: IPAM block 172.18.80.192/26 affinity=host:work14: IPAM block 172.18.80.64/26 affinity=host:work14: IPAM block 172.18.81.0/26 affinity=host:work16: IPAM block 172.18.81.128/26 affinity=host:work16: IPAM block 172.18.81.192/26 affinity=host:work14: IPAM block 172.18.81.64/26 affinity=host:work16: IPAM block 172.18.82.0/26 affinity=host:work14: IPAM block 172.18.87.0/26 affinity=host:work9: IPAM block 172.18.87.128/26 affinity=host:work9: IPAM block 172.18.87.192/26 affinity=host:work9: IPAM block 172.18.87.64/26 affinity=host:work9: IPAM block 172.18.88.0/26 affinity=host:work9: IPAM blocks record 5886 allocations.

Loading all IPAM pools...
172.18.0.0/16
Found 1 active IP pools.

Loading all nodes.
Found 0 node tunnel IPs.

Loading all workload endpoints.
Found 5885 workload IPs.
Workloads and nodes are using 5885 IPs.

Looking for top (up to 20) nodes by allocations...
work9 has 297 allocations
work15 has 296 allocations
work22 has 292 allocations
work12 has 292 allocations
work18 has 292 allocations
work8 has 291 allocations
work20 has 291 allocations
work13 has 291 allocations
work17 has 291 allocations
work4 has 290 allocations
work11 has 290 allocations
work16 has 289 allocations
work7 has 288 allocations
work5 has 287 allocations
work14 has 270 allocations
work21 has 164 allocations
work10 has 160 allocations
work19 has 159 allocations
work6 has 158 allocations
master3 has 153 allocations
Node with most allocations has 297; median is 288

Scanning for IPs that are allocated but not actually in use...
Found 1 IPs that are allocated in IPAM but not actually in use.
Scanning for IPs that are in use by a workload or node but not allocated in IPAM...
Found 0 in-use IPs that are not in active IP pools.
Found 0 in-use IPs that are in active IP pools but have no corresponding IPAM allocation.

Check complete; found 1 problems.

fasaxc commented 9 months ago

@gongzixiangyuan please try the latest version; I think there have been some fixes. We also added a debug server which will let you collect a memory profile so we can see what's going on.
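
The details of that debug server are Calico-specific, but as a generic illustration of how a Go service typically exposes heap profiles for collection (this sketch uses only the standard library's net/http/pprof and is not Calico's code):

package main

import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
        // Serve the pprof endpoints on a local debug port; a heap profile can then be
        // fetched with: go tool pprof http://localhost:6060/debug/pprof/heap
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
}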

gongzixiangyuan commented 9 months ago

> @gongzixiangyuan please try the latest version; I think there have been some fixes. We also added a debug server which will let you collect a memory profile so we can see what's going on.

Thank you so much! I will try again when the performance environment is OK.

gongzixiangyuan commented 9 months ago

@fasaxc I collected the profiles, and I suspect that the IPAM sync is triggered too frequently (more than 40 times in one minute), causing too many pod objects to be converted to WorkloadEndpoints and keeping the GC busy.

2023-12-01 20:43:59.617 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:00.015 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:00.060 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:00.570 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:00.823 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.040 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.173 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.376 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.707 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.842 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:01.909 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:02.025 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:02.309 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:02.507 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:02.709 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:02.907 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.019 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.195 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.442 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.532 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.682 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:03.893 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.009 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.299 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.479 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.521 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.813 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:04.923 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:07.808 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:08.208 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:08.408 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:08.608 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:08.808 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:08.910 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:09.011 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:09.209 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:09.407 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:09.511 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:09.710 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:10.224 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:10.337 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:10.608 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:10.788 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:24.796 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:44:43.197 [INFO][11] ipam.go 275: Triggered IPAM sync
2023-12-01 20:45:01.160 [INFO][11] ipam.go 275: Triggered IPAM sync
WorkloadEndpointConverter.PodToWorkloadEndpoints in github.com/projectcalico/calico/libcalico-go/lib/backend/k8s/conversion/workload_endpoint.go
  *ipamController.allocationIsValid in github.com/projectcalico/calico/kube-controllers/pkg/controllers/node/ipam.go
    *ipamController.checkAllocations in github.com/projectcalico/calico/kube-controllers/pkg/controllers/node/ipam.go
      *ipamController.syncIPAM in github.com/projectcalico/calico/kube-controllers/pkg/controllers/node/ipam.go
        *ipamController.acceptScheduleRequests in github.com/projectcalico/calico/kube-controllers/pkg/controllers/node/ipam.go
          *ipamController.Start in github.com/projectcalico/calico/kube-controllers/pkg/controllers/node/ipam.go
gongzixiangyuan commented 9 months ago

https://github.com/cert-manager/cert-manager/blob/master/design/20221205-memory-management.md#transform-functions

> Transform functions: transforming the object before it gets placed into cache. Client-go allows configuring core informers with transform functions. These functions will get called with the object as an argument before the object is placed into cache. The transformer will need to convert the object to a concrete or metadata type if it wants to retrieve its fields. This is a lesser used functionality in comparison with metadata only caching. A couple usage examples:
>
> - support for transform functions was added in controller-runtime (https://github.com/kubernetes-sigs/controller-runtime/pull/1805) with the goal of allowing users to remove managed fields and annotations
> - Istio's pilot controller uses this mechanism to configure their client-go informers to remove managed fields before putting objects into cache
>
> I haven't seen any usage examples where non-metadata fields are modified using this mechanism. I cannot see a reason why new fields (i.e. a label that signals that a transform was applied) could not be added, as well as fields being removed.

@fasaxc Maybe you can take a look at this; it should make it possible to avoid caching the entire pod object.
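
As a rough sketch of that mechanism (the setup function here is hypothetical and not taken from Calico; the dropped field mirrors the managed-fields example from the quoted document), a client-go pod informer could be configured with a transform like this:

import (
        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
)

func newTrimmedPodInformerFactory(cs kubernetes.Interface) informers.SharedInformerFactory {
        factory := informers.NewSharedInformerFactory(cs, 0)
        podInformer := factory.Core().V1().Pods().Informer()
        // Strip bulky fields before the object is stored in the informer cache.
        // Which fields are safe to drop depends on what the controller actually
        // reads (Calico, for example, reads some pod annotations).
        _ = podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
                if pod, ok := obj.(*corev1.Pod); ok {
                        pod.ManagedFields = nil
                }
                return obj, nil
        })
        return factory
}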

gongzixiangyuan commented 9 months ago

@fasaxc I see an optimization was made here that caches only the pod fields that are actually used. Could calico-kube-controller make similar enhancements?

https://github.com/cloudnativelabs/kube-router/pull/999/files

gongzixiangyuan commented 9 months ago

> @fasaxc I see an optimization was made here that caches only the pod fields that are actually used. Could calico-kube-controller make similar enhancements?
>
> https://github.com/cloudnativelabs/kube-router/pull/999/files

This approach doesn't seem to be suitable for calico-kube-controller. There is a simpler way to make the change:

// Before podInformer.Run is called, install a transform on the informer.
podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
        // TODO: keep only the pod attributes needed by calico-kube-controller
        return obj, nil
})

I will modify it and see the effect
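
For reference, one way to fill in that TODO is to copy a whitelist of fields into a fresh Pod and return that instead. The field set below is an assumption and would need to be checked against what calico-kube-controller actually reads (for example the Calico CNI pod annotations); it also assumes the usual v1 "k8s.io/api/core/v1" and metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" imports:

podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
        pod, ok := obj.(*v1.Pod)
        if !ok {
                // e.g. cache.DeletedFinalStateUnknown: pass through unchanged.
                return obj, nil
        }
        slim := &v1.Pod{
                ObjectMeta: metav1.ObjectMeta{
                        Name:            pod.Name,
                        Namespace:       pod.Namespace,
                        UID:             pod.UID,
                        ResourceVersion: pod.ResourceVersion,
                        Labels:          pod.Labels,
                        Annotations:     pod.Annotations, // assumed needed (Calico CNI annotations)
                },
                Spec: v1.PodSpec{
                        NodeName:    pod.Spec.NodeName,
                        HostNetwork: pod.Spec.HostNetwork,
                },
                Status: v1.PodStatus{
                        PodIP:  pod.Status.PodIP,
                        PodIPs: pod.Status.PodIPs,
                },
        }
        return slim, nil
})

Returning a new, smaller object rather than mutating the original keeps everything the controller never touches out of the long-lived cache.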

gongzixiangyuan commented 9 months ago

        k8sconfig, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
        if err != nil {
                return err
        }
        // use protobuf: metadata.ConfigFor sets the protobuf content type on the config
        k8sconfig = metadata.ConfigFor(k8sconfig)

In addition, using protobuf can save some memory compared to JSON.
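
For reference, the same protobuf negotiation can be requested on the rest.Config directly; this is a sketch of the relevant fields (metadata.ConfigFor sets roughly these plus metadata-client defaults):

        k8sconfig, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
        if err != nil {
                return err
        }
        // Ask the API server for protobuf-encoded responses (falling back to JSON),
        // which typically decodes faster and allocates less than JSON for built-in types.
        k8sconfig.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
        k8sconfig.ContentType = "application/vnd.kubernetes.protobuf"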