karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

Eager for memory experts to help optimize karmada-controller-manager memory usage #4738

Open chaosi-zju opened 5 months ago

chaosi-zju commented 5 months ago

What happened:

karmada-controller-manager has high memory usage when it handles many resources.

scenario:

E.G1: 400 deployments + 400 secrets + 800 configmaps + 400 services + 400 serviceexports.

The RSS memory reaches 360MB; moreover, after restarting karmada-controller-manager, the final RSS memory reaches 520MB.

E.G2: 2000 deployments + 2000 secrets + 4000 configmaps + 2000 services + 2000 serviceexports.

The RSS memory reaches more than 1GB; moreover, after restarting karmada-controller-manager, the final RSS memory reaches 2GB~3GB.

What problems do you want help with:

  1. Most important: is such high memory usage reasonable? Is there room to optimize, and how do we find where to optimize?
  2. Taking E.G1 as an example: when we apply the resources one by one, memory only reaches 360MB, so why does the final memory reach 520MB after the controller pod is restarted? Is there room for optimization here?
  3. Taking E.G1 as an example: observing RSS memory with crictl stats or top, the final memory is around 520MB, but pprof shows only about 180MB of inuse_space. Where does the remaining ~340MB lie? (A sketch after this list illustrates the usual suspects.)
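On question 3: pprof's inuse_space only counts live heap objects, while the process RSS also contains heap spans the Go runtime keeps around for reuse, goroutine stacks, runtime bookkeeping, and (depending on the Go version and madvise behavior) freed pages the kernel has not reclaimed yet. A minimal standalone sketch, not karmada code, that makes this breakdown visible inside any Go process:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	mb := func(b uint64) float64 { return float64(b) / 1024 / 1024 }
	fmt.Printf("HeapAlloc (live objects, roughly what pprof inuse_space reports): %.1f MiB\n", mb(m.HeapAlloc))
	fmt.Printf("HeapInuse (spans currently holding live objects):                 %.1f MiB\n", mb(m.HeapInuse))
	fmt.Printf("HeapIdle-HeapReleased (heap kept for reuse, still in RSS):        %.1f MiB\n", mb(m.HeapIdle-m.HeapReleased))
	fmt.Printf("StackInuse (goroutine stacks):                                    %.1f MiB\n", mb(m.StackInuse))
	fmt.Printf("MSpanInuse+MCacheInuse+GCSys (runtime bookkeeping):               %.1f MiB\n", mb(m.MSpanInuse+m.MCacheInuse+m.GCSys))
	fmt.Printf("Sys (total memory obtained from the OS, closer to RSS):           %.1f MiB\n", mb(m.Sys))
}
```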

How to reproduce it (as minimally and precisely as possible):

  1. Install karmada with hack/local-up-karmada.sh.
  2. For better observation, scale karmada-controller-manager to a single replica: kubectl --context karmada-host -n karmada-system scale deploy karmada-controller-manager --replicas=1
  3. Copy the cpp.yaml, demo-test.yaml, and test.sh from the appendix below into local files.
  4. Execute test.sh.

Then, test.sh will create 100 namespaces; each namespace has 4 applications, and each application has 1 deployment / 4 pods / 1 secret / 2 configmaps / 1 service / 1 serviceexport.

Finally, you can observe the RSS memory of karmada-controller-manager with the crictl stats (or top) command, like this:

docker exec -it karmada-host-control-plane bash
crictl stats -w `crictl ps | grep karmada-controller-manager | awk '{print $1}'`

You can also observe the heap memory collected by pprof; for details, see Profiling Karmada.

appendix:

cpp.yaml

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: default-cpp
spec:
  propagateDeps: true
  placement:
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames:
                - member1
            weight: 1
          - targetCluster:
              clusterNames:
                - member2
            weight: 1
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
    - apiVersion: v1
      kind: Service
    - apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: serviceexports.multicluster.x-k8s.io
    - apiVersion: multicluster.x-k8s.io/v1alpha1
      kind: ServiceExport
    - apiVersion: v1
      kind: ConfigMap
    - apiVersion: v1
      kind: Secret
```
demo-test.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-test
  namespace: demo-test-ns
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo-test
  template:
    metadata:
      labels:
        app: demo-test
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - image: nginx
          name: demo-test
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
      volumes:
        - name: secret
          secret:
            secretName: demo-test
        - name: configmap
          configMap:
            name: demo-test
        - name: configmap-2
          configMap:
            name: demo-test-2
---
apiVersion: v1
kind: Secret
metadata:
  name: demo-test
  namespace: demo-test-ns
type: Opaque
data:
  key: MTIzCg==
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-test
  namespace: demo-test-ns
data:
  test-key: "test-value"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-test-2
  namespace: demo-test-ns
data:
  test-key: "test-value"
---
apiVersion: v1
kind: Service
metadata:
  name: demo-test
  namespace: demo-test-ns
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: demo-test
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: demo-test
  namespace: demo-test-ns
```
test.sh

```shell
#!/bin/bash

export KUBECONFIG=~/.kube/karmada.config:~/.kube/members.config
alias ka='kubectl --context karmada-apiserver'
shopt -s expand_aliases

# ${ns} namespaces
ns=100
# each namespace has ${dnum} deployment
dnum=4

ka apply -f cpp.yaml

# iterate each namespace
for i in $(seq 1 $ns)
do
  nsName="ns-${i}"
  # create namespace if it not exist
  ka create ns ${nsName} --dry-run=client -o yaml | ka apply -f -
  # create each resource in certain namespace
  for j in $(seq 1 $dnum)
  do
    cp -f demo-test.yaml demo-test-tmp.yaml
    sed -i'' -e "s/demo-test-ns/${nsName}/g" demo-test-tmp.yaml
    sed -i'' -e "s/demo-test-2/test-${i}-${j}-2/g" demo-test-tmp.yaml
    sed -i'' -e "s/demo-test/test-${i}-${j}/g" demo-test-tmp.yaml
    ka apply -f demo-test-tmp.yaml
  done
done
```

Anything else we need to know?:

  1. Memory is simply high and grows linearly with the number of resources. It does not keep rising while the resource count stays the same, so it does not look like a memory leak.
  2. Some pprof profiles are attached:

Environment:

chaosi-zju commented 5 months ago

/help

karmada-bot commented 5 months ago

@chaosi-zju: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/karmada-io/karmada/issues/4738):

> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

wm775825 commented 5 months ago

It seems that the inuse_space of E.G1 and the alloc_space of E.G1 are the same picture.

wm775825 commented 5 months ago

It seems CacheReader.Get and CacheReader.List double the memory usage. For CacheReader.List and CacheReader.Get, use the client.UnsafeDisableDeepCopy option explicitly to avoid the deep copy if you won't modify the returned objects.
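For context, a sketch of what the per-call option looks like against a controller-runtime cache-backed client (the namespace and function name here are only examples, and the exact availability of the option depends on the controller-runtime version in use):

```go
package example

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listDeploymentsReadOnly lists deployments from the cache without the
// per-object deep copy. The returned items alias the informer cache and
// must be treated as read-only.
func listDeploymentsReadOnly(ctx context.Context, c client.Client) (*appsv1.DeploymentList, error) {
	deployList := &appsv1.DeploymentList{}
	if err := c.List(ctx, deployList,
		client.InNamespace("demo-test-ns"), // namespace from the repro above, purely an example
		client.UnsafeDisableDeepCopy,       // skip the deep copy normally done by the cache reader
	); err != nil {
		return nil, err
	}
	return deployList, nil
}
```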

chaosi-zju commented 5 months ago

> It seems that the inuse_space of E.G1 and the alloc_space of E.G1 are the same picture.

Sorry, the link got mixed up; I will update it.

---update---

done

chaosi-zju commented 5 months ago

> use the client.UnsafeDisableDeepCopy option explicitly to avoid the deep copy if you won't modify the returned objects

Following your suggestion, I made the modification below:

https://github.com/karmada-io/karmada/blob/dca5c1abd7669539e8cc62d6c5400943d14246da/cmd/controller-manager/app/controllermanager.go#L165-L178

I updated the NewCache function above to:

NewCache: func(config *rest.Config, opts cache.Options) (cache.Cache, error) {
    // Strip fields we never read before objects are stored in the cache.
    opts.DefaultTransform = fedinformer.StripUnusedFields
    // Serve cached Get/List without deep copying every returned object.
    disable := true
    opts.DefaultUnsafeDisableDeepCopy = &disable
    return cache.New(config, opts)
},

For the case of 400 deployments + 400 secrets + 800 configmaps + 400 services + 400 serviceexports:

Before setting DefaultUnsafeDisableDeepCopy: 360MB RSS, becoming 520MB after a restart. After setting DefaultUnsafeDisableDeepCopy: 220MB RSS, becoming 250MB after a restart.

chaosi-zju commented 5 months ago

> Before setting DefaultUnsafeDisableDeepCopy: 360MB RSS, becoming 520MB after a restart. After setting DefaultUnsafeDisableDeepCopy: 220MB RSS, becoming 250MB after a restart.

@RainbowMango can we introduce this modification?

wm775825 commented 5 months ago

If we use the global configuration, we should check whether there are any cases where objects returned by List or Get are modified in place.
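To illustrate that concern with a minimal sketch (not existing karmada code; the function name and label key are made up for the example): once deep copy is disabled globally, any controller that wants to modify a fetched object must DeepCopy it first, because the returned object shares its maps, slices, and pointers with the informer cache.

```go
package example

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// addLabel shows the only safe way to modify an object when deep copy from
// the cache is disabled: copy first, then mutate and update the copy.
func addLabel(ctx context.Context, c client.Client, key types.NamespacedName) error {
	cached := &appsv1.Deployment{}
	if err := c.Get(ctx, key, cached); err != nil {
		return err
	}

	// With DefaultUnsafeDisableDeepCopy set, `cached` shares its maps, slices,
	// and pointers with the object stored in the informer cache. Mutating it
	// directly (e.g. cached.Labels["x"] = "y") would silently corrupt the cache
	// for every other reader.
	desired := cached.DeepCopy()
	if desired.Labels == nil {
		desired.Labels = map[string]string{}
	}
	desired.Labels["example.io/touched"] = "true" // example label, not a real karmada label

	// Update the copy, never the cached object.
	return c.Update(ctx, desired)
}
```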