kubernetes-sigs / kwok

Kubernetes WithOut Kubelet - Simulates thousands of Nodes and Clusters.
https://kwok.sigs.k8s.io
Apache License 2.0
2.6k stars 207 forks

memory leak when running kwok nodes/pods multi hrs/days #1252

Closed: sonyafenge closed this issue 1 month ago

sonyafenge commented 1 month ago

How to use it?

What happened?

I ran kwok in a cluster and monitored its memory usage; memory kept increasing over several hours:

  1. kwok runs on an AWS EC2 c6i.8xlarge node (64 GB memory)
  2. simulate 2000 fake nodes and 20k fake pods
  3. monitor memory usage with CloudWatch and top
  4. memory usage keeps increasing
  5. when the fake nodes/pods started running, kwok memory usage was 0.8%; overnight it rose to 11.2%, and after 2 days it reached 28.2% (four images attached)
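
To put the reported percentages in absolute terms, here is a rough conversion. It assumes the figures are the `%MEM` column from top (resident memory as a share of physical memory) and that the instance exposes close to its nominal 64 GiB; `pct_to_gib` is a hypothetical helper, not part of kwok.

```shell
#!/usr/bin/env bash
# Convert a top-style %MEM value into GiB.
# Assumption: total memory is the nominal 64 GiB of a c6i.8xlarge;
# the amount actually visible to the OS is slightly lower.
pct_to_gib() {
  # $1: memory percentage, $2: total memory in GiB
  awk -v pct="$1" -v total="$2" 'BEGIN { printf "%.2f\n", pct * total / 100 }'
}

# The three readings reported in this issue.
for pct in 0.8 11.2 28.2; do
  printf '%s%% of 64 GiB is about %s GiB\n' "$pct" "$(pct_to_gib "$pct" 64)"
done
```

Under those assumptions the kwok process grew from roughly 0.5 GiB to about 18 GiB over two days.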

What did you expect to happen?

Memory usage shouldn't keep increasing while the fake nodes/pods are running.

How can we reproduce it (as minimally and precisely as possible)?

The repro steps are included in "What happened". YAML for the nodes:

cat kwok-node.yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    kwok.x-k8s.io/node: fake
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: kwok-node-0
    kubernetes.io/os: linux
    kubernetes.io/role: agent
    node-role.kubernetes.io/agent: ""
    type: kwok
  name: kwok-node-0
spec:
  taints: # Avoid scheduling actual running pods to fake Node
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
status:
  allocatable:
    cpu: 32
    memory: 256Gi
    pods: 110
  capacity:
    cpu: 32
    memory: 256Gi
    pods: 110
  nodeInfo:
    architecture: amd64
    bootID: ""
    containerRuntimeVersion: ""
    kernelVersion: ""
    kubeProxyVersion: fake
    kubeletVersion: fake
    machineID: ""
    operatingSystem: linux
    osImage: ""
    systemUUID: ""
  phase: Running
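
One way to stamp out the 2000 nodes from this single manifest is to substitute the node name per index. This is only a sketch: `render_node` is a hypothetical helper, and it assumes the manifest above is saved as `kwok-node.yaml` and that `kwok-node-0` is the only string that has to change between nodes.

```shell
#!/usr/bin/env bash
# Render the node manifest for a given index by replacing the
# hard-coded name "kwok-node-0" (metadata.name and the hostname label).
render_node() {
  # $1: path to the template manifest, $2: node index
  sed "s/kwok-node-0/kwok-node-$2/g" "$1"
}

# Usage (assumes kubectl points at the test cluster):
#   for i in $(seq 0 1999); do
#     render_node kwok-node.yaml "$i"
#     echo '---'
#   done | kubectl apply -f -
```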

YAML for the pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fake-pod-${DEPI}
  namespace: fake-pod
spec:
  replicas: ${REP}
  selector:
    matchLabels:
      app: fake-pod-${DEPI}
  template:
    metadata:
      labels:
        app: fake-pod-${DEPI}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - kwok
      # A taint was added to the automatically created Node.
      # You can remove the taint from the Node or add this toleration.
      tolerations:
      - key: "kwok.x-k8s.io/node"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: fake-container
        image: fake-image
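
The `${DEPI}` and `${REP}` placeholders suggest the manifest is rendered once per deployment. A minimal sketch of that step follows; `render_deployment` is a hypothetical helper, the file name `fake-pod-deployment.yaml` is an assumption, and the split of 20 deployments with 1000 replicas each is only one way to reach the 20k pods mentioned above (the issue does not state the actual split).

```shell
#!/usr/bin/env bash
# Fill in the ${DEPI} (deployment index) and ${REP} (replica count)
# placeholders of the deployment template using sed.
render_deployment() {
  # $1: path to the template, $2: deployment index, $3: replica count
  sed -e "s/[\$]{DEPI}/$2/g" -e "s/[\$]{REP}/$3/g" "$1"
}

# Usage (assumed split: 20 deployments x 1000 replicas = 20k pods):
#   kubectl create namespace fake-pod
#   for i in $(seq 0 19); do
#     render_deployment fake-pod-deployment.yaml "$i" 1000 | kubectl apply -f -
#   done
```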

Anything else we need to know?

No response

Kwok version

```console
kwok-controller:
    Container ID:  containerd://ba8bfdfa5cd0181f845a901984a1498a0b701f622540e4e03357e1cd348e03b0
    Image:         registry.k8s.io/kwok/kwok:v0.6.0
```

OS version

```
cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```

wzshiming commented 1 month ago

Good catch, thank you!