kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

No removable pods on node, why? #1516

Closed: hwyrq closed this issue 1 month ago

hwyrq commented 1 month ago

I0912 09:17:02.750207 1 nodeutilization.go:204] "Node is underutilized" node="home0123" usage={"cpu":"600m","memory":"306Mi","pods":"4"} usagePercentage={"cpu":0.75,"memory":0.33,"pods":3.64}
I0912 09:17:02.750371 1 nodeutilization.go:207] "Node is overutilized" node="cloud0406" usage={"cpu":"950m","memory":"290Mi","pods":"15"} usagePercentage={"cpu":47.5,"memory":17.32,"pods":13.64}
I0912 09:17:02.750414 1 nodeutilization.go:204] "Node is underutilized" node="home0122" usage={"cpu":"200m","memory":"250Mi","pods":"10"} usagePercentage={"cpu":3.33,"memory":1.62,"pods":9.09}
I0912 09:17:02.750448 1 lownodeutilization.go:135] "Criteria for a node under utilization" CPU=100 Mem=5 Pods=100
I0912 09:17:02.750483 1 lownodeutilization.go:136] "Number of underutilized nodes" totalNumber=2
I0912 09:17:02.750506 1 lownodeutilization.go:149] "Criteria for a node above target utilization" CPU=100 Mem=10 Pods=100
I0912 09:17:02.750539 1 lownodeutilization.go:150] "Number of overutilized nodes" totalNumber=1
I0912 09:17:02.750567 1 nodeutilization.go:261] "Total capacity to be moved" CPU=85200 Mem=10678022143 Pods=206
I0912 09:17:02.750655 1 nodeutilization.go:264] "Evicting pods from node" node="cloud0406" usage={"cpu":"950m","memory":"290Mi","pods":"15"}
I0912 09:17:02.750978 1 nodeutilization.go:267] "Pods on node" node="cloud0406" allPods=15 nonRemovablePods=15 removablePods=0
I0912 09:17:02.751021 1 nodeutilization.go:270] "No removable pods on node, try next node" node="cloud0406"
I0912 09:17:02.751082 1 profile.go:349] "Total number of pods evicted" extension point="Balance" evictedPods=0
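To see why this run classified the nodes the way it did, here is a simplified Go sketch of the LowNodeUtilization classification, fed with the usagePercentage values and thresholds from the log above. It assumes a node is underutilized when every resource is at or below `thresholds` and overutilized when any resource is above `targetThresholds`. The helper names (`isUnderutilized`, `isOverutilized`, `classify`) are hypothetical, and the real plugin works on resource quantities rather than these rounded percentages.

```go
package main

import "fmt"

// percentages maps a resource name ("cpu", "memory", "pods") to a usage percentage.
type percentages map[string]float64

// isUnderutilized (assumption): every resource is at or below its threshold.
func isUnderutilized(usage, thresholds percentages) bool {
	for res, t := range thresholds {
		if usage[res] > t {
			return false
		}
	}
	return true
}

// isOverutilized (assumption): at least one resource is above its target threshold.
func isOverutilized(usage, targets percentages) bool {
	for res, t := range targets {
		if usage[res] > t {
			return true
		}
	}
	return false
}

// classify checks the low threshold first, so a node can only land in one bucket.
func classify(usage, thresholds, targets percentages) string {
	switch {
	case isUnderutilized(usage, thresholds):
		return "underutilized"
	case isOverutilized(usage, targets):
		return "overutilized"
	default:
		return "appropriately utilized"
	}
}

type node struct {
	name  string
	usage percentages
}

func main() {
	// Values copied from the "Criteria ..." log lines of the first run.
	thresholds := percentages{"cpu": 100, "memory": 5, "pods": 100}
	targets := percentages{"cpu": 100, "memory": 10, "pods": 100}

	nodes := []node{
		{"home0123", percentages{"cpu": 0.75, "memory": 0.33, "pods": 3.64}},
		{"cloud0406", percentages{"cpu": 47.5, "memory": 17.32, "pods": 13.64}},
		{"home0122", percentages{"cpu": 3.33, "memory": 1.62, "pods": 9.09}},
	}
	for _, n := range nodes {
		fmt.Printf("%s: %s\n", n.name, classify(n.usage, thresholds, targets))
	}
}
```

With these inputs cloud0406 is overutilized only because its memory (17.32%) exceeds the 10% target; its CPU and pod percentages are far below 100.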

hwyrq commented 1 month ago

I learned some Go, looked at the source code, and added an annotation here:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: wccloud-web-rust
  name: wccloud-web-rust
spec:
  replicas: 10
  selector:
    matchLabels:
      app: wccloud-web-rust
  template:
    metadata:
      annotations:
        "descheduler.alpha.kubernetes.io/evict": "1"
      labels:
        app: wccloud-web-rust
    spec:
      containers:
        - name: wccloud-web-rust
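The annotation works because, as far as I can tell from the descheduler source, the default evictor only checks whether the `descheduler.alpha.kubernetes.io/evict` key is present and, if it is, treats the pod as evictable without running its other checks; the value ("1" here) is not inspected. Below is a minimal Go sketch of that behaviour using a plain map instead of the real `v1.Pod` type. `isEvictable` and its `passesOtherChecks` parameter are hypothetical stand-ins for the evictor's filter.

```go
package main

import "fmt"

// evictAnnotationKey is the annotation key used in the Deployment above.
const evictAnnotationKey = "descheduler.alpha.kubernetes.io/evict"

// haveEvictAnnotation reports whether the key is present.
// Assumption: only presence matters; the value is ignored.
func haveEvictAnnotation(annotations map[string]string) bool {
	_, found := annotations[evictAnnotationKey]
	return found
}

// isEvictable is a hypothetical, heavily simplified filter: the annotation
// short-circuits every other constraint check.
func isEvictable(annotations map[string]string, passesOtherChecks bool) bool {
	if haveEvictAnnotation(annotations) {
		return true
	}
	return passesOtherChecks
}

func main() {
	annotated := map[string]string{evictAnnotationKey: "1"}
	plain := map[string]string{}
	fmt.Println(isEvictable(annotated, false)) // true: annotation overrides the failed checks
	fmt.Println(isEvictable(plain, false))     // false: no annotation, checks failed
}
```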

Now there is a new problem: after the pods were evicted, they were scheduled again onto the original node.

hwyrq commented 1 month ago

I0913 03:02:02.597709 1 nodeutilization.go:204] "Node is underutilized" node="home0123" usage={"cpu":"600m","memory":"306Mi","pods":"9"} usagePercentage={"cpu":0.75,"memory":0.33,"pods":8.18}
I0913 03:02:02.597743 1 lownodeutilization.go:135] "Criteria for a node under utilization" CPU=100 Mem=2 Pods=100
I0913 03:02:02.597779 1 lownodeutilization.go:136] "Number of underutilized nodes" totalNumber=2
I0913 03:02:02.597803 1 lownodeutilization.go:149] "Criteria for a node above target utilization" CPU=100 Mem=5 Pods=100
I0913 03:02:02.597834 1 lownodeutilization.go:150] "Number of overutilized nodes" totalNumber=1
I0913 03:02:02.597863 1 nodeutilization.go:261] "Total capacity to be moved" CPU=85200 Mem=5120122060 Pods=202
I0913 03:02:02.597902 1 nodeutilization.go:264] "Evicting pods from node" node="cloud0406" usage={"cpu":"950m","memory":"290Mi","pods":"15"}
I0913 03:02:02.598126 1 nodeutilization.go:267] "Pods on node" node="cloud0406" allPods=15 nonRemovablePods=13 removablePods=2
I0913 03:02:02.598167 1 nodeutilization.go:274] "Evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers"
I0913 03:02:02.688816 1 evictions.go:170] "Evicted pod" pod="default/wccloud-web-rust-68ddb477d9-267ss" reason="" strategy="LowNodeUtilization" node="cloud0406" profile="ProfileName"
I0913 03:02:02.688910 1 nodeutilization.go:316] "Evicted pods" pod="default/wccloud-web-rust-68ddb477d9-267ss"
I0913 03:02:02.688937 1 nodeutilization.go:341] "Updated node usage" node="cloud0406" CPU=950 Mem=304087040 Pods=14
I0913 03:02:02.857815 1 evictions.go:170] "Evicted pod" pod="default/wccloud-web-rust-68ddb477d9-dx9hw" reason="" strategy="LowNodeUtilization" node="cloud0406" profile="ProfileName"
I0913 03:02:02.857912 1 nodeutilization.go:316] "Evicted pods" pod="default/wccloud-web-rust-68ddb477d9-dx9hw"
I0913 03:02:02.857943 1 nodeutilization.go:341] "Updated node usage" node="cloud0406" CPU=950 Mem=304087040 Pods=13
I0913 03:02:02.858015 1 profile.go:349] "Total number of pods evicted" extension point="Balance" evictedPods=2
I0913 03:02:02.858171 1 descheduler.go:170] "Number of evicted pods" totalEvicted=2
I0913 03:02:02.858486 1 reflector.go:302] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:160
I0913 03:02:02.858777 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0913 03:02:02.858831 1 event_broadcaster.go:279] "Unable to write event (may retry after sleeping)" err="Post \"https://10.96.0.1:443/apis/events.k8s.io/v1/namespaces/default/events\": context canceled"
I0913 03:02:02.858942 1 secure_serving.go:258] Stopped listening on :10258
I0913 03:02:02.859024 1 reflector.go:302] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:160
I0913 03:02:02.859079 1 reflector.go:302] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:160
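This run logs that pods are evicted by priority and then by QoS tier. The "Updated node usage" lines show only the pod count dropping (15 → 14 → 13) while CPU (950m) and memory (304087040 bytes, i.e. 290Mi) stay the same, which is consistent with usage being computed from pod requests and the evicted wccloud-web-rust pods having no CPU or memory requests. Below is a simplified Go sketch of both steps. It assumes the ordering BestEffort < Burstable < Guaranteed within the same priority; the `pod` and `nodeUsage` types and the `sortForEviction` and `evict` helpers are hypothetical, not the descheduler's own.

```go
package main

import (
	"fmt"
	"sort"
)

// qosRank: a lower rank is evicted earlier (assumed BestEffort < Burstable < Guaranteed).
var qosRank = map[string]int{"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

// pod is a hypothetical, stripped-down pod: priority, QoS class and requests.
type pod struct {
	name     string
	priority int32
	qos      string
	cpuMilli int64
	memBytes int64
}

// nodeUsage mirrors the fields printed by the "Updated node usage" log line.
type nodeUsage struct {
	cpuMilli int64
	memBytes int64
	pods     int64
}

// sortForEviction orders pods lowest priority first; equal priorities are
// ordered by QoS class, BestEffort first.
func sortForEviction(pods []pod) {
	sort.SliceStable(pods, func(i, j int) bool {
		if pods[i].priority != pods[j].priority {
			return pods[i].priority < pods[j].priority
		}
		return qosRank[pods[i].qos] < qosRank[pods[j].qos]
	})
}

// evict subtracts the evicted pod's requests (and one pod) from the node usage.
func evict(u nodeUsage, p pod) nodeUsage {
	return nodeUsage{
		cpuMilli: u.cpuMilli - p.cpuMilli,
		memBytes: u.memBytes - p.memBytes,
		pods:     u.pods - 1,
	}
}

func main() {
	pods := []pod{
		{name: "guaranteed-0", priority: 0, qos: "Guaranteed", cpuMilli: 100, memBytes: 64 << 20},
		{name: "besteffort-0", priority: 0, qos: "BestEffort"},
		{name: "besteffort-1000", priority: 1000, qos: "BestEffort"},
	}
	sortForEviction(pods)
	for _, p := range pods {
		fmt.Println(p.name) // besteffort-0, guaranteed-0, besteffort-1000
	}

	// Starting values taken from the log for node cloud0406.
	u := nodeUsage{cpuMilli: 950, memBytes: 304087040, pods: 15}
	u = evict(u, pods[0]) // a pod without requests: only the pod count drops
	fmt.Println(u.cpuMilli, u.memBytes, u.pods) // 950 304087040 14
}
```

Under these assumptions, evicting request-less pods never lowers cloud0406's memory usage, so the node would presumably still be classified as overutilized on the next run.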

damemi commented 1 month ago

@hwyrq glad you figured it out. For future reference, the docs list the reasons why a pod isn't evictable, and you can set --v=4 to log the specific reason why a pod wasn't evicted.