kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

unknown phase feature #1417

Open BiancaTofan opened 1 month ago

BiancaTofan commented 1 month ago

Hello,

I am using the latest version of the descheduler chart and am trying to define a descheduler policy that evicts pods in an Unknown state. Nothing happens, even though there are already 2 pods with this status that have been running for more than 86400 seconds. Is there a problem, or is my policy defined incorrectly?

apiVersion: v1
data:
  values.yaml: |
    ---
    cmdOptions:
      v: 7
    deschedulerPolicy:
      profiles:
        - name: ProfileName
          pluginConfig:
            - args:
                evictDaemonSetPods: true
                evictLocalStoragePods: true
                evictFailedBarePods: true
              name: "DefaultEvictor"
            - args:
                nodeAffinityType:
                - "requiredDuringSchedulingIgnoredDuringExecution"
              name: "RemovePodsViolatingNodeAffinity"
            - args:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: kured
                maxPodLifeTimeSeconds: 86400
                namespaces:
                  include:
                  - "kubernetes-reboot-daemon"
                states:
                - "Unknown"
                - "Pending"
              name: "PodLifeTime"
          plugins:
            deschedule:
              enabled:
                - "PodLifeTime"
                - "RemovePodsViolatingNodeAffinity"
      strategies: null
    deschedulerPolicyAPIVersion: "descheduler/v1alpha2"
    deschedulingInterval: 5m
    image:
      pullPolicy: Always
    kind: Deployment
    resources:
      limits:
        ephemeral-storage: 100Mi
        memory: 127Mi
      requests:
        cpu: 50m
        ephemeral-storage: 100Mi
        memory: 127Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001

Thank you in advance!

BiancaTofan commented 1 month ago

Furthermore, I also noticed that the configured states are not taken into account. Even though I configured only pods in the Unknown and Pending states to be evicted, the descheduler also evicts pods in the Running state. Do I need to add something else? I followed the documentation throughout.

googs1025 commented 1 week ago

Hi! Which Kubernetes version are you using? Also, how do your pods end up in the Unknown state? Kubernetes has deprecated the Unknown pod phase and no longer sets it.

// These are the valid statuses of pods.
const (
    // PodPending means the pod has been accepted by the system, but one or more of the containers
    // has not been started. This includes time before being bound to a node, as well as time spent
    // pulling images onto the host.
    PodPending PodPhase = "Pending"
    // PodRunning means the pod has been bound to a node and all of the containers have been started.
    // At least one container is still running or is in the process of being restarted.
    PodRunning PodPhase = "Running"
    // PodSucceeded means that all containers in the pod have voluntarily terminated
    // with a container exit code of 0, and the system is not going to restart any of these containers.
    PodSucceeded PodPhase = "Succeeded"
    // PodFailed means that all containers in the pod have terminated, and at least one container has
    // terminated in a failure (exited with a non-zero exit code or was stopped by the system).
    PodFailed PodPhase = "Failed"
    // PodUnknown means that for some reason the state of the pod could not be obtained, typically due
    // to an error in communicating with the host of the pod.
    // Deprecated: It isn't being set since 2015 (74da3b14b0c0f658b3bb8d2def5094686d0e9095)
    PodUnknown PodPhase = "Unknown"
)

Reference: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L3089

Please forgive me if my understanding is wrong.
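
One way to answer the question of how the Unknown state arises is to inspect the raw pod object, since the status column shown by `kubectl get pods` does not have to equal `status.phase`. The Go sketch below decodes a pod's JSON (for example, the output of `kubectl get pod <name> -o json`) and prints `status.phase`, `status.reason`, and whether a deletion timestamp is set. The `summarize` helper and the sample JSON are made up for illustration; whether pods affected by a kured reboot actually look like this sample is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podView captures only the fields of a Pod's JSON that this sketch inspects.
type podView struct {
	Metadata struct {
		Name              string  `json:"name"`
		DeletionTimestamp *string `json:"deletionTimestamp"`
	} `json:"metadata"`
	Status struct {
		Phase  string `json:"phase"`
		Reason string `json:"reason"`
	} `json:"status"`
}

// summarize is a hypothetical helper that reports the phase, the reason,
// and whether the pod is marked for deletion.
func summarize(raw []byte) (string, error) {
	var p podView
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s phase=%q reason=%q terminating=%t",
		p.Metadata.Name, p.Status.Phase, p.Status.Reason,
		p.Metadata.DeletionTimestamp != nil), nil
}

func main() {
	// Made-up sample; real output would come from `kubectl get pod ... -o json`.
	sample := []byte(`{
	  "metadata": {"name": "kured-abc", "deletionTimestamp": "2024-01-02T12:00:00Z"},
	  "status": {"phase": "Running", "reason": "NodeLost"}
	}`)
	line, err := summarize(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(line)
}
```

If the reported phase is not `Unknown`, a filter that compares against the phase alone would not match such a pod, even when the pod listing displays it as Unknown.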

BiancaTofan commented 1 week ago

Hello, I use Kubernetes 1.28. The Unknown phase was added to the descheduler code recently. My pods reach this state because kured (Kubernetes Reboot Daemon) reboots the node.