kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler

There is no plugin for podAffinity violations #1462

Open tuxillo opened 4 months ago

tuxillo commented 4 months ago

Is your feature request related to a problem? Please describe.

I think it is a problem because, as far as I know, there is no way of rescheduling pods that are in violation of podAffinity.

Describe the solution you'd like

Make a plugin available that deals with this, right? 🥲

Describe alternatives you've considered

Right now I'm manually doing rollouts of deployments so that the scheduling comes out right, for example after upgrades.

What version of descheduler are you using?

descheduler version: latest

Additional context

Issue #1286 was closed as "Not planned", but feedback was provided there, and I think this feature is very much needed.

tuxillo commented 3 months ago

Any news?

damemi commented 3 months ago

Hi @tuxillo, sorry, but we don't have much bandwidth for adding new plugins at this time. The descheduling framework refactor is meant to make it easier for anyone to develop their own plugins. The existing plugin code should serve as a guide, but we hope to publish a more generic developer guide in the future.

tuxillo commented 3 months ago

@damemi thanks for the update. No worries! I was wondering if this plugin was missing for any particular reason. I might give implementing it a try, but should I wait for the refactor you mentioned?

damemi commented 3 months ago

@tuxillo please feel free to implement it and open a PR if you'd like! The refactor is done already. The goal of it was to make it so you don't even need to contribute the plugin code here if you don't want to: you can just build your own descheduler with your plugin imported.
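
Roughly, a plugin is just a type satisfying the framework's DeschedulePlugin interface. Here's a minimal sketch; I'm recalling the names from pkg/framework/types, so treat them as assumptions and use an in-tree plugin as the authoritative reference:

// Minimal plugin skeleton (a sketch only): the interface and package
// names below are assumed from pkg/framework/types, not verified here.
package podaffinityviolation

import (
    "context"

    v1 "k8s.io/api/core/v1"

    frameworktypes "sigs.k8s.io/descheduler/pkg/framework/types"
)

// PluginName is a hypothetical name for the new plugin.
const PluginName = "RemovePodsViolatingInterPodAffinity"

// RemovePodsViolatingInterPodAffinity would evict pods whose required
// podAffinity is no longer satisfied on their current node.
type RemovePodsViolatingInterPodAffinity struct {
    handle frameworktypes.Handle
}

// Compile-time check that the plugin implements the Deschedule extension point.
var _ frameworktypes.DeschedulePlugin = &RemovePodsViolatingInterPodAffinity{}

func (d *RemovePodsViolatingInterPodAffinity) Name() string {
    return PluginName
}

func (d *RemovePodsViolatingInterPodAffinity) Deschedule(ctx context.Context, nodes []*v1.Node) *frameworktypes.Status {
    // Walk the pods on each node, detect affinity violations, and evict
    // through d.handle.Evictor().
    return nil
}

You'd also add a New constructor and register the plugin in your own main via the plugin registry, so the code never has to live in this repo.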

tuxillo commented 1 month ago

I did an initial version using RemovePodsViolatingInterPodAntiAffinity as the base, but it turns out I ended up just reversing its logic, which is almost certainly the wrong approach, and it also doesn't sound like it's worth a whole plugin.

Let me explain my use case:

I set application A to run on certain nodes. I want application B to always run on the nodes where application A is, so for application B I use:

    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - appA
            topologyKey: kubernetes.io/hostname

As you know, app A will sometimes drift to other nodes (Karpenter is autoscaling), but app B won't follow: the rule is required during scheduling but IgnoredDuringExecution, so nothing re-evaluates it once app B is placed. The affinity is broken.

What I am currently doing in predicates is:

// CheckPodAffinityViolation reports whether candidatePod has a required
// podAffinity term that no other assigned pod satisfies within the
// candidate's topology domain, i.e. whether the affinity is currently broken.
func CheckPodAffinityViolation(candidatePod *v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
    nodeHavingCandidatePod, ok := nodeMap[candidatePod.Spec.NodeName]
    if !ok {
        klog.InfoS("CandidatePod's node does not exist in nodeMap", "candidatePod", klog.KObj(candidatePod), "node", candidatePod.Spec.NodeName)
        return false
    }

    affinity := candidatePod.Spec.Affinity
    if affinity == nil || affinity.PodAffinity == nil {
        return false
    }

    // Every required term must be satisfied by at least one assigned pod in
    // the same topology domain; a single unsatisfied term is a violation, so
    // don't return early on the first satisfied term.
    for _, term := range GetPodAffinityTerms(affinity.PodAffinity) {
        namespaces := GetNamespacesFromPodAffinityTerm(candidatePod, &term)
        selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
        if err != nil {
            klog.ErrorS(err, "Unable to convert LabelSelector into Selector")
            return false
        }

        termSatisfied := false
        for namespace := range namespaces {
            for _, assignedPod := range assignedPods[namespace] {
                // Skip the candidate itself (compare namespace and name, not
                // name alone) and pods the term does not select.
                if (assignedPod.Namespace == candidatePod.Namespace && assignedPod.Name == candidatePod.Name) ||
                    !PodMatchesTermsNamespaceAndSelector(assignedPod, namespaces, selector) {
                    continue
                }

                nodeHavingAssignedPod, ok := nodeMap[assignedPod.Spec.NodeName]
                if !ok {
                    continue
                }

                // A matching pod in the same topology domain satisfies this term.
                if hasSameLabelValue(nodeHavingCandidatePod, nodeHavingAssignedPod, term.TopologyKey) {
                    klog.V(4).InfoS("CandidatePod's affinity term is satisfied by assigned pod", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
                    termSatisfied = true
                    break
                }
            }
            if termSatisfied {
                break
            }
        }

        if !termSatisfied {
            klog.V(1).InfoS("CandidatePod violates a required pod affinity term on its node", "candidatePod", klog.KObj(candidatePod), "node", candidatePod.Spec.NodeName)
            return true
        }
    }
    return false
}

So when the candidate pod shares a node with a matching pod (since that's the topology key), it just isn't evicted; if no term is satisfied, it is. I think I'm still missing some parts of the problem or not following the correct approach. For reference, I'd wire the predicate into the plugin roughly as in the sketch below.
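
This is a rough wiring sketch only: podutil.ListPodsOnANode, evictions.EvictOptions, and the handle methods are names I'm borrowing from the in-tree plugins (error handling trimmed), so the signatures should be double-checked:

// Rough wiring of the predicate into the Deschedule loop; imports as in
// the plugin skeleton earlier in the thread, plus the descheduler's
// podutil and evictions packages.
func (d *RemovePodsViolatingInterPodAffinity) Deschedule(ctx context.Context, nodes []*v1.Node) *frameworktypes.Status {
    // Index every node by name and every pod by namespace up front so the
    // predicate can resolve topology domains and affinity matches.
    nodeMap := make(map[string]*v1.Node, len(nodes))
    assignedPods := make(map[string][]*v1.Pod)
    for _, node := range nodes {
        nodeMap[node.Name] = node
        // Index all pods, not just evictable ones, so affinity matches
        // against non-evictable pods still count.
        pods, err := podutil.ListPodsOnANode(node.Name, d.handle.GetPodsAssignedToNodeFunc(), func(*v1.Pod) bool { return true })
        if err != nil {
            continue
        }
        for _, pod := range pods {
            assignedPods[pod.Namespace] = append(assignedPods[pod.Namespace], pod)
        }
    }

    // Evict any evictable pod whose required podAffinity is broken.
    for _, pods := range assignedPods {
        for _, pod := range pods {
            if d.handle.Evictor().Filter(pod) && CheckPodAffinityViolation(pod, assignedPods, nodeMap) {
                d.handle.Evictor().Evict(ctx, pod, evictions.EvictOptions{})
            }
        }
    }
    return nil
}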

Any pointers would be appreciated 😄