Open tuxillo opened 4 months ago
Any news?
Hi @tuxillo, sorry but unfortunately we don't have much bandwidth for adding new plugins at this time. The descheduling framework refactor is meant to make it easier for anyone to develop their own plugins. The existing plugin code should serve as a guide, but we hope to publish a more generic developer guide in the future.
@damemi thanks for the update. No worries! I was wondering if this plugin was missing for any particular reason. I might give it a try at implementing it, but should I wait for the refactor you're mentioning?
@tuxillo please feel free to implement it and open a PR if you'd like! The refactor is done already. Ideally, the goal of it was to make it so you don't even need to contribute the plugin code here if you don't want to, and could just build your own descheduler with your plugin imported.
I did an initial version using RemovePodsViolatingInterPodAntiAffinity as the base, but as it turns out I ended up just reversing its logic, which is most certainly the wrong approach. It also doesn't sound like it's worth a whole plugin.
Let me explain my use case:
I set application A to run on certain nodes. I want application B to always run on the nodes where application A is, so for application B I use:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - appA
      topologyKey: kubernetes.io/hostname
As you know, sometimes app A will drift to other nodes (Karpenter is autoscaling the cluster) but app B won't follow: the rule is required during scheduling but ignored during execution, so nothing re-evaluates it once app A moves. The affinity is broken.
What I am currently doing in predicates is:
// CheckPodAffinityViolation returns true when none of candidatePod's required
// inter-pod affinity terms are satisfied by any assigned pod in the same
// topology domain, i.e. the (IgnoredDuringExecution) affinity is currently violated.
func CheckPodAffinityViolation(candidatePod *v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
    nodeHavingCandidatePod, ok := nodeMap[candidatePod.Spec.NodeName]
    if !ok {
        klog.Warningf("CandidatePod %s does not exist in nodeMap", klog.KObj(candidatePod))
        return false
    }
    affinity := candidatePod.Spec.Affinity
    if affinity == nil || affinity.PodAffinity == nil {
        // Pods without a podAffinity cannot violate one.
        return false
    }
    for _, term := range GetPodAffinityTerms(affinity.PodAffinity) {
        namespaces := GetNamespacesFromPodAffinityTerm(candidatePod, &term)
        selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
        if err != nil {
            klog.ErrorS(err, "Unable to convert LabelSelector into Selector")
            return false
        }
        for namespace := range namespaces {
            for _, assignedPod := range assignedPods[namespace] {
                // Skip the candidate itself and pods that don't match the term's selector.
                if (assignedPod.Namespace == candidatePod.Namespace && assignedPod.Name == candidatePod.Name) ||
                    !PodMatchesTermsNamespaceAndSelector(assignedPod, namespaces, selector) {
                    klog.V(4).InfoS("CandidatePod does not match inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
                    continue
                }
                nodeHavingAssignedPod, ok := nodeMap[assignedPod.Spec.NodeName]
                if !ok {
                    continue
                }
                // A matching pod in the same topology domain satisfies the term.
                if hasSameLabelValue(nodeHavingCandidatePod, nodeHavingAssignedPod, term.TopologyKey) {
                    klog.V(1).InfoS("CandidatePod matches inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
                    return false
                }
            }
        }
    }
    // No assigned pod satisfied any term: the affinity is violated.
    klog.V(1).Infof("CandidatePod did not match inter-pod affinity in node %v", candidatePod.Spec.NodeName)
    return true
}
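In case it's useful, this is roughly how I build the two inputs and run the check over all pods. It's simplified (listing everything through the client instead of the descheduler's listers), and findViolations is just an illustrative name, not something that exists anywhere:

// findViolations is only an illustration of how assignedPods and nodeMap are
// shaped before calling CheckPodAffinityViolation; the real code would use the
// descheduler's informers/listers rather than raw List calls.
func findViolations(ctx context.Context, client kubernetes.Interface) ([]*v1.Pod, error) {
    nodeList, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    podList, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }

    // nodeMap: node name -> node, used to compare topology labels.
    nodeMap := make(map[string]*v1.Node, len(nodeList.Items))
    for i := range nodeList.Items {
        nodeMap[nodeList.Items[i].Name] = &nodeList.Items[i]
    }

    // assignedPods: namespace -> pods already scheduled to a node.
    assignedPods := make(map[string][]*v1.Pod)
    for i := range podList.Items {
        pod := &podList.Items[i]
        if pod.Spec.NodeName == "" {
            continue
        }
        assignedPods[pod.Namespace] = append(assignedPods[pod.Namespace], pod)
    }

    var violating []*v1.Pod
    for _, pods := range assignedPods {
        for _, pod := range pods {
            if CheckPodAffinityViolation(pod, assignedPods, nodeMap) {
                violating = append(violating, pod)
            }
        }
    }
    return violating, nil
}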
So when the candidate pod and a matching pod are on the same node (since that's the topology key), it simply isn't evicted. I think I'm missing some parts of the problem or not following the correct approach.
Any pointers would be appreciated 😄
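For completeness, this is the shape of the per-node descheduling pass I had in mind around that predicate. I haven't checked the exact plugin interface yet, so evictPod and the other names are just stand-ins, not real descheduler APIs:

// evictAffinityViolators is a sketch of the per-node descheduling pass.
// evictPod is a placeholder for the framework's evictor, which may refuse an
// eviction (PDBs, eviction limits, ...).
func evictAffinityViolators(podsOnNode []*v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node, evictPod func(*v1.Pod) bool) {
    for _, pod := range podsOnNode {
        // Only pods that declare a podAffinity can violate one.
        if pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAffinity == nil {
            continue
        }
        if CheckPodAffinityViolation(pod, assignedPods, nodeMap) {
            // The evictor may refuse; just move on to the next pod.
            evictPod(pod)
        }
    }
}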
Is your feature request related to a problem? Please describe.
I think it is a problem because, as far as I know, there is no way of rescheduling pods that are in violation of podAffinity.
Describe the solution you'd like
Make a plugin available that deals with this, right? 🥲
Describe alternatives you've considered
Right now I'm manually doing a rollout of the deployments so that the scheduling is right again, for example after upgrades.
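Concretely, the manual workaround is the equivalent of kubectl rollout restart, i.e. bumping the kubectl.kubernetes.io/restartedAt annotation on the pod template. Something like this if done from code (restartDeployment is just an illustrative helper; types is k8s.io/apimachinery/pkg/types):

// restartDeployment triggers a rolling restart the same way
// `kubectl rollout restart` does: by patching the pod template annotation.
func restartDeployment(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
    patch := fmt.Sprintf(
        `{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
        time.Now().Format(time.RFC3339),
    )
    _, err := client.AppsV1().Deployments(namespace).Patch(
        ctx, name, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{},
    )
    return err
}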
What version of descheduler are you using?
descheduler version: latest
Additional context
Issue #1286 was closed as "Not planned", but feedback was provided there and I think this feature is very much needed.