Closed: logyball closed this issue 3 months ago
If this is a desired feature, I would be happy to contribute it.
Duplicate/Related Issues and PRs:
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
**Is your feature request related to a problem? Please describe.**
Sometimes we have noticed the descheduler getting into "loops", where it does not quite agree with the `kube-scheduler`. We use the `HighNodeUtilization` profile plus the GKE `optimize-utilization` scheduling profile to maximize utilization on our nodes. However, the descheduler sometimes evicts a pod that is then rescheduled onto the same node, or onto another node from which it is evicted again. This is somewhat unavoidable under normal business conditions, and it does not occur frequently enough to merit changing the descheduler's thresholds or behavior.

**Describe the solution you'd like**
It would be nice to have a label on the Prometheus metric that indicates which workload is being evicted, in addition to the namespace. This could be either the name of the pod or the name of the controller that owns the pod. With that in place, we could build observability or alerting around a workload being evicted repeatedly in a short time window (see the sketch below).
One downside is the increased cardinality of the metric, but the volume of evictions is relatively low for us anyway, to the point that it doesn't seem like an "explosion" so much as linear scaling with the number of evictions.
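As a rough illustration, here is a minimal sketch of what such a label could look like, assuming a `client_golang` `CounterVec` along the lines of the descheduler's existing pods-evicted counter. The metric name, label set, and the `recordEviction` hook are illustrative assumptions, not the project's actual API.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podsEvicted is a sketch of an eviction counter carrying an extra
// "workload" label; the metric and label names here are illustrative.
var podsEvicted = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "descheduler_pods_evicted",
		Help: "Number of evicted pods, by strategy, namespace, and owning workload.",
	},
	[]string{"strategy", "namespace", "workload"},
)

func init() {
	prometheus.MustRegister(podsEvicted)
}

// workloadName resolves the pod's controlling owner (e.g. a ReplicaSet
// or StatefulSet) and falls back to the pod's own name for bare pods.
func workloadName(pod *v1.Pod) string {
	if owner := metav1.GetControllerOf(pod); owner != nil {
		return owner.Name
	}
	return pod.Name
}

// recordEviction is a hypothetical hook invoked after a successful eviction.
func recordEviction(strategy string, pod *v1.Pod) {
	podsEvicted.With(prometheus.Labels{
		"strategy":  strategy,
		"namespace": pod.Namespace,
		"workload":  workloadName(pod),
	}).Inc()
}
```

With a `workload` label in place, the alerting described above could be a single PromQL rule, e.g. `sum by (namespace, workload) (increase(descheduler_pods_evicted[30m])) > 3`, where the window and threshold are placeholders to tune.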
**Describe alternatives you've considered**
Changing the descheduler's configuration; adding stricter per-namespace solutions.
**What version of descheduler are you using?**
descheduler version: v0.26.1