Open · kannon92 opened this issue 1 year ago
We have PodLifeTime, which considers pending pods, but only if they have already been scheduled to a node (see https://github.com/kubernetes-sigs/descheduler/issues/858 and https://github.com/kubernetes-sigs/descheduler/pull/846#discussion_r899217024).
I think there have been other similar requests to evict non-scheduled pods, but I've held the opinion that it's not really de-"scheduling" if the pod isn't scheduled to a node in the first place.
Our code right now basically only looks at pods that are already on a node, as far as I recall (@a7i @ingvagabund, has this changed since the thread I linked above?). We could update that to consider all pods, at least for some strategies, which I think would be easier now that we have the descheduler framework in place.
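To make that distinction concrete, here is a minimal Go sketch (not the descheduler's actual listing code; `splitByAssignment` is a hypothetical helper) that separates pods already bound to a node from pods the scheduler has never placed:

```go
// A minimal sketch of the distinction discussed above: most strategies only
// ever see pods that are bound to a node, while pods the scheduler never
// placed have an empty spec.nodeName.
package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func splitByAssignment(ctx context.Context, c kubernetes.Interface) error {
	pods, err := c.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	var scheduled, unscheduled int
	for _, p := range pods.Items {
		if p.Spec.NodeName == "" {
			// Pending and never scheduled: today's strategies skip these.
			unscheduled++
			continue
		}
		// Bound to a node (possibly still Pending): this is roughly the
		// population the descheduler currently iterates over.
		scheduled++
	}
	fmt.Printf("scheduled=%d unscheduled=%d\n", scheduled, unscheduled)
	return nil
}
```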
I think there is still merit to the original proposals you linked, and it would be great if there were a standard condition the descheduler could rely on. The scheduler should also take some action to indicate the pod has failed and remove it from the scheduling queue.
While Descheduler supports Pending pods, there are 2 things to consider: PodLifeTime … as well (ref). In other words, if a pod is pending and unschedulable, then most strategies will not work for it, although this may be OK for pods stuck due to configuration errors.

@kannon92 with the introduction of descheduling plugins, one can always create a custom plugin for any possible scenario, including cases where a pod is not yet scheduled but is expected to be "evicted" (rather than descheduled). Among other reasons, we designed and created the descheduling framework to avoid having to decide whether a new scenario should be handled by the descheduler or whether a different component is preferable, so we can focus more on the mechanics than on (new) policies.
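For illustration, here is a rough sketch of what such a custom plugin could look like. The plugin name, the `handle` interface, and the `Deschedule` signature are assumptions modeled loosely on the descheduling framework, not copied from its real API; check the framework packages for the actual interfaces before implementing anything:

```go
// Rough sketch of a hypothetical "evict stuck pending pods" plugin.
package stuckpending

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
)

const PluginName = "RemoveStuckPendingPods" // hypothetical plugin name

type handle interface {
	// The real framework handle exposes a client, evictor, and shared
	// informers; reduced here to the single call this sketch needs.
	Evict(ctx context.Context, pod *v1.Pod) error
}

type StuckPendingPods struct {
	handle  handle
	maxWait time.Duration
}

func (p *StuckPendingPods) Name() string { return PluginName }

// Deschedule inspects unscheduled pods and evicts the ones that have been
// Pending longer than maxWait. In a real plugin this would be wired to the
// plugin's args and the shared pod lister instead of a plain slice.
func (p *StuckPendingPods) Deschedule(ctx context.Context, pods []*v1.Pod) error {
	for _, pod := range pods {
		if pod.Spec.NodeName != "" || pod.Status.Phase != v1.PodPending {
			continue // only pods the scheduler never placed
		}
		if time.Since(pod.CreationTimestamp.Time) < p.maxWait {
			continue // give the scheduler/controllers time to resolve it
		}
		if err := p.handle.Evict(ctx, pod); err != nil {
			return err
		}
	}
	return nil
}
```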
Quickly reading https://github.com/kubernetes/enhancements/pull/3816, all the mentioned configuration errors are exposed after the kubelet tries to start a container (please prove me wrong). When it comes to evicting pods that are not yet scheduled (as mentioned in https://github.com/kubernetes/kubernetes/issues/113211#issuecomment-1599013555), we need to keep in mind that every component has its own responsibilities and owns part of the pod lifecycle: the scheduler is responsible for assigning a node to a pod, the kubelet for running a pod, and the descheduler for evicting a running pod. As @a7i mentioned, we have the PodLifeTime strategy, which could be utilized for the case where a pod has been in e.g. a FailingToStart state (or similar) for some time. However, if a pod fails to start because of a configuration error, the corresponding pod spec needs to be updated, or a missing secret/configmap needs to be created. Evicting such a pod will not mitigate the cause of the configuration error; that's up to a different component (e.g. controllers). So ultimately the descheduler would only "clean up" all the broken pods. The descheduler is more interested in cases where the eviction itself resolves the underlying cause, e.g. moving a pod to a different node where networking is less broken, or where the node has more resources to avoid OOM, etc.
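As a hedged illustration of that point, the sketch below flags pods that have been failing to start for configuration-related reasons longer than some threshold. The set of waiting reasons is only an example (not exhaustive or authoritative), and evicting a pod it flags still would not fix the underlying misconfiguration:

```go
// Sketch of a PodLifeTime-style check: has this pod been stuck on a
// configuration-related container error for longer than maxAge?
package example

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// configErrorReasons are container waiting reasons typically caused by a bad
// pod spec or missing referenced objects; treat the set as illustrative.
var configErrorReasons = map[string]bool{
	"CreateContainerConfigError": true, // e.g. missing ConfigMap/Secret key
	"InvalidImageName":           true,
	"ErrImagePull":               true,
	"ImagePullBackOff":           true,
}

// stuckOnConfigError reports whether the pod has a container waiting on one
// of the reasons above and has existed for longer than maxAge.
func stuckOnConfigError(pod *v1.Pod, maxAge time.Duration) bool {
	if time.Since(pod.CreationTimestamp.Time) < maxAge {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Waiting != nil && configErrorReasons[cs.State.Waiting.Reason] {
			return true
		}
	}
	return false
}
```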
/cc @alculquicondor
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
@kannon92 do you still plan to explore this feature?
I’m not sure if I’ll get to this. Can we keep it open? There is still interest in pending pod handling and I know @alculquicondor was looking at this at one point as a workaround for some upstream issues around pods being stuck in pending.
I have a question about whether the descheduler would be a good place to add removal of pending pods when they are stuck. I work on a batch project and we experience a lot of cases where pods can get stuck due to configuration errors. I originally posted a GitHub issue on k/k hoping we could evict these pods in k/k. I was curious if this could be in scope for the descheduler.
Some context for the reader
Generally, I am working on a KEP to represent pods that are stuck due to configuration issues, and I would also like to consider options for how to evict these pods. The main complication is that false conditions can be business as usual (BAU), so we were thinking we would want a timeout and only evict once the condition has matched a bad state for x amount of time.
For the descheduler, I just want to know if this is in scope as a feature ask.
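To make the timeout idea above concrete, here is a small sketch (assuming a plain client-side check; `badConditionFor` and the 15-minute grace period are made up for illustration) of only acting once a pod condition has stayed in a bad state for some minimum duration:

```go
// Sketch of the "timeout on a bad condition" idea: a transiently-false
// condition is business as usual, so only treat the pod as stuck once the
// condition has stayed in the bad state for a grace period.
package example

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// badConditionFor reports whether condType has held the given status on the
// pod for at least minDuration, based on its LastTransitionTime.
func badConditionFor(pod *v1.Pod, condType v1.PodConditionType, status v1.ConditionStatus, minDuration time.Duration) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type != condType || c.Status != status {
			continue
		}
		return time.Since(c.LastTransitionTime.Time) >= minDuration
	}
	return false
}

// Example: PodScheduled=False is normal right after creation, so only treat
// the pod as stuck once it has stayed false for 15 minutes.
func shouldEvict(pod *v1.Pod) bool {
	return badConditionFor(pod, v1.PodScheduled, v1.ConditionFalse, 15*time.Minute)
}
```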