Open · kannon92 opened this issue 1 year ago
We have PodLifeTime, which considers pending pods, but only if they have already been scheduled to a node (see https://github.com/kubernetes-sigs/descheduler/issues/858 and https://github.com/kubernetes-sigs/descheduler/pull/846#discussion_r899217024).
I think there have been other similar requests to evict non-scheduled pods, but I've held the opinion that it's not really de-"scheduling" if the pod isn't scheduled to a node in the first place.
Our code right now basically only looks at pods that are already on a node, as far as I recall (@a7i @ingvagabund, has this changed since the thread I linked above?). We could update that to consider all pods, at least for some strategies, which I think would be easier now that we have the descheduler framework in place.
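To make that distinction concrete, here is a minimal Go sketch (not the descheduler's actual listing code; `splitByAssignment` is a hypothetical helper) that separates pods already bound to a node from pods the scheduler has never placed:

```go
// A minimal sketch of the distinction discussed above: most strategies only
// ever see pods that are bound to a node, while pods the scheduler never
// placed have an empty spec.nodeName.
package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func splitByAssignment(ctx context.Context, c kubernetes.Interface) error {
	pods, err := c.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	var scheduled, unscheduled int
	for _, p := range pods.Items {
		if p.Spec.NodeName == "" {
			// Pending and never scheduled: today's strategies skip these.
			unscheduled++
			continue
		}
		// Bound to a node (possibly still Pending): this is roughly the
		// population the descheduler currently iterates over.
		scheduled++
	}
	fmt.Printf("scheduled=%d unscheduled=%d\n", scheduled, unscheduled)
	return nil
}
```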
I think there is still merit to the original proposals you linked, and it would be great if there were a standard condition the descheduler could rely on. The scheduler should also take some action to indicate the pod has failed and remove it from the scheduling queue.
While Descheduler supports Pending pods, there are 2 things to consider: PodLifeTime … as well (ref). In other words, if a pod is pending and unschedulable, then most strategies will not work for it, although this may be OK for pods stuck due to configuration errors.

@kannon92 with the introduction of descheduling plugins, one can always create a custom plugin for any possible scenario, including cases where a pod is not yet scheduled but is expected to be "evicted" (rather than descheduled). Among other reasons, we designed and created the descheduling framework to avoid having to decide whether a new scenario should be handled by the descheduler or whether a different component is preferable, so we can focus more on the mechanics than on (new) policies.
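For illustration, here is a rough sketch of what such a custom plugin could look like. The plugin name, the `handle` interface, and the `Deschedule` signature are assumptions modeled loosely on the descheduling framework, not copied from its real API; check the framework packages for the actual interfaces before implementing anything:

```go
// Rough sketch of a hypothetical "evict stuck pending pods" plugin.
package stuckpending

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
)

const PluginName = "RemoveStuckPendingPods" // hypothetical plugin name

type handle interface {
	// The real framework handle exposes a client, evictor, and shared
	// informers; reduced here to the single call this sketch needs.
	Evict(ctx context.Context, pod *v1.Pod) error
}

type StuckPendingPods struct {
	handle  handle
	maxWait time.Duration
}

func (p *StuckPendingPods) Name() string { return PluginName }

// Deschedule inspects unscheduled pods and evicts the ones that have been
// Pending longer than maxWait. In a real plugin this would be wired to the
// plugin's args and the shared pod lister instead of a plain slice.
func (p *StuckPendingPods) Deschedule(ctx context.Context, pods []*v1.Pod) error {
	for _, pod := range pods {
		if pod.Spec.NodeName != "" || pod.Status.Phase != v1.PodPending {
			continue // only pods the scheduler never placed
		}
		if time.Since(pod.CreationTimestamp.Time) < p.maxWait {
			continue // give the scheduler/controllers time to resolve it
		}
		if err := p.handle.Evict(ctx, pod); err != nil {
			return err
		}
	}
	return nil
}
```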
Quickly reading https://github.com/kubernetes/enhancements/pull/3816, all the mentioned configuration errors are exposed after the kubelet tries to start a container (please prove me wrong). When it comes to evicting pods that are not yet scheduled (as mentioned in https://github.com/kubernetes/kubernetes/issues/113211#issuecomment-1599013555), we need to keep in mind that every component has its own responsibilities and owns part of the pod lifecycle: the scheduler is responsible for assigning a node to a pod, the kubelet for running a pod, and the descheduler for evicting a running pod. As @a7i mentioned, we have the PodLifeTime strategy, which could be utilized for the case where a pod has been in e.g. a FailingToStart state (or similar) for some time. However, if a pod fails to start because of a configuration error, the corresponding pod spec needs to be updated, or a missing secret/configmap needs to be created. Evicting such a pod will not mitigate the cause of the configuration error; that's up to a different component (e.g. controllers). So ultimately the descheduler would only "clean up" all the broken pods. The descheduler is more interested in cases where the eviction itself resolves the underlying cause, e.g. moving a pod to a different node where networking is less broken, or where the node has more resources to avoid OOM, etc.
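As a hedged illustration of that point, the sketch below flags pods that have been failing to start for configuration-related reasons longer than some threshold. The set of waiting reasons is only an example (not exhaustive or authoritative), and evicting a pod it flags still would not fix the underlying misconfiguration:

```go
// Sketch of a PodLifeTime-style check: has this pod been stuck on a
// configuration-related container error for longer than maxAge?
package example

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// configErrorReasons are container waiting reasons typically caused by a bad
// pod spec or missing referenced objects; treat the set as illustrative.
var configErrorReasons = map[string]bool{
	"CreateContainerConfigError": true, // e.g. missing ConfigMap/Secret key
	"InvalidImageName":           true,
	"ErrImagePull":               true,
	"ImagePullBackOff":           true,
}

// stuckOnConfigError reports whether the pod has a container waiting on one
// of the reasons above and has existed for longer than maxAge.
func stuckOnConfigError(pod *v1.Pod, maxAge time.Duration) bool {
	if time.Since(pod.CreationTimestamp.Time) < maxAge {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Waiting != nil && configErrorReasons[cs.State.Waiting.Reason] {
			return true
		}
	}
	return false
}
```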
/cc @alculquicondor
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
@kannon92 do you still plan to explore this feature?
I’m not sure if I’ll get to this. Can we keep it open? There is still interest in pending pod handling and I know @alculquicondor was looking at this at one point as a workaround for some upstream issues around pods being stuck in pending.
I have a question about whether the descheduler would be a good place to add removal of pending pods when they are stuck. I work on a batch project and we experience a lot of cases where pods can get stuck due to configuration errors. I originally posted a GitHub issue on k/k hoping we could evict these pods in k/k. I was curious if this could be in scope for the descheduler.
Some context for the reader
Generally, I am working on a KEP to represent pods that are stuck due to configuration issues, and I would also like to consider options for how to evict these pods. The main complication is that false conditions can be business as usual (BAU), so we were thinking we would want a timeout and only evict once the condition has matched a bad state for x amount of time.
For the descheduler, I just want to know if this is in scope as a feature ask.
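To make the timeout idea above concrete, here is a small sketch (assuming a plain client-side check; `badConditionFor` and the 15-minute grace period are made up for illustration) of only acting once a pod condition has stayed in a bad state for some minimum duration:

```go
// Sketch of the "timeout on a bad condition" idea: a transiently-false
// condition is business as usual, so only treat the pod as stuck once the
// condition has stayed in the bad state for a grace period.
package example

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// badConditionFor reports whether condType has held the given status on the
// pod for at least minDuration, based on its LastTransitionTime.
func badConditionFor(pod *v1.Pod, condType v1.PodConditionType, status v1.ConditionStatus, minDuration time.Duration) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type != condType || c.Status != status {
			continue
		}
		return time.Since(c.LastTransitionTime.Time) >= minDuration
	}
	return false
}

// Example: PodScheduled=False is normal right after creation, so only treat
// the pod as stuck once it has stayed false for 15 minutes.
func shouldEvict(pod *v1.Pod) bool {
	return badConditionFor(pod, v1.PodScheduled, v1.ConditionFalse, 15*time.Minute)
}
```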