kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0
4.51k stars 675 forks source link

Strategy For Pod Life Time #205

Closed seanmalloy closed 4 years ago

seanmalloy commented 4 years ago

This is an enhancement proposal to add a new descheduler strategy for evicting old pods. I would like to be able to evict pods that were created more than X hours ago.

Here are my initial thoughts on the API for a new PodLifeTime strategy. In this example the descheduler would evict all pods that were created more than 24 hours ago.

kind: "DeschedulerPolicy"
apiVersion: "descheduler/v1alpha1"
strategies:
  "PodLifeTime":
     enabled: true
     params:
        maxPodLifeTimeHours: 24
seanmalloy commented 4 years ago

Any feedback on this proposal would be greatly appreciated.

CC: @sanbornick @vinny-sabatini

seanmalloy commented 4 years ago

Here is a slightly different proposal, specify the max pod life time in minutes instead of hours.

kind: "DeschedulerPolicy"
apiVersion: "descheduler/v1alpha1"
strategies:
  "PodLifeTime":
     enabled: true
     params:
        maxPodLifeTimeMinutes: 1440   # 24 hours
mcandre commented 4 years ago

Don't allow users to omit units: Accept a Go-compatible time duration string.

https://golang.org/pkg/time/#ParseDuration

seanmalloy commented 4 years ago

@ravisantoshgudimetla @aveshagarwal @damemi do you have any thoughts on this proposal? Does this sound reasonable?

damemi commented 4 years ago

Not sure if this applies here, but I think Kubernetes convention prefers having the units in the field name (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units), which also makes it easier for non-go clients to parse the API.

If that's the case then I agree with the smaller units (ie Minutes instead of Hours, or maybe even Seconds)

Edit: actually that section specifically says durations should be handled with the units in the field name:

Duration fields must be represented as integer fields with units being part of the field name (e.g. leaseDurationSeconds). We don't use Duration in the API since that would require clients to implement go-compatible parsing.

For that reason I would vote for podLifetimeSeconds as the field name

seanmalloy commented 4 years ago

/kind feature

seanmalloy commented 4 years ago

Based on the feedback from @damemi here is the updated proposal.

kind: "DeschedulerPolicy"
apiVersion: "descheduler/v1alpha1"
strategies:
  "PodLifeTime":
     enabled: true
     params:
        maxPodLifeTimeSeconds: 86400   # 24 hours
seanmalloy commented 4 years ago

/assign @seanmalloy

seanmalloy commented 4 years ago

@alculquicondor during the SIG scheduling meeting you mentioned there was a k8s API related to pod evictions that might be to relevant to this feature. This was something different than pod disruption budgets. If you find the documentation that would be super useful. Thanks!

alculquicondor commented 4 years ago

Other than disruption budget, there is a "safe-to-evict" label. However, it's namespaced under cluster-autoscaler.

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

I'm not sure if the descheduler should be respecting that. But I wonder what does this label offer that is not possible through disruption budget. @maciekpytel?

In a totally different note, shouldn't the strategy have a way to filter, like with a Label Selector? Unless no other strategies support this currently?

MaciekPytel commented 4 years ago

This annotation takes two values:

Overall, this label is very specific to autoscaler, so I'm not sure if it makes much sense to respect it in other components. Also - do we even expect users to use descheduler together with CA? They seem to conflict in what they try to achieve - CA explicitly ignores pod scores (I guess that's how we call priorities now?) and tries to pack pods as tightly as possible whether they like it or not, while IIUC the goal of descheduler is to reschedule pods based on scores (so, spread them out).

That being said I wouldn't be surprised if at some point there is a FR to add similar annotation for descheduler.

seanmalloy commented 4 years ago

In a totally different note, shouldn't the strategy have a way to filter, like with a Label Selector? Unless no other strategies support this currently?

PR #202 has been opened to add a label selector feature to descheduler.

seanmalloy commented 4 years ago

That being said I wouldn't be surprised if at some point there is a FR to add similar annotation for descheduler.

The descheduler annotation descheduler.alpha.kubernetes.io/evict works very similar to this. It is not exactly the same, but pretty close.

seanmalloy commented 4 years ago

/close

Pull request #274 was merged into the master branch.

k8s-ci-robot commented 4 years ago

@seanmalloy: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/205#issuecomment-625844649): >/close > >Pull request #274 was merged into the master branch. Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.