kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

feat: expose node reached metric #1368

Open · fanhaouu opened 3 months ago

fanhaouu commented 3 months ago

Every strategy now incorporates the `Evictor().NodeLimitExceeded` pre-check, so the condition in the `EvictPod` method, specifically `pe.nodepodCount[pod.Spec.NodeName]+1 > *pe.maxPodsToEvictPerNode`, can never be satisfied. As a result, the node-limit-reached metric is never exposed and the reason for not evicting subsequent pods is never recorded.

This PR resolves that.
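
For illustration, here is a minimal runnable toy (not the actual descheduler source; the type and names only mirror `PodEvictor`) showing why the check inside `EvictPod` can never fire once strategies pre-check `NodeLimitExceeded`:

```go
package main

import "fmt"

// Toy model of the eviction-limit bookkeeping; field and method names mirror
// the real PodEvictor, but this is a simplified stand-in, not the source.
type podEvictor struct {
	maxPodsToEvictPerNode *uint
	nodepodCount          map[string]uint
}

// NodeLimitExceeded is the pre-check every strategy calls before evicting.
func (pe *podEvictor) NodeLimitExceeded(node string) bool {
	return pe.maxPodsToEvictPerNode != nil && pe.nodepodCount[node] == *pe.maxPodsToEvictPerNode
}

// EvictPod contains the limit check that can no longer be reached once the
// strategy has already skipped the node via NodeLimitExceeded.
func (pe *podEvictor) EvictPod(node string) bool {
	if pe.maxPodsToEvictPerNode != nil && pe.nodepodCount[node]+1 > *pe.maxPodsToEvictPerNode {
		fmt.Println("per-node limit reached: this is where the metric/reason would be recorded")
		return false
	}
	pe.nodepodCount[node]++
	return true
}

func main() {
	limit := uint(1)
	pe := &podEvictor{maxPodsToEvictPerNode: &limit, nodepodCount: map[string]uint{}}

	for i := 0; i < 3; i++ {
		if pe.NodeLimitExceeded("node-1") { // strategy-side pre-check
			fmt.Println("strategy skips the node before EvictPod runs")
			break
		}
		pe.EvictPod("node-1")
	}
}
```

Running this prints the "strategy skips" line and never the "per-node limit reached" line: the loop always exits via the pre-check, so the branch in `EvictPod` where the metric would be exposed is effectively dead code.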

linux-foundation-easycla[bot] commented 3 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

k8s-ci-robot commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign knelasevero for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubernetes-sigs/descheduler/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.

k8s-ci-robot commented 3 months ago

Hi @fanhaouu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ingvagabund commented 1 month ago

@fanhaouu can you provide an example of the "eternal" failure? Is there a reproducer for it? Is there an issue that can help us see how the failure affects the descheduler and, in turn, its users?

fanhaouu commented 1 month ago

> @fanhaouu can you provide an example of the "eternal" failure? Is there a reproducer for it? Is there an issue that can help us see how the failure affects the descheduler and, in turn, its users?

The current descheduling strategies all use the `NodeLimitExceeded` method of `PodEvictor` to decide in advance whether the next eviction attempt is even allowed. Because of that pre-check, the condition

`pe.namespacePodCount[pod.Namespace]+1 > *pe.maxPodsToEvictPerNamespace`

inside `EvictPod` is never satisfied.

For example:

- https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/podlifetime/pod_lifetime.go#L135
- https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/removeduplicates/removeduplicates.go#L214
- https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/removefailedpods/failedpods.go#L107
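
The shared pattern at those call sites looks roughly like this (paraphrased, not the verbatim source; `podsOnNode`, `node`, and `ctx` stand in for each plugin's local variables):

```go
for _, pod := range podsOnNode {
	if handle.Evictor().NodeLimitExceeded(node) {
		// The strategy bails out here, so PodEvictor.EvictPod never observes
		// that the limit was hit and never records the metric or the reason
		// for skipping the remaining pods.
		break
	}
	handle.Evictor().Evict(ctx, pod, evictions.EvictOptions{})
}
```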