Open fanhaouu opened 3 months ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign knelasevero for approval. For more information see the Kubernetes Code Review Process.
Hi @fanhaouu. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
@fanhaouu can you provide an example of the "eternal" failure? Is there a reproducer for it? Is there an issue that shows how the failure affects the descheduler and, in turn, its users?
All of the current descheduling strategies call the 'NodeLimitExceeded' method in 'PodEvictor' to determine in advance whether the next eviction operation is necessary. This ensures that the condition 'pe.namespacePodCount[pod.Namespace]+1 > *pe.maxPodsToEvictPerNamespace' will never occur.
For example:
https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/podlifetime/pod_lifetime.go#L135
https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/removeduplicates/removeduplicates.go#L214
https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/removefailedpods/failedpods.go#L107
Because every strategy now performs the 'Evictor().NodeLimitExceeded' check before calling 'EvictPod', the check inside the 'EvictPod' method, specifically 'pe.nodepodCount[pod.Spec.NodeName]+1 > *pe.maxPodsToEvictPerNode', can never evaluate to true. Consequently, the node-reached metrics are never exposed, and the reasons for not evicting subsequent pods are never evaluated.
This PR resolves that.
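To make the reasoning above concrete, here is a minimal, self-contained sketch of the interaction. It is not the descheduler's actual code: `podEvictor`, `limitReachedEvents`, and `runStrategy` are hypothetical stand-ins, and the exact comparison used by the pre-check is an assumption. It only illustrates why a strategy-level pre-check can leave the limit branch inside the eviction method unreachable.

```go
package main

import "fmt"

// podEvictor is a hypothetical, stripped-down stand-in for the real PodEvictor.
type podEvictor struct {
	maxPodsToEvictPerNode *uint
	nodepodCount          map[string]uint
	limitReachedEvents    int // stands in for the "limit reached" metric
}

// NodeLimitExceeded is the pre-check a strategy calls before each eviction.
// The ">=" comparison is an assumption made for this sketch.
func (pe *podEvictor) NodeLimitExceeded(node string) bool {
	if pe.maxPodsToEvictPerNode == nil {
		return false
	}
	return pe.nodepodCount[node] >= *pe.maxPodsToEvictPerNode
}

// EvictPod mirrors the condition quoted in the comment above. The metric is
// recorded only when this branch is taken.
func (pe *podEvictor) EvictPod(node string) bool {
	if pe.maxPodsToEvictPerNode != nil && pe.nodepodCount[node]+1 > *pe.maxPodsToEvictPerNode {
		pe.limitReachedEvents++
		return false
	}
	pe.nodepodCount[node]++
	return true
}

// runStrategy is a hypothetical strategy loop: it stops as soon as the
// pre-check reports that the limit is reached, so EvictPod never refuses.
func runStrategy(pe *podEvictor, node string, candidates int) int {
	evicted := 0
	for i := 0; i < candidates; i++ {
		if pe.NodeLimitExceeded(node) {
			break
		}
		if pe.EvictPod(node) {
			evicted++
		}
	}
	return evicted
}

func main() {
	limit := uint(2)
	pe := &podEvictor{maxPodsToEvictPerNode: &limit, nodepodCount: map[string]uint{}}
	evicted := runStrategy(pe, "node-a", 5)
	fmt.Println(evicted, pe.limitReachedEvents) // prints "2 0"
}
```

In this sketch the strategy evicts two pods and then stops, while the counter standing in for the metric stays at zero. The counter only increases if `EvictPod` is called without the pre-check.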