kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

In Scheduler Preemption phase, RunFilterPlugins does not consider extender scheduler plugin #105720

Closed lianghao208 closed 3 years ago

lianghao208 commented 3 years ago

What happened?

The related code is default_preemption.go#L177-L185. When selecting a nominated node for a pod that failed scheduling, it only runs RunFilterPluginsWithNominatedPods, so candidate nodes are checked against the default scheduler's filter plugins only (default_preemption.go#L183-L184):

    if status := pl.fh.RunFilterPluginsWithNominatedPods(ctx, state, pod, nodeInfo); !status.IsSuccess() {
        return nil, 0, status
    }

If the candidate nodes include some that cannot pass the extender's filter, the pod's status.nominatedNodeName may be set to a node that still fails the extender's filter requirements even after the lower-priority pods are deleted from it.
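The concern can be illustrated with a minimal sketch. Everything here is hypothetical: `Node`, `inTreeFilter`, `extenderFilter`, and the `rack` label are stand-ins invented for illustration and are not the scheduler's real types or functions. It only shows how a node can look like a valid preemption candidate to the in-tree filters while an extender's filter would still reject it.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a cluster node; it is not the real v1.Node type.
type Node struct {
	Name    string
	FreeCPU int // CPU that would be free after evicting lower-priority pods
	Labels  map[string]string
}

// inTreeFilter plays the role of the in-tree filter plugins during the
// preemption dry run: it only checks what the default scheduler knows about.
func inTreeFilter(n Node, cpuRequest int) bool {
	return n.FreeCPU >= cpuRequest
}

// extenderFilter plays the role of an extender's filter verb, using a rule
// the default scheduler cannot see (an assumed "rack" label).
func extenderFilter(n Node) bool {
	return n.Labels["rack"] == "a"
}

// candidates returns the names of nodes that pass the in-tree filter only,
// which is the behavior described in the report.
func candidates(nodes []Node, cpuRequest int) []string {
	var out []string
	for _, n := range nodes {
		if inTreeFilter(n, cpuRequest) {
			out = append(out, n.Name)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "node-1", FreeCPU: 4, Labels: map[string]string{"rack": "b"}},
		{Name: "node-2", FreeCPU: 1, Labels: map[string]string{"rack": "a"}},
	}
	for _, name := range candidates(nodes, 2) {
		fmt.Println("preemption candidate:", name)
	}
	fmt.Println("node-1 passes extender filter:", extenderFilter(nodes[0]))
}
```

In this sketch node-1 is the only candidate, yet the assumed extender rule rejects it, which is the situation the report worries about.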

What did you expect to happen?

I think the preemption phase needs to call findNodesThatPassExtenders when selecting nominated nodes

generic_scheduler.go#L340-L391

func findNodesThatPassExtenders(extenders []framework.Extender, pod *v1.Pod, feasibleNodes []*v1.Node, statuses framework.NodeToStatusMap) ([]*v1.Node, error) {
    // Extenders are called sequentially.
    // Nodes in original feasibleNodes can be excluded in one extender, and pass on to the next
    // extender in a decreasing manner.
    ...
}

to make sure nominated nodes can pass extender filter plugins.
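As a rough sketch of the sequential narrowing described in the comment of findNodesThatPassExtenders (not the real implementation, whose body is elided above), the following uses hypothetical names: `FilterFunc`, `passExtenders`, and `keepIf` are invented for illustration, and nodes are plain strings rather than `*v1.Node`.

```go
package main

import "fmt"

// FilterFunc is a hypothetical stand-in for an extender's filter verb.
// It returns the subset of nodes the extender accepts.
type FilterFunc func(nodes []string) []string

// passExtenders runs each filter in order, feeding the survivors of one
// extender into the next, and stops early once no node is left.
func passExtenders(filters []FilterFunc, feasible []string) []string {
	for _, f := range filters {
		if len(feasible) == 0 {
			break
		}
		feasible = f(feasible)
	}
	return feasible
}

// keepIf builds a FilterFunc from a per-node predicate.
func keepIf(pred func(string) bool) FilterFunc {
	return func(nodes []string) []string {
		var kept []string
		for _, n := range nodes {
			if pred(n) {
				kept = append(kept, n)
			}
		}
		return kept
	}
}

func main() {
	filters := []FilterFunc{
		keepIf(func(n string) bool { return n != "node-1" }),
		keepIf(func(n string) bool { return n != "node-3" }),
	}
	fmt.Println(passExtenders(filters, []string{"node-1", "node-2", "node-3"}))
}
```

Each extender only sees the nodes that survived the previous one, so the feasible set can only shrink.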

How can we reproduce it (as minimally and precisely as possible)?

none

Anything else we need to know?

No response

k8s-ci-robot commented 3 years ago

@lianghao208: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

lianghao208 commented 3 years ago

/sig scheduling

lianghao208 commented 3 years ago

/cc @Huang-Wei

sanposhiho commented 3 years ago

Hello. If I understand correctly, extenders are already called here:

// callExtenders calls given <extenders> to select the list of feasible candidates.
// We will only check <candidates> with extenders that support preemption.
// Extenders which do not support preemption may later prevent preemptor from being scheduled on the nominated
// node. In that case, scheduler will find a different host for the preemptor in subsequent scheduling cycles.

https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/preemption/preemption.go#L228

And after callExtenders, we select nodes here, so a node which doesn't pass extenders is never selected as a nominated node.

lianghao208 commented 3 years ago

> If I understand correctly, extenders are already called here.

@sanposhiho Thanks for the reply. I think callExtenders only consults extenders that support preemption. If an extender only supports the filter verb, its filter won't be called to rule out nominated nodes during preemption.

// callExtenders calls given <extenders> to select the list of feasible candidates.
// We will only check <candidates> with extenders that support preemption.
// Extenders which do not support preemption may later prevent preemptor from being scheduled on the nominated
// node. In that case, scheduler will find a different host for the preemptor in subsequent scheduling cycles.
func (ev *Evaluator) callExtenders(pod *v1.Pod, candidates []Candidate) ([]Candidate, *framework.Status) {
    extenders := ev.Handler.Extenders()
    nodeLister := ev.Handler.SnapshotSharedLister().NodeInfos()
    if len(extenders) == 0 {
        return candidates, nil
    }
...

I just realized that I misunderstood the preemption logic. Only an extender's preemption verb is considered during preemption, not its filter verb. I will close this issue. Thanks!
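For readers following along, the conclusion can be sketched as below. This is an illustration under assumptions, not the body of callExtenders: the `Extender` interface here is a reduced, hypothetical version with only two methods, and `fakeExtender` and `consulted` are invented names.

```go
package main

import "fmt"

// Extender is a hypothetical, reduced interface; only the two methods
// relevant to this discussion are modeled here.
type Extender interface {
	Name() string
	SupportsPreemption() bool
}

// fakeExtender is an invented test double for illustration.
type fakeExtender struct {
	name       string
	preemption bool
}

func (e fakeExtender) Name() string             { return e.name }
func (e fakeExtender) SupportsPreemption() bool { return e.preemption }

// consulted returns the names of extenders that would be asked about
// preemption candidates; filter-only extenders are skipped.
func consulted(extenders []Extender) []string {
	var names []string
	for _, e := range extenders {
		if !e.SupportsPreemption() {
			continue
		}
		names = append(names, e.Name())
	}
	return names
}

func main() {
	extenders := []Extender{
		fakeExtender{name: "filter-only", preemption: false},
		fakeExtender{name: "preempt-aware", preemption: true},
	}
	fmt.Println(consulted(extenders))
}
```

A filter-only extender is skipped in this phase, so, as the quoted comment says, it may still reject the nominated node in a later scheduling cycle.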