volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

Support Pod Scheduling Readiness in Volcano #3187

Closed william-wang closed 1 week ago

william-wang commented 11 months ago

What would you like to be added:

Pod Scheduling Readiness is a beta feature in Kubernetes v1.27. Users expect Volcano to be aware of it.

Why is this needed:

By specifying or removing a Pod's .spec.schedulingGates, a user can control when the Pod is ready to be considered for scheduling. Here is a typical scenario from a community user.

Problem Statement: We've implemented an external quota manager that reviews every incoming Pod request against capacity/quota requirements. A request becomes eligible for scheduling only after the quota manager approves it. We therefore intend to use the Pod schedulingGates feature to implement this.
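To make the intended workflow concrete, here is a minimal sketch of the gating logic, not the actual Kubernetes or Volcano code. The `Pod` struct is a pared-down, hypothetical stand-in for the real API type, and the gate name `example.com/quota-approved` and the helpers `RemoveGate`, `ReadyForScheduling`, and `Approve` are assumptions made for illustration.

```go
package main

// Pod is a hypothetical, simplified stand-in for the Kubernetes Pod type.
// Only the gate names from .spec.schedulingGates are modelled here.
type Pod struct {
	Name            string
	SchedulingGates []string
}

// quotaGate is an assumed gate name owned by the external quota manager.
const quotaGate = "example.com/quota-approved"

// RemoveGate returns a new slice containing every gate except the named one.
func RemoveGate(gates []string, name string) []string {
	kept := make([]string, 0, len(gates))
	for _, g := range gates {
		if g != name {
			kept = append(kept, g)
		}
	}
	return kept
}

// ReadyForScheduling reports whether a scheduler may consider the pod:
// a pod with any remaining scheduling gate must be skipped.
func ReadyForScheduling(p Pod) bool {
	return len(p.SchedulingGates) == 0
}

// Approve simulates the quota manager approving a pod by removing its gate.
func Approve(p Pod) Pod {
	p.SchedulingGates = RemoveGate(p.SchedulingGates, quotaGate)
	return p
}
```

In this sketch a pod is created with the gate set, `ReadyForScheduling` returns false until `Approve` removes the gate, and only then should a scheduler try to bind it.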

It seems that Volcano does not honor this feature: it proceeds to schedule pods without checking schedulingGates. This contrasts with the default Kubernetes kube-scheduler, which behaves as expected. Pods that still have schedulingGates set are assigned to a node by the Volcano scheduler, and the bind is then rejected by the Kubernetes API server. I believe this error occurs after the Volcano scheduler has run.

Failed to bind pod <namespace/pod-name> to node : &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"Operation cannot be fulfilled on pods/binding \"app-name\": pod pod-name has non-empty .spec.schedulingGates", Reason:"Conflict", Details:(*v1.StatusDetails)(0x140002188a0), Code:409}}

Please suggest how we can achieve this with Volcano. If it is not supported, can I add this feature to Volcano and send a PR?

References:

  1. https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/
  2. https://kubernetes.io/blog/2022/12/26/pod-scheduling-readiness-alpha/#:~:text=Under%20the%20hood%2C%20scheduling%20gates,beginning%20of%20each%20scheduling%20cycle.

william-wang commented 11 months ago

@skalva404 Let's use this issue to track your requirement :)

william-wang commented 10 months ago

Is there anyone interested in contributing this feature?

itsaviral2609 commented 10 months ago

Hi @william-wang, I'm new to Volcano and would like to try this issue. Could you share more details on which files are likely to need changes?

Edit: Is https://github.com/volcano-sh/volcano/blob/master/pkg/scheduler/scheduler.go the one to look at?

Here is how it was originally implemented in Kubernetes:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness/README.md#implementation

Monokaix commented 10 months ago

Hi, sorry for the late reply. There is a simple way to implement this: add a function such as taskScheduleGated and call it in the allocate/backfill/preempt/reclaim action packages to check whether a task is scheduling-gated, so that those actions skip gated tasks. For more detail, you can refer to https://github.com/volcano-sh/volcano/commit/5feee2f904859baebf8834ff30e69100060b5ebd; the main difference is that our case works at task granularity rather than job granularity.
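A rough sketch of that idea follows. It is not the code that was eventually merged: `TaskInfo` here is a hypothetical, simplified stand-in for Volcano's task type, and the `SchedulingGates` field and the `schedulableTasks` helper are assumptions made for illustration. Only the name `taskScheduleGated` comes from the suggestion above.

```go
package main

// TaskInfo is a hypothetical, simplified stand-in for the scheduler's task
// type. SchedulingGates is assumed to mirror the pod's .spec.schedulingGates.
type TaskInfo struct {
	Name            string
	SchedulingGates []string
}

// taskScheduleGated reports whether the task still carries at least one
// scheduling gate and therefore must not be scheduled yet.
func taskScheduleGated(t *TaskInfo) bool {
	return t != nil && len(t.SchedulingGates) > 0
}

// schedulableTasks shows how an action (allocate, backfill, preempt, reclaim)
// could skip gated tasks before trying to place them on a node.
func schedulableTasks(tasks []*TaskInfo) []*TaskInfo {
	ready := make([]*TaskInfo, 0, len(tasks))
	for _, t := range tasks {
		if t == nil || taskScheduleGated(t) {
			continue // skip missing or gated tasks; they are retried once the gate is removed
		}
		ready = append(ready, t)
	}
	return ready
}
```

The check is per task, so a job with some gated and some ungated pods would only have its ungated tasks considered in a given scheduling cycle.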

skalva404 commented 9 months ago

I will go through it and confirm

Monokaix commented 8 months ago

Hi, any progress here? @itsaviral2609 @skalva404

Monokaix commented 1 week ago

#3555 has completed it.

Monokaix commented 1 week ago

/close

volcano-sh-bot commented 1 week ago

@Monokaix: Closing this issue.

In response to [this](https://github.com/volcano-sh/volcano/issues/3187#issuecomment-2370378311):

>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.