Closed tenzen-y closed 1 year ago
Can you summarize the changes for somebody that isn't familiar with volcano? 😅
Sure. I will send a ping to you once this PR description is ready.
@alculquicondor Updated PR description.
@alculquicondor I have addressed your comments. Please take another look.
@alculquicondor Updated. PTAL.
@alculquicondor I addressed your suggestions and squashed commits into one.
Also, I will add docs for the schedulingPolicy to https://www.kubeflow.org/.
/lgtm
@alculquicondor Created https://github.com/kubeflow/website/pull/3453.
@alculquicondor Is anything blocking the merge?
Oops, no, I just forgot to approve.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
I changed the gang-scheduling logic so that the mpi-operator respects SchedulingPolicy when creating a PodGroup. Mainly, I modified the following:
~~3. Set "PodGroupSpec.MinResources".~~
   ~~a. iff "SchedulingPolicy.MinAvailable" isn't empty, propagate that to PodGroup.~~
   ~~b. In the case of `PodGroupSpec.MinMember < MPIJobSpec.MPIReplicaSpecs[Worker].Replicas + 1`, sort "MPIJobSpec.MPIReplicaSpecs" in descending order according to priorityClass, and then add container resources to "PodGroupSpec.MinResources". However, the total value of "MPIJobSpec.MPIReplicaSpec.Replicas" to be added must not exceed "PodGroupSpec.MinMember".~~

Fixes: #518
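
For readers unfamiliar with volcano, here is a minimal Python sketch of the calculation that the struck-through step 3b describes. It is not the operator's Go code. The `ReplicaSpec` type, the integer `priority` field (standing in for the resolved priorityClass value), the flat `resources` mapping, and the `min_resources` function are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaSpec:
    """Hypothetical stand-in for one entry of MPIJobSpec.MPIReplicaSpecs."""
    name: str
    replicas: int
    priority: int               # assumed: numeric value of the pod's priorityClass
    resources: Dict[str, int]   # assumed: per-pod requests in plain integer units


def min_resources(specs: List[ReplicaSpec], min_member: int) -> Dict[str, int]:
    """Sum per-pod resources for at most `min_member` pods, highest priority first."""
    total: Dict[str, int] = {}
    remaining = min_member
    # Sort replica specs in descending order of priority.
    for spec in sorted(specs, key=lambda s: s.priority, reverse=True):
        if remaining <= 0:
            break
        # Never count more pods than MinMember allows.
        count = min(spec.replicas, remaining)
        for resource, quantity in spec.resources.items():
            total[resource] = total.get(resource, 0) + quantity * count
        remaining -= count
    return total


specs = [
    ReplicaSpec("Worker", replicas=4, priority=10, resources={"cpu": 2, "memory_gib": 4}),
    ReplicaSpec("Launcher", replicas=1, priority=100, resources={"cpu": 1, "memory_gib": 1}),
]
# MinMember = 3 counts the launcher plus two of the four workers.
print(min_resources(specs, min_member=3))  # {'cpu': 5, 'memory_gib': 9}
```

With MinMember equal to the total pod count (5 here), the same function simply sums the resources of every pod.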
/assign @alculquicondor