kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Fix the logic to calculate minResources #543

Closed tenzen-y closed 1 year ago

tenzen-y commented 1 year ago

I fixed the minResources calculation so that calculatePGMinResource treats the launcher as the higher-priority replica when no priorityClass is set.

I ran into this issue in https://github.com/kubeflow/mpi-operator/pull/540#issuecomment-1496012813.

Background: In the current implementation, if the launcher and the workers have the same priority, calculatePGMinResource orders the replicas arbitrarily. As a result, the launcher might be treated as lower priority than the worker replica when priorityClass is not set on either replica.
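To illustrate the tie-breaking described above, here is a minimal sketch, not the actual mpi-operator code. The `replica` type, its fields, and the helpers `orderByPriority` and `countedPods` are hypothetical names introduced for this example. It assumes the minimum resources are accumulated over the first `minMember` pods in priority order, so the order in which equal-priority replicas are visited decides whether the launcher is counted.

```go
package main

import (
	"fmt"
	"sort"
)

// replica is a hypothetical, simplified stand-in for a replica spec.
type replica struct {
	Name     string // "launcher" or "worker" in this sketch
	Priority int32  // resolved priority value; 0 when no priorityClass is set
	Count    int32  // number of pods in this replica
}

// orderByPriority returns a copy of replicas sorted from highest to lowest
// priority. When two replicas have equal priority, the launcher is placed
// first so the result does not depend on the input order.
func orderByPriority(replicas []replica) []replica {
	out := make([]replica, len(replicas))
	copy(out, replicas)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority > out[j].Priority
		}
		// Tie: the launcher sorts ahead of any other replica.
		return out[i].Name == "launcher" && out[j].Name != "launcher"
	})
	return out
}

// countedPods reports how many pods of each replica fall within the first
// minMember pods when replicas are visited in priority order.
func countedPods(replicas []replica, minMember int32) map[string]int32 {
	counted := make(map[string]int32)
	remaining := minMember
	for _, r := range orderByPriority(replicas) {
		if remaining <= 0 {
			break
		}
		n := r.Count
		if n > remaining {
			n = remaining
		}
		counted[r.Name] = n
		remaining -= n
	}
	return counted
}

func main() {
	// No priorityClass on either replica: both resolve to priority 0.
	replicas := []replica{
		{Name: "worker", Priority: 0, Count: 2},
		{Name: "launcher", Priority: 0, Count: 1},
	}
	fmt.Println(countedPods(replicas, 2))
}
```

With the explicit tie-break, the launcher is always counted first when priorities are equal, regardless of the order in which the replicas are passed in.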

tenzen-y commented 1 year ago

/assign @alculquicondor

tenzen-y commented 1 year ago

Rebased.

alculquicondor commented 1 year ago

/lgtm
/approve

google-oss-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/mpi-operator/blob/master/OWNERS)~~ [alculquicondor]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.