kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Consider handling the minResources when using volcano as a gang scheduler #535

Closed: tenzen-y closed this issue 1 year ago

tenzen-y commented 1 year ago

Since #520, the mpi-operator respects .spec.runPolicy.schedulingPolicy when creating the PodGroup. Currently, when volcano is the gang scheduler, the mpi-operator simply copies .spec.runPolicy.schedulingPolicy.minResources into .spec.minResources of the PodGroup.

However, when minResources is not set, we may want to calculate the total resources required by the Launcher and Workers, taking priorityClasses into account, and pass the result to .spec.minResources in the PodGroup, like the following:

https://github.com/kubeflow/mpi-operator/blob/2bc2b6500cd1f5e502b366ad8b529d9324bd63f2/pkg/controller/podgroup.go#L89-L168
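
To make the idea concrete, here is a minimal sketch of such a calculation. It is not the code at the linked lines: the types Resources and Replica and the function calcMinResources are hypothetical, plain int64 quantities stand in for the Kubernetes resource types, and the rule "count pods from higher-priority replica types first until minMember is reached" is an assumption about what "considering priorityClasses" could mean.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources maps a resource name ("cpu" in millicores, "memory" in bytes)
// to a requested quantity. Hypothetical stand-in for a Kubernetes resource list.
type Resources map[string]int64

// Replica is a hypothetical, simplified view of one replica type in an MPIJob.
type Replica struct {
	Name     string    // "Launcher" or "Worker"
	Count    int32     // number of pods of this type
	Priority int32     // resolved PriorityClass value; 0 if none is set
	Requests Resources // summed container requests for a single pod
}

// calcMinResources sums the requests of the first minMember pods,
// taking pods from higher-priority replica types first (assumed rule).
func calcMinResources(minMember int32, replicas []Replica) Resources {
	// Copy so the caller's slice is not reordered.
	ordered := append([]Replica(nil), replicas...)
	// Stable sort keeps the input order for equal priorities.
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})

	total := Resources{}
	remaining := minMember
	for _, r := range ordered {
		if remaining <= 0 {
			break
		}
		// Count at most the pods still needed to reach minMember.
		n := r.Count
		if n > remaining {
			n = remaining
		}
		for name, qty := range r.Requests {
			total[name] += qty * int64(n)
		}
		remaining -= n
	}
	return total
}

func main() {
	replicas := []Replica{
		{Name: "Launcher", Count: 1, Priority: 100, Requests: Resources{"cpu": 1000, "memory": 1 << 30}},
		{Name: "Worker", Count: 4, Priority: 10, Requests: Resources{"cpu": 2000, "memory": 4 << 30}},
	}
	// With minMember = 3, the launcher and two workers are counted.
	got := calcMinResources(3, replicas)
	fmt.Println(got["cpu"], got["memory"])
}
```

In the operator itself, the priority values would have to be resolved from the PriorityClass objects named in each pod template, and the quantities would be summed with the Kubernetes quantity types rather than raw integers.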

/help

google-oss-prow[bot] commented 1 year ago

@tenzen-y: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

alculquicondor commented 1 year ago

To clarify, we are looking for help from folks that have experience with volcano.

lowang-bh commented 1 year ago

/assign