kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Add suspend semantics #504

Closed: alculquicondor closed this issue 1 year ago

alculquicondor commented 1 year ago

The semantics should be similar to those of the Kubernetes Job.

This will also pave the way for the training-operator (https://github.com/kubeflow/training-operator/issues/1519)
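For reference, the Kubernetes Job behavior being pointed to is the `spec.suspend` field of `batch/v1` Job: while it is `true` the Job controller creates no Pods, setting it to `true` on a running Job terminates its active Pods, and setting it back to `false` resumes the Job. Below is a minimal Python sketch of that field on a plain manifest dict. The helper names (`job_manifest`, `with_suspend`, `suspend_patch`) and the example container are hypothetical, and where the equivalent field would live on an MPIJob is not stated in this thread.

```python
import copy


def job_manifest(name: str, suspend: bool) -> dict:
    """Build a minimal batch/v1 Job manifest with spec.suspend set.

    The container below is a placeholder chosen for illustration only.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # True: the Job controller does not create Pods for this Job.
            "suspend": suspend,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": "busybox",
                            "command": ["sleep", "60"],
                        }
                    ],
                }
            },
        },
    }


def with_suspend(manifest: dict, suspend: bool) -> dict:
    """Return a copy of the manifest with spec.suspend changed.

    The input manifest is left unmodified.
    """
    updated = copy.deepcopy(manifest)
    updated["spec"]["suspend"] = suspend
    return updated


def suspend_patch(suspend: bool) -> dict:
    """Build a merge-patch body that only touches spec.suspend."""
    return {"spec": {"suspend": suspend}}


if __name__ == "__main__":
    suspended = job_manifest("demo", suspend=True)
    resumed = with_suspend(suspended, False)
    print(suspended["spec"]["suspend"], resumed["spec"]["suspend"])
    print(suspend_patch(False))
```

A patch body like the one from `suspend_patch` could be sent with any Kubernetes client that supports merge patches; the sketch stops at building the payload so that it does not assume a particular client library.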

tenzen-y commented 1 year ago

/kind feature

mimowo commented 1 year ago

/assign

tenzen-y commented 1 year ago

@mimowo We have finished upgrading the Kubernetes dependencies in #502. It may help you implement suspend semantics.

mimowo commented 1 year ago

@tenzen-y @alculquicondor you may want to look at the WIP implementation (tested manually) here: https://github.com/kubeflow/mpi-operator/pull/511. Any early feedback is welcome.

tenzen-y commented 1 year ago

We can probably close this issue.

alculquicondor commented 1 year ago

/close

google-oss-prow[bot] commented 1 year ago

@alculquicondor: Closing this issue.

In response to [this](https://github.com/kubeflow/mpi-operator/issues/504#issuecomment-1416485307):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.