Open andreyvelich opened 3 months ago
Note that we need to extend KEP-2170 for MPI before we implement anything.
Oh, we already added the design for the MPI here: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#the-mpi-spec-api
NVM
Once we are ready to implement the MPI runtime, we should probably update this ClusterTrainingRuntime: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#mpi-runtime.
It might have incorrect values, since we didn't get a chance to finalize it.
That sounds good to me.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
Related: https://github.com/kubeflow/training-operator/issues/2170
As part of this KEP, we will migrate to the MPI V2 implementation.
We should add support for the MPI Runtime.
/area runtime