terrytangyuan opened 2 years ago
@alculquicondor What is the status of MPI Operator v2? Do we plan to deliver MPI Operator v2 as part of the Universal Training Operator in Kubeflow 1.5? The Kubeflow 1.5 release deadline is January 15th.
We need a contributor to do it. I don't currently have the capacity to handle it, which means it likely won't be possible by January 15th. But I don't think the v1 operator is ready either.
cc @ArangoGutierrez
I want to resurrect this thread. There have been many requests from the community to have the v2 MPI operator in the training operator. Currently, new features are being merged into the v2 MPI operator. Time has passed since the last discussion, and the v2 API is stable now. What is our migration plan? What are the roadblocks? There is also confusion in the community about the future of the v1 MPI operator.
Can we prioritise this? @alculquicondor @terrytangyuan @tenzen-y
IIRC, we are planning to donate mpi-operator v2 to kubernetes-sigs. So we should decide whether to donate it to kubernetes-sigs or merge the v2 operator into the training-operator, to avoid maintaining it in two places.
https://github.com/kubeflow/community/pull/557
cc: @ArangoGutierrez @denkensk @ahg-g
Do we have any new plan here? Since donating mpi-operator v2 to kubernetes-sigs seems to have been abandoned, should we merge mpi-operator v2 into the training-operator?
There's also discussion about donating the Spark-on-K8s project to Kubeflow (no open issue yet, since we are still waiting for a governance update). I personally think that project is similar to MPI Operator in that neither focuses solely on training. So I am not sure MPI Operator would be a good fit for the training-operator.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Now that the v1 MPI operator has been migrated to this repo (https://github.com/kubeflow/training-operator/pull/1457), let's use this issue to track progress on v2.
https://github.com/kubeflow/mpi-operator/tree/master/v2
cc @hackerboy01 @zw0610 @alculquicondor @kubeflow/wg-training-leads