Open andreyvelich opened 2 months ago
Related: https://github.com/kubeflow/training-operator/issues/2170
We should create a ClusterTrainingRuntime for PyTorch multi-node distributed training.
/area runtime
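For anyone picking this up, here is a rough sketch of what such a runtime manifest might look like, built as a Python dict and printed as JSON (which `kubectl apply -f` also accepts). Everything specific here is an assumption rather than the agreed design: the `kubeflow.org/v2alpha1` API version, the `mlPolicy` / `numNodes` / `torch.numProcPerNode` field names, the JobSet-style `replicatedJobs` template, the runtime name `torch-distributed`, and the placeholder image are my reading of the v2 proposal and may not match the final API. The helper `build_torch_runtime` is hypothetical.

```python
import json


def build_torch_runtime(name="torch-distributed", num_nodes=2,
                        image="example.com/pytorch-train:latest"):
    """Return a ClusterTrainingRuntime-like manifest as a plain dict.

    All field names below are assumptions based on the v2 proposal,
    not a confirmed schema. The image is a placeholder.
    """
    # Pod spec for a single training node (assumed container name: "trainer").
    pod_spec = {"containers": [{"name": "trainer", "image": image}]}

    # JobSet-style template with one replicated job for the training nodes.
    job_set_spec = {
        "replicatedJobs": [
            {
                "name": "node",
                "template": {"spec": {"template": {"spec": pod_spec}}},
            }
        ]
    }

    return {
        "apiVersion": "kubeflow.org/v2alpha1",  # assumed API group/version
        "kind": "ClusterTrainingRuntime",
        "metadata": {"name": name},
        "spec": {
            # Assumed ML policy block: node count plus torch-specific settings.
            "mlPolicy": {
                "numNodes": num_nodes,
                "torch": {"numProcPerNode": "auto"},
            },
            "template": {"spec": job_set_spec},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON for inspection.
    print(json.dumps(build_torch_runtime(), indent=2))
```

Building the manifest in code just makes it easy to vary the node count or image while the schema is still being discussed. The real runtime would presumably be checked in as a YAML manifest.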
I'm learning training-operator v1 and would like to work on this issue. Any suggestions would be appreciated.
/assign