Open · jaystary opened this issue 2 years ago
The PyTorch operator resource can already be used in the current Kubeflow deployment, so there's no need to pause development while we update the deployment to the training operator.
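As a point of reference, a minimal sketch of how a PyTorchJob custom resource could be submitted with the Kubernetes Python client. The image, namespace, command, and replica counts are placeholders, and the spec layout assumes the kubeflow.org/v1 PyTorchJob API.

```python
# Hypothetical sketch: submit a PyTorchJob custom resource with the
# Kubernetes Python client. Image, namespace, and command are placeholders;
# the spec layout assumes the kubeflow.org/v1 PyTorchJob API.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster


def pytorch_replica(replicas: int) -> dict:
    """Build one replica spec (Master or Worker) requesting one GPU per pod."""
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "pytorch",  # container name the operator looks for
                        "image": "example.com/my-training-image:latest",  # placeholder
                        "command": ["python", "train.py"],                # placeholder
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }
                ]
            }
        },
    }


job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "pytorch-multinode-benchmark", "namespace": "default"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": pytorch_replica(1),
            "Worker": pytorch_replica(3),  # 1 master + 3 workers = 4 GPU nodes
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org",
    version="v1",
    namespace="default",
    plural="pytorchjobs",
    body=job,
)
```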
Use Case
We want to use multi-node GPU scaling with PyTorch for a benchmark and potentially for larger-scale model training.
Ideas of Implementation
Implement the Kubeflow Training Operator, which as an added benefit should unlock the other relevant frameworks as well. https://github.com/kubeflow/training-operator
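On the training-script side, a rough sketch of what a multi-node entrypoint could look like: the operator launches one pod per replica and sets the standard torch.distributed environment variables (MASTER_ADDR, MASTER_PORT, WORLD_SIZE, RANK), so the script only needs the usual DistributedDataParallel setup. The model and data below are placeholders, not the actual benchmark workload.

```python
# Hypothetical sketch of a training entrypoint for a PyTorchJob pod.
# Assumes MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK are set in the
# environment, which init_method="env://" reads directly.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per pod in this sketch; NCCL for GPU collectives.
    dist.init_process_group(backend="nccl", init_method="env://")
    rank = dist.get_rank()
    local_gpu = rank % torch.cuda.device_count()
    torch.cuda.set_device(local_gpu)

    # Placeholder model; replace with the real benchmark model.
    model = torch.nn.Linear(1024, 1024).cuda(local_gpu)
    model = DDP(model, device_ids=[local_gpu])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):
        # Placeholder synthetic batch; a real job would use a DistributedSampler.
        x = torch.randn(32, 1024, device=local_gpu)
        loss = model(x).square().mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across nodes here
        optimizer.step()
        if rank == 0 and step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```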
Message from the maintainers:
Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.