Open andreyvelich opened 2 months ago
As we discussed during the last Training WG call, we want to design and implement a Training Runtime for Slurm, so that users can leverage the Slurm workload manager for model training on Kubernetes.
Recordings: https://youtu.be/IBDyYUbB0UA
We can continue discussions once we implement the Training Operator V2 APIs.
cc @kubeflow/wg-training-leads @catblade
/area runtime
Give it a 👍. We prioritize the features with the most 👍.
/remove-label lifecycle/needs-triage