kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0

KEP-2170: Create controller for TrainJob #2207

Closed by andreyvelich 3 days ago

andreyvelich commented 3 months ago

Related: https://github.com/kubeflow/training-operator/issues/2170

We need to create a controller for TrainJob that reconciles the appropriate resources.

/area controller

tenzen-y commented 3 months ago

/assign

github-actions[bot] commented 6 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

andreyvelich commented 3 days ago

We can close this issue, since an initial version of the TrainJob controller is working 🎉