kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.58k stars 687 forks source link

KEP-2170: Make API specification more restricting #2198

Closed tenzen-y closed 1 month ago

tenzen-y commented 1 month ago

What this PR does / why we need it: I updated the below API definitions for the Training v2 API:

  1. trainingRuntimeRef
    1. I removed the namespace field since we should use the TrainingRuntime deployed in the same namespace with the TrainJob. Otherwise, users should define the ClusterTrainingRuntime instead of the namespace-scoped TrainingRuntime resource.
    2. I renamed the version field with apiVersion since we are sometimes confused if the version indicates the runtime revision or API version.
  2. managedBy
    1. I added limitations and restrictions based on discussions in https://github.com/kubeflow/training-operator/issues/2193. The managedBy field is acceptable only for either an empty value, "training-operator.kubeflow.org/trainjob-controller" or "kueue.x-k8s.io/multikueue".

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged): Related: https://github.com/kubeflow/training-operator/issues/2170

Checklist:

tenzen-y commented 1 month ago

/assign @kubeflow/wg-training-leads

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 10307520952

Details


Totals Coverage Status
Change from base Build 10284610088: 0.008%
Covered Lines: 3951
Relevant Lines: 11796

💛 - Coveralls
tenzen-y commented 1 month ago

cc: @mimowo

google-oss-prow[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubeflow/training-operator/blob/master/OWNERS)~~ [andreyvelich] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment