kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.62k stars, 700 forks

KEP-2170: Add manifests for Kubeflow Training V2 #2289

Closed: andreyvelich closed this pull request 1 month ago

andreyvelich commented 1 month ago

Fixes: https://github.com/kubeflow/training-operator/issues/2208

I added manifests for Training Operator V2. I renamed the manager image to:

docker.io/kubeflow/training-operator-v2

This allows users to use the latest version of both the V1 and V2 Training Operator. In the future, we can deprecate the old version of the Training Operator.

For now, I install JobSet using its release manifests in the Kustomize overlay. Let's discuss with @kubeflow/wg-manifests-leads what the better long-term approach is.
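For illustration, an overlay along these lines could pull in the JobSet release manifests and point the manager at the renamed image. This is a minimal sketch, not the manifests from this PR: the `../../base` path, the JobSet version in the URL, the source image name, and the `latest` tag are all assumptions.

```yaml
# kustomization.yaml (hypothetical overlay; paths, versions, and tags are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Hypothetical path to the Training Operator V2 base manifests.
  - ../../base
  # JobSet release manifests referenced by URL; the version is a placeholder.
  - https://github.com/kubernetes-sigs/jobset/releases/download/v0.6.0/manifests.yaml
images:
  # Renamed manager image; the original image name and the tag are assumed.
  - name: kubeflow/training-operator-v2
    newName: docker.io/kubeflow/training-operator-v2
    newTag: latest
```

Referencing the release manifests by URL keeps the overlay small, but it ties the install to one JobSet release and requires network access at build time, which is part of why the long-term approach is still open for discussion.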

Additionally, I fixed the invalid validating webhook configuration name in our cert generator.

/assign @kubeflow/wg-training-leads
/hold for review

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 11413026138

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Totals:
- Change from base Build 11390874125: 0.0%
- Covered Lines: 73
- Relevant Lines: 73

google-oss-prow[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/training-operator/blob/master/OWNERS)~~ [tenzen-y]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

andreyvelich commented 1 month ago

/hold cancel