KEP-2170: Add TrainJob conditions

tenzen-y commented 2 weeks ago

What this PR does / why we need it: I implemented the TrainJob condition mechanism based on https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#state-transition

However, the current implementation depends on the JobSet status.conditions rather than status.terminalState. The terminalState field was introduced in JobSet v0.6, and that JobSet version in turn depends on the K8s libs. After we upgrade the training-operator K8s dependency version to 1.30 in https://github.com/kubeflow/training-operator/pull/2299, we can rely on the terminalState.

So, after we upgrade the K8s libs to 1.30, we can revisit the JobSet status.terminalState.
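For readers unfamiliar with the approach, here is a minimal sketch of deriving a terminal TrainJob condition from JobSet status.conditions. It is not the code in this PR: the `Condition` struct is a simplified stand-in for `metav1.Condition`, the helper name `terminalCondition` is hypothetical, and the condition type strings are assumptions made for illustration.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption:
// only the fields needed for this illustration are included).
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// Condition type names are assumed for illustration only.
const (
	jobSetCompleted  = "Completed"
	jobSetFailed     = "Failed"
	trainJobComplete = "Complete"
	trainJobFailed   = "Failed"
)

// terminalCondition (hypothetical helper) scans the JobSet conditions and
// returns the matching terminal TrainJob condition. The boolean is false
// when the JobSet has not reached a terminal state yet.
func terminalCondition(jobSetConds []Condition) (Condition, bool) {
	for _, c := range jobSetConds {
		// Only conditions that currently hold are relevant.
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case jobSetCompleted:
			return Condition{Type: trainJobComplete, Status: "True", Reason: c.Reason, Message: c.Message}, true
		case jobSetFailed:
			return Condition{Type: trainJobFailed, Status: "True", Reason: c.Reason, Message: c.Message}, true
		}
	}
	return Condition{}, false
}

func main() {
	conds := []Condition{
		{Type: "Suspended", Status: "False", Reason: "Resumed"},
		{Type: "Completed", Status: "True", Reason: "AllJobsCompleted", Message: "jobset completed successfully"},
	}
	if c, ok := terminalCondition(conds); ok {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Once status.terminalState is available, a lookup like this could presumably be replaced by reading that single field instead of scanning the condition list.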

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Part-of: https://github.com/kubeflow/training-operator/issues/2207

Relates to #2170

Checklist:

[ ] Docs included if any changes are user facing

coveralls commented 2 weeks ago

Pull Request Test Coverage Report for Build 11754225694

Details

1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 11663764609:	0.0%
Covered Lines:	77
Relevant Lines:	77

💛 - Coveralls

tenzen-y commented 2 weeks ago

/hold for review

tenzen-y commented 2 weeks ago

/assign @kubeflow/wg-training-leads

tenzen-y commented 2 weeks ago

@andreyvelich I addressed all comments. PTAL, thanks!

andreyvelich commented 2 weeks ago

Thanks @tenzen-y!

/lgtm

/approve

/hold

Feel free to merge it.

google-oss-prow[bot] commented 2 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/training-operator/blob/master/OWNERS)~~ [andreyvelich]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

tenzen-y commented 2 weeks ago

Thank you for the review!

/hold cancel
