kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0

feat: Add successpolicy #181

Open gaocegege opened 2 years ago

gaocegege commented 2 years ago

SuccessPolicy is used in both PyTorchJob and TFJob, so I propose adding it to common.

Signed-off-by: cegao ce.gao@outlook.com

gaocegege commented 2 years ago

/assign @terrytangyuan @Jeffwan @zw0610

gaocegege commented 2 years ago

It is used in Katib. I think it works, but we should still support successPolicy to keep the API consistent.

cc @andreyvelich

andreyvelich commented 2 years ago

Yes, we use successCondition in GJSON format in our APIs to define the success condition for a Katib Trial's workers, similar to Argo Workflows. We could probably think about how to use it in the Training Operators.

terrytangyuan commented 2 years ago

Let's use https://github.com/kubeflow/training-operator/issues/1507 to track and discuss separately.

google-oss-prow[bot] commented 2 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubeflow/common/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

gaocegege commented 2 years ago

/assign @zw0610 @Jeffwan

zw0610 commented 2 years ago

LGTM. Meanwhile, could you add descriptions for SchedulingPolicy and SuccessPolicyAllWorkers to explain the expected behavior?
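
For illustration only, the kind of descriptions requested here might look like the Go sketch below. The type and the constant name SuccessPolicyAllWorkers come from this thread. SuccessPolicyDefault, the wording of the doc comments, the semantics they describe, and the helper isSucceeded are all assumptions made for this example; they are not taken from the PR's actual diff.

```go
package main

import "fmt"

// SuccessPolicy selects the rule that decides when a job counts as succeeded.
type SuccessPolicy string

const (
	// SuccessPolicyDefault (assumed name and semantics): the job succeeds
	// once a designated replica, for example worker 0, completes successfully.
	SuccessPolicyDefault SuccessPolicy = ""
	// SuccessPolicyAllWorkers (assumed semantics): the job succeeds only
	// when every worker replica has completed successfully.
	SuccessPolicyAllWorkers SuccessPolicy = "AllWorkers"
)

// isSucceeded is a hypothetical helper, not part of kubeflow/common.
// worker0Done reports whether worker 0 finished successfully;
// succeeded and expected count successful and total worker replicas.
func isSucceeded(policy SuccessPolicy, worker0Done bool, succeeded, expected int) bool {
	if policy == SuccessPolicyAllWorkers {
		return succeeded == expected
	}
	return worker0Done
}

func main() {
	// One of four workers (worker 0) has finished.
	fmt.Println(isSucceeded(SuccessPolicyDefault, true, 1, 4))
	fmt.Println(isSucceeded(SuccessPolicyAllWorkers, true, 1, 4))
	// All four workers have finished.
	fmt.Println(isSucceeded(SuccessPolicyAllWorkers, true, 4, 4))
}
```

Under these assumptions, the same replica counts yield different outcomes depending on the policy, which is the behavior the field descriptions would need to spell out.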

gaocegege commented 2 years ago

SGTM

gaocegege commented 2 years ago

/hold