kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0

Support for PodGroup updates #207

Closed: tenzen-y closed 1 year ago

tenzen-y commented 1 year ago

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

Currently, when users update a CustomJob resource (e.g., TFJob) by changing runPolicy.schedulingPolicy or replicas, training-operator cannot update the PodGroup.

This means when hpa-controller updates replicas, training-operator does not update PodGroup.

So I implemented the logic to update PodGroup.
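As a rough illustration of the idea (not the code in this PR), the sketch below recomputes the desired PodGroup spec from the job's replicas and scheduling policy, then overwrites the existing spec only when it differs. The types `ReplicaSpec`, `SchedulingPolicy`, and `PodGroupSpec` and the functions `desiredPodGroupSpec` and `syncPodGroupSpec` are hypothetical, simplified stand-ins and do not mirror the kubeflow/common or scheduler APIs. It also assumes that the minimum member count defaults to the total replica count unless the policy sets it explicitly.

```go
package main

// ReplicaSpec is a hypothetical, simplified stand-in for one replica type of a job.
type ReplicaSpec struct {
	Replicas int32
}

// SchedulingPolicy is a hypothetical, simplified stand-in for runPolicy.schedulingPolicy.
type SchedulingPolicy struct {
	MinAvailable *int32 // optional override for the gang size
	Queue        string
}

// PodGroupSpec is a hypothetical, simplified stand-in for a PodGroup's spec.
type PodGroupSpec struct {
	MinMember int32
	Queue     string
}

// desiredPodGroupSpec derives the PodGroup spec from the job definition.
// Assumption: MinMember defaults to the sum of all replicas unless the
// scheduling policy sets MinAvailable.
func desiredPodGroupSpec(replicas map[string]ReplicaSpec, policy *SchedulingPolicy) PodGroupSpec {
	var total int32
	for _, r := range replicas {
		total += r.Replicas
	}
	spec := PodGroupSpec{MinMember: total}
	if policy != nil {
		if policy.MinAvailable != nil {
			spec.MinMember = *policy.MinAvailable
		}
		spec.Queue = policy.Queue
	}
	return spec
}

// syncPodGroupSpec overwrites current with desired when they differ and
// reports whether an update was needed. A real controller would issue an
// API update call here instead of mutating a local struct.
func syncPodGroupSpec(current *PodGroupSpec, desired PodGroupSpec) bool {
	if *current == desired {
		return false
	}
	*current = desired
	return true
}
```

In a reconcile loop, a change to replicas (for example, one made by the hpa-controller) would change the result of `desiredPodGroupSpec`, so `syncPodGroupSpec` would report that the PodGroup needs an update. An unchanged job would result in no write.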

/assign @zw0610 @johnugeorge @terrytangyuan

tenzen-y commented 1 year ago

@johnugeorge Can we wait to create a new kubeflow/common release until this PR is merged?

zw0610 commented 1 year ago

> This means when hpa-controller updates replcase, training-operator does not update PodGroup.

Are you referring to replicas?

tenzen-y commented 1 year ago

> > This means when hpa-controller updates replcase, training-operator does not update PodGroup.
>
> Are you referring to replicas?

Ah, yes. This is a typo.

johnugeorge commented 1 year ago

/lgtm

google-oss-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/common/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.