kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0
306 stars 143 forks

fix the reconcile flow #242

Closed ChanYiLin closed 4 years ago

ChanYiLin commented 4 years ago

This mirrors the PR in tf-operator: https://github.com/kubeflow/tf-operator/pull/1111

This PR reorders the checks so that if the job has already terminated (Succeeded or Failed), the reconcile returns early instead of reconciling further.

The ExceedsBackOffLimit and ActiveDeadlineSeconds checks should be handled afterward.
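
For readers following along, here is a minimal Go sketch of the ordering described above. It is not the operator's actual code: the `Job` type, the `Action` values, the helper names, and the `>=` comparisons are all hypothetical simplifications chosen for illustration. It only shows the idea that the terminal-state check runs first, and the backoff-limit and deadline checks run only for jobs that are still active.

```go
package main

import "fmt"

// JobPhase is a hypothetical, simplified stand-in for the job's conditions.
type JobPhase string

const (
	Running   JobPhase = "Running"
	Succeeded JobPhase = "Succeeded"
	Failed    JobPhase = "Failed"
)

// Job is a trimmed-down, hypothetical job record used only in this sketch.
type Job struct {
	Phase                 JobPhase
	Restarts              int32  // restarts observed so far
	BackoffLimit          *int32 // nil means no limit is set
	ElapsedSeconds        int64  // seconds since the job started
	ActiveDeadlineSeconds *int64 // nil means no deadline is set
}

// Action names what the reconcile sketch decided to do.
type Action string

const (
	ActionNone Action = "none" // job already terminated: return early
	ActionFail Action = "fail" // limit or deadline exceeded: mark failed
	ActionSync Action = "sync" // keep reconciling pods and services
)

// isTerminal reports whether the job has already Succeeded or Failed.
func isTerminal(j Job) bool {
	return j.Phase == Succeeded || j.Phase == Failed
}

// exceedsBackoffLimit is an assumed rule: restarts have reached the limit.
func exceedsBackoffLimit(j Job) bool {
	return j.BackoffLimit != nil && j.Restarts >= *j.BackoffLimit
}

// pastActiveDeadline is an assumed rule: elapsed time has reached the deadline.
func pastActiveDeadline(j Job) bool {
	return j.ActiveDeadlineSeconds != nil && j.ElapsedSeconds >= *j.ActiveDeadlineSeconds
}

// reconcile checks the terminal state first, then the limits.
func reconcile(j Job) Action {
	if isTerminal(j) {
		return ActionNone
	}
	if exceedsBackoffLimit(j) || pastActiveDeadline(j) {
		return ActionFail
	}
	return ActionSync
}

func main() {
	limit := int32(3)
	done := Job{Phase: Succeeded, Restarts: 5, BackoffLimit: &limit}
	flaky := Job{Phase: Running, Restarts: 5, BackoffLimit: &limit}
	fmt.Println(reconcile(done), reconcile(flaky))
}
```

In this sketch, a job that has already succeeded is left alone even though its restart count is above the limit, while a running job with the same restart count is marked as failed.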

ChanYiLin commented 4 years ago

/assign @johnugeorge Could you help review it? Thanks!

coveralls commented 4 years ago

Coverage Status

Coverage remained the same at 22.97% when pulling 306d6bf5188878267082180df6c54b7d8d9bae18 on ChanYiLin:master into 26f9d0ce381fe50926741b776e8a3f73bd77640d on kubeflow:master.

johnugeorge commented 4 years ago

Thanks @ChanYiLin

johnugeorge commented 4 years ago

/approve

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubeflow/pytorch-operator/blob/master/OWNERS)~~ [johnugeorge]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment