kubeflow / mxnet-operator

A Kubernetes operator for mxnet jobs
Apache License 2.0

Fix the reconcile flow #74

Closed ChanYiLin closed 4 years ago

ChanYiLin commented 4 years ago

This follows the same fix as in tf-operator (https://github.com/kubeflow/tf-operator/pull/1111). If the job has already terminated, we don't need to check the active deadline or the backoff limit.

Originally, even after the job had terminated, the operator still checked the active deadline and appended an event to the job. As a result, an event saying the job failed could show up after the job had already succeeded, simply because the active deadline had passed. The operator log would also keep showing the failure message about exceeding the active deadline.

ChanYiLin commented 4 years ago

/assign @gaocegege
Can you help review it? Thanks!

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/mxnet-operator/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment