kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0

Handle pod failures for all policies #189

Closed: georgkaleido closed this pull request 2 years ago

georgkaleido commented 2 years ago

If a pod is in the Failed phase, we have to create a new one. Until now, it was assumed that the pod would be restarted because of the RestartPolicy set at the pod level. This doesn't work if the pod fails for a system reason.
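The reasoning in the description can be illustrated with a small Go sketch. This is not the code from this PR. The `Pod`, `PodPhase`, and `RestartPolicy` types and the `shouldRecreate` helper are hypothetical simplifications, and treating `Never` as "do not recreate" is an assumption. The underlying point is that a pod in the `Failed` phase is terminal. The kubelet restarts containers in place under `Always` or `OnFailure`, but it never revives a pod that has already reached `Failed` (for example, after an eviction). The operator therefore has to delete that pod and create a replacement itself.

```go
package main

import "fmt"

// PodPhase mirrors Kubernetes pod phase strings (hypothetical, simplified type).
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// RestartPolicy is a hypothetical stand-in for a replica's restart policy.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
)

// Pod is a minimal hypothetical pod record.
type Pod struct {
	Name   string
	Phase  PodPhase
	Policy RestartPolicy
}

// shouldRecreate reports whether the controller itself must replace the pod.
// A Failed pod is terminal: the kubelet will not restart it, even under
// Always or OnFailure, so relying on the pod-level policy is not enough.
// Assumption: with Never, a failed pod is left alone.
func shouldRecreate(p Pod) bool {
	if p.Phase != PodFailed {
		return false
	}
	return p.Policy != RestartNever
}

func main() {
	pods := []Pod{
		{"worker-0", PodRunning, RestartOnFailure},
		{"worker-1", PodFailed, RestartOnFailure},
		{"worker-2", PodFailed, RestartNever},
	}
	for _, p := range pods {
		fmt.Printf("%s recreate=%v\n", p.Name, shouldRecreate(p))
	}
}
```

In a real controller, a `true` result would translate into deleting the failed pod so that the next reconcile loop creates a fresh one.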

google-cla[bot] commented 2 years ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

johnugeorge commented 2 years ago

Can you do a rebase?

georgkaleido commented 2 years ago

@johnugeorge Done

johnugeorge commented 2 years ago

@georgkaleido Can you fix the golint errors?

georgkaleido commented 2 years ago

@johnugeorge Done

google-oss-prow[bot] commented 2 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/common/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.