kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0

Handle pod failures for all policies #188

Closed georgkaleido closed 2 years ago

georgkaleido commented 2 years ago

If a pod is in the Failed phase, we have to create a new one. Currently it is assumed that the pod will restart because of the RestartPolicy set at the pod level. This doesn't work if the pod fails for a system reason.
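
To illustrate the idea, here is a minimal sketch in Go of the check a controller could run on each reconcile. It is not the code from this pull request: the `Pod` type and the `podsToRecreate` helper are hypothetical stand-ins, and only the phase strings mirror what Kubernetes reports in the pod status. The point is that the decision is based on the pod phase alone, regardless of which restart policy is configured.

```go
package main

import "fmt"

// PodPhase mirrors the phase strings Kubernetes reports in the pod status.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

// Pod is a hypothetical, stripped-down stand-in for a real pod object.
type Pod struct {
	Name  string
	Phase PodPhase
}

// podsToRecreate returns the names of pods that have reached the Failed
// phase. A pod-level RestartPolicy only restarts containers inside a pod
// that still exists, so these pods would need to be deleted and created
// again by the controller (assumed behavior for this sketch).
func podsToRecreate(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if p.Phase == PodFailed {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "worker-0", Phase: PodRunning},
		{Name: "worker-1", Phase: PodFailed},
	}
	fmt.Println(podsToRecreate(pods)) // prints [worker-1]
}
```

In a real controller the returned pods would be deleted and replaced in the same reconcile pass. That part is left out here because it depends on the client and job types in use.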

google-cla[bot] commented 2 years ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

google-oss-prow[bot] commented 2 years ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign jeffwan after the PR has been reviewed. You can assign the PR to them by writing `/assign @jeffwan` in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubeflow/common/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

georgkaleido commented 2 years ago

Closing. Reopening with a different email for the CLA.