pravega / pravega-operator

Pravega Kubernetes Operator
Apache License 2.0

Issue 579: Leader election fails after node reboot #580

Closed. anishakj closed this 3 years ago.

anishakj commented 3 years ago

Change log description

In a VMware cluster, some pods get stuck in the ProviderFailed state after a node reboot. The leader election function provided by the Operator SDK does not handle this state, so new operator pods stay stuck in a wait loop and never become leader.

Purpose of the change

Fixes #579

What the code does

Customizes the leader.Become() function of operator-sdk: if the pod holding the leader lock is in the ProviderFailed state, the operator deletes that pod and the lock ConfigMap so that a new pod can come up and become leader.
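The retry logic described above can be sketched with an in-memory model instead of a real cluster. This is a hypothetical illustration, not the PR's code: `Pod`, `Cluster`, `become`, and `stuckCluster` are invented names, the lock is modeled as a ConfigMap that records its owning pod (as in operator-sdk's leader-for-life approach), and the sketch assumes a stuck pod is recognizable by a status reason of `ProviderFailed`. The exact phase/reason a VMware cluster reports should be checked against a real pod. Real `leader.Become()` also waits with backoff between retries, which is omitted here.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of a pod's status fields;
// it is not the client-go corev1.Pod type.
type Pod struct {
	Phase  string // e.g. "Running", "Failed"
	Reason string // e.g. "ProviderFailed" (assumed value)
}

// Cluster holds just enough state to model the leader lock: LockOwner is
// the pod that owns the lock ConfigMap ("" means no ConfigMap exists).
type Cluster struct {
	Pods      map[string]Pod
	LockOwner string
}

// isProviderFailed is an assumed check on the pod's status reason.
func isProviderFailed(p Pod) bool {
	return p.Reason == "ProviderFailed"
}

// become retries until self owns the lock or maxAttempts is reached.
// With recoverStuck set, a lock owned by a ProviderFailed pod is cleared
// by deleting that pod and the lock ConfigMap before the next attempt.
func become(c *Cluster, self string, maxAttempts int, recoverStuck bool) bool {
	for i := 0; i < maxAttempts; i++ {
		if c.LockOwner == "" || c.LockOwner == self {
			c.LockOwner = self // create (or keep) the lock ConfigMap
			return true
		}
		owner, ok := c.Pods[c.LockOwner]
		if recoverStuck && ok && isProviderFailed(owner) {
			delete(c.Pods, c.LockOwner) // delete the stuck leader pod
			c.LockOwner = ""            // delete the lock ConfigMap
		}
	}
	return false
}

// stuckCluster builds the situation from the PR: the lock is held by a
// pod stuck in ProviderFailed, and a healthy replacement pod is waiting.
func stuckCluster() *Cluster {
	return &Cluster{
		Pods: map[string]Pod{
			"operator-old": {Phase: "Failed", Reason: "ProviderFailed"},
			"operator-new": {Phase: "Running"},
		},
		LockOwner: "operator-old",
	}
}

func main() {
	fmt.Println("without recovery:", become(stuckCluster(), "operator-new", 5, false))
	fmt.Println("with recovery:", become(stuckCluster(), "operator-new", 5, true))
}
```

Running it prints `without recovery: false` followed by `with recovery: true`: without the customization the new pod retries forever against a lock that is never released, because nothing deletes the stuck owner pod.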

How to verify it

In a VMware setup, reboot a node and verify that the pods come up successfully afterwards.
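The post-reboot check can be expressed as a small helper that flags any pod not in the Running phase. This is a hypothetical sketch: `PodStatus` and `notRunning` are invented names, the pod names are placeholders, and in practice the statuses would come from `kubectl get pods` or a Kubernetes client rather than a hard-coded slice.

```go
package main

import (
	"fmt"
	"sort"
)

// PodStatus is a hypothetical, simplified record of one pod's status;
// it is not a client-go type.
type PodStatus struct {
	Name   string
	Phase  string
	Reason string
}

// notRunning returns the sorted names of pods that have not reached the
// Running phase, e.g. pods still Pending or stuck as Failed/ProviderFailed.
func notRunning(pods []PodStatus) []string {
	var names []string
	for _, p := range pods {
		if p.Phase != "Running" {
			names = append(names, p.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Placeholder pod names and statuses for illustration only.
	pods := []PodStatus{
		{Name: "pravega-operator-abc", Phase: "Running"},
		{Name: "pravega-operator-xyz", Phase: "Failed", Reason: "ProviderFailed"},
	}
	if bad := notRunning(pods); len(bad) > 0 {
		fmt.Println("pods not running after reboot:", bad)
	} else {
		fmt.Println("all pods running")
	}
}
```

Note that this simple check also flags pods that legitimately finished (phase Succeeded), so it is best applied only to long-running workloads such as the operator itself.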

anishakj commented 3 years ago

Closing this PR, as the commits are not on top of master.