pravega / pravega-operator

Pravega Kubernetes Operator
Apache License 2.0

Leader election fails after node reboot #579

Closed: anishakj closed this issue 3 years ago

anishakj commented 3 years ago

Description

In a VMware cluster, some pods are stuck in the ProviderFailed state after a node reboot. The leader election function provided by the Operator SDK does not handle pods in this state, so new pods are stuck in a wait cycle and never become the leader.

Importance

must-have

Location

cmd/manager/main.go

Suggestions for an improvement

Customize the leader.Become function of operator-sdk so that it runs pre-checks (for example, detecting a leader pod stuck in ProviderFailed) before waiting for leadership.