Closed invidian closed 3 years ago
Well, the dummy fix is the following:
diff --git metal/cloud.go metal/cloud.go
index 6902b27..4fae75c 100644
--- metal/cloud.go
+++ metal/cloud.go
@@ -107,8 +107,7 @@ func (c *cloud) Initialize(clientBuilder cloudprovider.ControllerClientBuilder,
 	serviceReconcilers := []serviceReconciler{}
 	for _, elm := range c.services() {
 		if err := elm.init(clientset); err != nil {
-			klog.Errorf("could not initialize %s: %v", elm.name(), err)
-			return
+			klog.Fatalf("could not initialize %s: %v", elm.name(), err)
 		}
 		if n := elm.nodeReconciler(); n != nil {
 			nodeReconcilers = append(nodeReconcilers, n)
I think that in the long term, the reconciler should have access to the namespace informer and fetch the namespace on each reconciliation loop. This should simplify the whole process.
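A minimal sketch of that idea, using only the standard library. The `namespaceGetter` interface, the `reconciler` type, and `fakeGetter` are hypothetical stand-ins for an informer-backed lookup and are not taken from the CCM code base. The point is only that a failed lookup surfaces as an error from one reconcile pass and is naturally retried on the next one, instead of being fatal at init time.

```go
package main

import (
	"errors"
	"fmt"
)

// namespaceGetter is a hypothetical stand-in for an informer-backed lookup.
// It returns an identifier for the namespace, or an error if it is not available.
type namespaceGetter interface {
	Get(name string) (string, error)
}

// reconciler is a hypothetical reconciler that keeps a reference to the
// getter rather than a namespace value captured once during initialization.
type reconciler struct {
	namespaces namespaceGetter
	namespace  string
}

// reconcile looks the namespace up on every pass. A failure only aborts
// this pass; the caller can simply run the next pass later.
func (r *reconciler) reconcile() error {
	uid, err := r.namespaces.Get(r.namespace)
	if err != nil {
		return fmt.Errorf("fetching namespace %q: %w", r.namespace, err)
	}
	fmt.Printf("reconciling with namespace %s (id %s)\n", r.namespace, uid)
	return nil
}

// fakeGetter simulates an API that is unavailable at first.
type fakeGetter struct{ ready bool }

func (f *fakeGetter) Get(name string) (string, error) {
	if !f.ready {
		return "", errors.New("namespace not available yet")
	}
	return "example-id", nil
}

func main() {
	g := &fakeGetter{}
	r := &reconciler{namespaces: g, namespace: "example"}

	// First pass fails because the API is not ready.
	if err := r.reconcile(); err != nil {
		fmt.Println("reconcile failed, will retry:", err)
	}

	// Once the API is ready, the next pass succeeds without a restart.
	g.ready = true
	if err := r.reconcile(); err != nil {
		fmt.Println("unexpected error:", err)
	}
}
```

With this shape, nothing has to be fetched in `init`, so there is no setup step that can leave the process running in a half-initialized state.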
Part of the problem is that Initialize(), which is passed the ControllerClientBuilder, does not return any error. Given that most do ClientOrDie(), I am thinking that your Fatalf() makes sense. In any case, this runs as a Deployment, so it would get restarted by Kubernetes.
For now, please open a PR with the above and I will merge it in.
Thanks, @invidian. Closing this.
@displague why close this? The PR I created is barely a workaround IMO. I think the code should be restructured so that this operation can be retried, and so that other controllers can retry the initialization process as well.
If the API is unavailable during setup and, for example, we fail to fetch the kube-apiserver namespace as in the logs below, then CCM keeps running, but no Service gets an IP address assigned until CCM is restarted. Setup should either be retried or the process should exit to avoid getting stuck in a broken state.
Logs below confirm it:
After re-creating the CCM pod, Services correctly get an IP assigned.