liqotech / liqo

Enable dynamic and seamless Kubernetes multi-cluster topologies
https://liqo.io
Apache License 2.0

liqo-fabric crashLoopBackOff when metrics.enabled=true #2715

Closed by masterphenix 1 month ago

masterphenix commented 1 month ago

What happened:

After deploying liqo v1.0.0-rc.1 with the Helm chart, the liqo-fabric pods are in the CrashLoopBackOff state; the logs show the following error:

I0903 11:35:55.555675       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="routeconfiguration" controllerGroup="networking.liqo.io" controllerKind="RouteConfiguration"
I0903 11:35:55.555692       1 controller.go:242] "All workers finished" controller="routeconfiguration" controllerGroup="networking.liqo.io" controllerKind="RouteConfiguration"
E0903 11:35:55.555521       1 kind.go:68] "failed to get informer from cache" err="Timeout: failed waiting for *v1beta1.InternalFabric Informer to sync" logger="controller-runtime.source.EventHandler"
I0903 11:35:55.555754       1 controller.go:242] "All workers finished" controller="pod" controllerGroup="" controllerKind="Pod"
I0903 11:35:55.555818       1 internal.go:526] "Stopping and waiting for caches"
I0903 11:35:55.556014       1 internal.go:530] "Stopping and waiting for webhooks"
I0903 11:35:55.556034       1 internal.go:533] "Stopping and waiting for HTTP servers"
I0903 11:35:55.556059       1 server.go:43] "shutting down server" kind="health probe" addr="[::]:8081"
I0903 11:35:55.556174       1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
Error: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use

Usage:
  liqo-fabric [flags]
[...]

E0903 11:35:55.556744       1 main.go:79] failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use

I have not found any place to configure the ports. Are they hardcoded?
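
For context, `bind: address already in use` is the operating system's EADDRINUSE error: something was already listening on port 8080 in the same network namespace when the metrics server tried to start. The following is a minimal sketch in Python, not liqo code, that reproduces the same OS-level error by binding one port twice. The function names `try_bind` and `second_bind_errno` are hypothetical, and a free port chosen by the OS stands in for 8080.

```python
import errno
import socket
from typing import Optional


def try_bind(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen on host:port, returning the listening socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        # Do not leak the socket if the bind fails.
        sock.close()
        raise
    return sock


def second_bind_errno() -> Optional[int]:
    """Bind a free port, then bind it again; return the second attempt's errno.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = try_bind(0)  # port 0 asks the OS for any free port
    port = first.getsockname()[1]
    try:
        second = try_bind(port)
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return None
    finally:
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    if code == errno.EADDRINUSE:
        print("second bind failed: address already in use")
    else:
        print(f"unexpected result: {code}")
```

The sketch only illustrates the failure mode. It does not identify which process held port 8080 inside the liqo-fabric pod.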

What you expected to happen:

The liqo-fabric pods should start with no errors.

How to reproduce it (as minimally and precisely as possible):

Deploy the liqo Helm chart v1.0.0-rc.1 on an AKS cluster, with enable-ha and enable-metrics.

Environment:

masterphenix commented 1 month ago

Hello, I can't explain why, but all pods were eventually running fine this morning. I have tried to reproduce the issue without success, so I'm closing this.