suxess-it / sx-cnp-oss


[loki] pods of daemonset sx-loki-logs restart permanently #348

Closed: jkleinlercher closed this issue 1 month ago

jkleinlercher commented 1 month ago
sx-loki-logs-8qxx7                                             0/2     Terminating   0          3s
sx-loki-logs-gb7wt                                             1/2     Terminating   0          3s
sx-loki-logs-phnv7                                             0/2     Terminating   0          2s
sx-loki-logs-zzgzf                                             0/2     Terminating   0          1s
kubectl get events -n monitoring | grep daemonset
22m         Normal    SuccessfulDelete        daemonset/sx-loki-logs                                              Deleted pod: sx-loki-logs-jdbcz
21m         Normal    SuccessfulCreate        daemonset/sx-loki-logs                                              Created pod: sx-loki-logs-2h7j4
21m         Normal    SuccessfulDelete        daemonset/sx-loki-logs                                              Deleted pod: sx-loki-logs-2h7j4
21m         Normal    SuccessfulCreate        daemonset/sx-loki-logs                                              Created pod: sx-loki-logs-g5fhd
21m         Normal    SuccessfulDelete        daemonset/sx-loki-logs                                              Deleted pod: sx-loki-logs-g5fhd
kubectl get events -n monitoring | grep sx-loki-logs
2m8s        Warning   Unhealthy               pod/sx-loki-logs-zzgzf                                              Readiness probe failed: Get "http://10.42.1.89:8080/-/ready": dial tcp 10.42.1.89:8080: connect: connection refused
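To put a number on the create/delete loop in the events above, a small filter can tally SuccessfulCreate and SuccessfulDelete events per DaemonSet. This is a minimal sketch that assumes the default `kubectl get events` column layout (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name `daemonset_churn` is hypothetical.

```shell
# Summarize DaemonSet pod churn from `kubectl get events` text output on stdin.
# Assumes the default columns: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
daemonset_churn() {
  awk '
    # Only count create/delete events emitted for a DaemonSet object
    $4 ~ /^daemonset\// && ($3 == "SuccessfulCreate" || $3 == "SuccessfulDelete") {
      count[$4 " " $3]++
    }
    END {
      for (key in count) print key, count[key]
    }
  ' | sort
}
```

Usage: `kubectl get events -n monitoring | daemonset_churn`. A healthy DaemonSet shows roughly one create per node; counts that keep growing within minutes point to a controller repeatedly deleting the pods rather than the pods crashing on their own.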
jkleinlercher commented 1 month ago

Could be the same issue as in https://github.com/grafana/agent/discussions/2660 and https://github.com/grafana/agent/issues/3300.

jkleinlercher commented 1 month ago

After installing loki again with grafanaAgentOperator enabled, I can see in the logs of this operator that it also reconciles the GrafanaAgent from mimir. So the GrafanaAgentOperator definitely reconciles every GrafanaAgent instance on the cluster, not only the one in its own namespace. Two possibilities:

level=info ts=2024-07-30T19:33:38.050022324Z controller=grafanaagent controllerGroup=monitoring.grafana.com controllerKind=GrafanaAgent GrafanaAgent=mimir/sx-mimir-meta-monitoring namespace=mimir name=sx-mimir-meta-monitoring reconcileID=776f018d-c415-4cc0-bb2f-5711e1fa450b msg="deleting integrations Deployment" deploy=mimir/sx-mimir-meta-monitoring-integrations-deploy
level=info ts=2024-07-30T19:33:38.050612609Z controller=grafanaagent controllerGroup=monitoring.grafana.com controllerKind=GrafanaAgent GrafanaAgent=mimir/sx-mimir-meta-monitoring namespace=mimir name=sx-mimir-meta-monitoring reconcileID=776f018d-c415-4cc0-bb2f-5711e1fa450b msg="deleting integrations DaemonSet" ds=mimir/sx-mimir-meta-monitoring-integrations-ds
level=info ts=2024-07-30T19:33:38.050684003Z controller=grafanaagent controllerGroup=monitoring.grafana.com controllerKind=GrafanaAgent GrafanaAgent=mimir/sx-mimir-meta-monitoring namespace=mimir name=sx-mimir-meta-monitoring reconcileID=776f018d-c415-4cc0-bb2f-5711e1fa450b msg="Reconcile successful"
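The log lines above carry the reconciled resource in the logfmt field `GrafanaAgent=<namespace>/<name>`. A minimal sketch for listing every GrafanaAgent an operator touched, assuming that field format stays the same across log messages; `reconciled_agents` is a hypothetical helper name.

```shell
# List the distinct GrafanaAgent CRs (namespace/name) that an operator reconciled,
# based on the logfmt field `GrafanaAgent=<namespace>/<name>` in its logs on stdin.
reconciled_agents() {
  # Match the field at line start or after a space, so `controllerKind=GrafanaAgent` is ignored
  grep -oE '(^| )GrafanaAgent=[^ ]+' | cut -d= -f2 | sort -u
}
```

Usage (the deployment name is a placeholder): `kubectl logs -n monitoring deploy/<operator-deployment> | reconciled_agents`. If the output includes agents from namespaces belonging to another release, such as `mimir/sx-mimir-meta-monitoring` here, the operator is watching cluster-wide.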
jkleinlercher commented 1 month ago

Fixed with https://github.com/suxess-it/sx-cnp-oss/commit/1d78269688f70ab37b291c779d8074ede73d8f8f

You cannot run two Grafana Agent Operators on the same cluster, since they conflict when reconciling GrafanaAgent CRs. So I decided to keep the Grafana Agent Operator in loki and disable it in mimir.
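Because only one Grafana Agent Operator may run per cluster, a quick check after the change can catch a second one creeping back in. This is a sketch under an assumption: it reads `kubectl get deployments -A --no-headers` output and treats any deployment whose name contains `grafana-agent-operator` as an operator, which depends on the chart's naming and is not confirmed here. Both function names are hypothetical.

```shell
# Count Grafana Agent Operator deployments cluster-wide.
# Reads `kubectl get deployments -A --no-headers` on stdin (NAMESPACE NAME READY ...).
# Assumption: every operator deployment name contains "grafana-agent-operator".
count_agent_operators() {
  awk '$2 ~ /grafana-agent-operator/ { n++ } END { print n + 0 }'
}

# Fail if more than one operator is found, since they fight over GrafanaAgent CRs.
check_single_agent_operator() {
  local n
  n="$(count_agent_operators)"
  if [ "$n" -gt 1 ]; then
    echo "found $n Grafana Agent Operators; they will conflict on GrafanaAgent CRs" >&2
    return 1
  fi
  echo "ok: $n Grafana Agent Operator(s)"
}
```

Usage: `kubectl get deployments -A --no-headers | check_single_agent_operator`. The non-zero return makes it usable as a post-deploy guard in CI.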