solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Gloo Edge Observability pod is reporting errors when configured with an existing Prometheus and Grafana #7233

Open georgefridrich opened 1 year ago

georgefridrich commented 1 year ago

Gloo Edge Version

1.12.x (latest stable)

Kubernetes Version

1.24.x

Describe the bug

I followed these instructions to get the dashboards working in Gloo Edge (GE) without using the GE-installed Prometheus and Grafana:

https://github.com/solo-io/engineering-demos/tree/2429e36bc6ea69edd7c1cc18ae3d48b7ee29c437/gloo-edge/observability/external-grafana

My observability pod is giving me the following error (Customer env is also seeing the same errors):

{"level":"error","ts":"2022-09-23T19:21:10.549Z","logger":"observability.v1.event_loop.observability","caller":"syncer/setup_syncer.go:174","msg":"error: event_loop: All attempts fail:\n#1: unable to get list of current snapshots to compare against, skipping generation: Get \"http://grafana.grafana.svc.cluster.local/api/dashboard/snapshots\": dial tcp 10.100.64.206:80: i/o timeout\n#2: unable to get list of current snapshots to compare against, skipping generation: Get \"http://grafana.grafana.svc.cluster.local/api/dashboard/snapshots\"

Steps to reproduce the bug

Install GE with the following Helm values:

helm:
  skipCrds: false
  values: |
    create_license_secret: false
    license_secret_name: license
    gatewayProxies:
      gatewayProxy:
        kind:
          deployment:
            replicas: 3
        podDisruptionBudget:
          maxUnavailable: 1
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:

Expected Behavior

I expect no errors in the observability pod logs, and I expect the Grafana dashboards to show as loaded in the external Grafana (the one not installed by GE) on the local cluster.

Additional Context

I have this env up and running and can provide additional logging and assistance as required. Please email george.fridrich@solo.io or reach out via Slack. You guys rock! Let me know how I can help (I used to be a dev, so I can help with code if needed).

github-actions[bot] commented 3 weeks ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.