grafana / helm-charts


"too many 500 error responses" error created by grafana-sc-datasources sidecar #2032

Open Wyifei opened 1 year ago

Wyifei commented 1 year ago

I installed Grafana via kube-prometheus-stack. After installation, I found that the grafana-sc-datasources sidecar generates the error message below:

Error message:

{"time": "2022-11-30T09:15:06.140532+00:00", "level": "ERROR", "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/datasources/reload (Caused by ResponseError('too many 500 error responses'))\n"}
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/app/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 876, in urlopen
    return self.urlopen(
  File "/app/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 876, in urlopen
    return self.urlopen(
  File "/app/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 876, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "/app/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 866, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/app/.venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/datasources/reload (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/resources.py", line 273, in _watch_resource_loop
    _watch_resource_iterator(*args)
  File "/app/resources.py", line 261, in _watch_resource_iterator
    request(request_url, request_method, enable_5xx, request_payload)
  File "/app/helpers.py", line 128, in request
    res = r.post("%s" % url, auth=auth, json=payload, timeout=REQ_TIMEOUT)
  File "/app/.venv/lib/python3.10/site-packages/requests/sessions.py", line 635, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/app/.venv/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/requests/adapters.py", line 556, in send
    raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/datasources/reload (Caused by ResponseError('too many 500 error responses'))

Grafana pod manifest:

Name:           dev-prometheus-grafana-5c97c6b7c8-r6n5s
Namespace:      monitoring
Priority:       0
Node:           ip-10-0-6-142.cn-north-1.compute.internal/10.0.6.142
Start Time:     Wed, 30 Nov 2022 17:14:22 +0800
Labels:         app.kubernetes.io/instance=dev-prometheus
                app.kubernetes.io/name=grafana
                pod-template-hash=5c97c6b7c8
Annotations:    checksum/config: 20db1a5dc5ece34d046438788e82a493fd3293187a931c2be927429af5e5ed51
                checksum/dashboards-json-config: 41e6bd76af8b9618443a1b493fcb675b596262cb3b229bac9955de3257197fdb
                checksum/sc-dashboard-provider-config: b45bd52a38f0b4dfaa10fc611da2f676ebd7c5f68d393008a2a5764b33b2cba0
                checksum/secret: 90fde33c1dfc3be61ffd1ce93019a5687d2ecfd76feb54b6bf13cc532c27292c
                kubernetes.io/psp: eks.privileged
Status:         Running
IP:             10.0.6.63
IPs:
  IP:           10.0.6.63
Controlled By:  ReplicaSet/dev-prometheus-grafana-5c97c6b7c8
Init Containers:
  download-dashboards:
    Container ID:  docker://1c39ab8e4023f1fe7b6cfc38a36320fb3d122aa45b46629161ab59553d7f08d7
    Image:         curlimages/curl:7.85.0
    Image ID:      docker-pullable://curlimages/curl@sha256:9fab1b73f45e06df9506d947616062d7e8319009257d3a05d970b0de80a41ec5
    Port:
    Host Port:
    Command:
      /bin/sh
    Args:
      -c
      mkdir -p /var/lib/grafana/dashboards/default && /bin/sh -x /etc/grafana/download_dashboards.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 30 Nov 2022 17:14:24 +0800
      Finished:     Wed, 30 Nov 2022 17:14:24 +0800
    Ready:          True
    Restart Count:  0
    Environment:
    Mounts:
      /etc/grafana/download_dashboards.sh from config (rw,path="download_dashboards.sh")
      /var/lib/grafana from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4vds (ro)
Containers:
  grafana-sc-dashboard:
    Container ID:   docker://afcd47c8ab241ce16db125e897c09550fa3658c892d0e473cf77953d015b6419
    Image:          723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/kiwigrid/k8s-sidecar:1.19.2
    Image ID:       docker-pullable://723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/kiwigrid/k8s-sidecar@sha256:ef2a72bc0bc150ffb24cc8ea65ed4e6900c9a88753d9b18d1e1e89c77cc66efc
    Port:
    Host Port:
    State:          Running
      Started:      Wed, 30 Nov 2022 17:14:25 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      METHOD:       WATCH
      LABEL:        dashboard-provider
      LABEL_VALUE:  default
      FOLDER:       /tmp/dashboards
      RESOURCE:     both
    Mounts:
      /tmp/dashboards from sc-dashboard-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4vds (ro)
  grafana-sc-datasources:
    Container ID:   docker://169189bf99f6ff3424a37a3c37c9534a6b0f6911708651aa89e09a163bcc0997
    Image:          723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/kiwigrid/k8s-sidecar:1.19.2
    Image ID:       docker-pullable://723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/kiwigrid/k8s-sidecar@sha256:ef2a72bc0bc150ffb24cc8ea65ed4e6900c9a88753d9b18d1e1e89c77cc66efc
    Port:
    Host Port:
    State:          Running
      Started:      Wed, 30 Nov 2022 17:14:25 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      METHOD:        WATCH
      LABEL:         grafana_datasource
      LABEL_VALUE:   1
      FOLDER:        /etc/grafana/provisioning/datasources
      RESOURCE:      both
      REQ_USERNAME:  <set to the key 'admin-user' in secret 'dev-prometheus-grafana'>  Optional: false
      REQ_PASSWORD:  <set to the key 'admin-password' in secret 'dev-prometheus-grafana'>  Optional: false
      REQ_URL:       http://localhost:3000/api/admin/provisioning/datasources/reload
      REQ_METHOD:    POST
    Mounts:
      /etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4vds (ro)
  grafana:
    Container ID:   docker://8d78bddbca8684ae27bd858ffc22d8f727ef3034a2cca0e18794e8293b03c335
    Image:          723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/grafana/grafana:9.2.4
    Image ID:       docker-pullable://723527083861.dkr.ecr.cn-north-1.amazonaws.com.cn/grafana/grafana@sha256:a11c6829cdfe7fd791e48ba5b511f3562384361fb4c568ec2d8a5041ac52babe
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 30 Nov 2022 17:14:25 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
    Readiness:      http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      GF_SECURITY_ADMIN_USER:      <set to the key 'admin-user' in secret 'dev-prometheus-grafana'>  Optional: false
      GF_SECURITY_ADMIN_PASSWORD:  <set to the key 'admin-password' in secret 'dev-prometheus-grafana'>  Optional: false
      GF_PATHS_DATA:               /var/lib/grafana/
      GF_PATHS_LOGS:               /var/log/grafana
      GF_PATHS_PLUGINS:            /var/lib/grafana/plugins
      GF_PATHS_PROVISIONING:       /etc/grafana/provisioning
    Mounts:
      /etc/grafana/grafana.ini from config (rw,path="grafana.ini")
      /etc/grafana/provisioning/dashboards/sc-dashboardproviders.yaml from sc-dashboard-provider (rw,path="provider.yaml")
      /etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
      /etc/grafana/provisioning/datasources/datasources.yaml from config (rw,path="datasources.yaml")
      /tmp/dashboards from sc-dashboard-volume (rw)
      /var/lib/grafana from storage (rw)
      /var/lib/grafana/dashboards/default/custom-dashboard.json from dashboards-default (rw,path="custom-dashboard.json")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4vds (ro)

charlrvd commented 1 year ago

Hi @Wyifei, I just noticed that I'm getting spammy errors too, which I didn't get before. It looks like I missed a recent change that comes from https://github.com/grafana/helm-charts/pull/1955 and the other related PR. You can update your values to either disable the sidecar if you don't need it, or set the label to match the one you put on your datasource resources.

grafana:
  enabled: false
  sidecar:
    datasources:
      label: ""
      labelValue: ""
      enabled: true
      maxLines: 1000

With these new defaults, the sidecar loads every Secret and ConfigMap in the namespace directly into /etc/grafana/provisioning/datasources, and the script that tries to import the datasources fails miserably on any file in there that is not a proper datasource.
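
For reference, here is a minimal sketch of the "set the label" option, assuming the values are passed to kube-prometheus-stack under its `grafana:` key. The label name and value are assumptions taken from the `LABEL` / `LABEL_VALUE` environment shown in the pod description above. Adjust them to whatever your datasource ConfigMaps and Secrets actually carry.

```yaml
# Sketch of a values override, not the chart's shipped defaults.
grafana:
  sidecar:
    datasources:
      enabled: true
      # Only ConfigMaps/Secrets carrying this label are picked up
      # (assumed label name; match it to your own resources).
      label: grafana_datasource
      # Assumed value; a non-empty value narrows the match further.
      labelValue: "1"
```

With a non-empty label, the sidecar should only copy resources that carry it, so unrelated Secrets and ConfigMaps should no longer end up in the provisioning directory. Any datasource ConfigMap or Secret you want loaded then needs that same label in its metadata.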