/health endpoint returns 404

Dezarin commented 5 days ago

I tried updating the stackdriver-exporter container to v0.17.0, but the pod never reaches a running state because it fails its liveness probes. Queries to the /metrics endpoint work as expected, but /health returns 404 errors. I looked for an updated Helm chart for v0.17.0, but one has not been released yet.

foo:~# curl 10.244.1.17:9255/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
<truncated>

foo:~#  curl 10.244.1.17:9255/health
404 page not found
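To see which paths would satisfy the probes, a small Go sketch can apply the same success rule the kubelet uses for httpGet probes: any status code from 200 up to (but not including) 400 counts as success, so the 404 from /health is a failure. The helper names probeStatus and probeOK, and the program name probecheck in the usage message, are hypothetical; none of this is part of stackdriver_exporter.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// probeStatus (hypothetical helper) GETs baseURL+path and returns the HTTP status code.
func probeStatus(client *http.Client, baseURL, path string) (int, error) {
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probeOK applies the kubelet's httpGet success rule: 200 <= code < 400.
func probeOK(code int) bool {
	return code >= 200 && code < 400
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probecheck http://10.244.1.17:9255")
		return
	}
	base := os.Args[1]
	// Same 10s timeout as the liveness/readiness probes in the pod spec.
	client := &http.Client{Timeout: 10 * time.Second}
	for _, path := range []string{"/metrics", "/health", "/"} {
		code, err := probeStatus(client, base, path)
		if err != nil {
			fmt.Printf("%s: error: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %d (probe passes: %t)\n", path, code, probeOK(code))
	}
}
```

Run against the v0.17.0 pod, the /health line would be expected to show 404 with "probe passes: false", matching the curl output above.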

Containers:
  prometheus-stackdriver-exporter:
    Container ID:  containerd://0e46d42f0432dfd3dcc37acd6f9b78edbcc6ae1c71c907b89d2ea67d55aa4269
    Image:         prometheuscommunity/stackdriver-exporter:v0.17.0
    Image ID:      docker.io/prometheuscommunity/stackdriver-exporter@sha256:ca514180d5f5e4997e78f94ad23a08d7ad81b932485bd2152c98504cb38c1fdb
    Port:          9255/TCP
    Host Port:     0/TCP
    Command:
      stackdriver_exporter
    Args:
      --google.project-id=<REMOVED>
      --monitoring.metrics-interval=5m
      --monitoring.metrics-offset=0s
      --monitoring.metrics-type-prefixes=compute.googleapis.com/instance/cpu
      --stackdriver.backoff-jitter=1s
      --stackdriver.http-timeout=10s
      --stackdriver.max-backoff=5s
      --stackdriver.max-retries=0
      --stackdriver.retry-statuses=503
      --web.listen-address=:9255
      --web.telemetry-path=/metrics
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 14 Nov 2024 10:55:31 -0700
      Finished:     Thu, 14 Nov 2024 10:56:21 -0700
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:http/health delay=30s timeout=10s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/health delay=10s timeout=10s period=10s #success=1 #failure=3

initharrington commented 1 day ago

I upgraded to v0.17.0 this morning and hit the same issue with the /health endpoint. With the https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-stackdriver-exporter chart (v4.6.2), the pod just crash-loops.

Dezarin commented 1 day ago

As a workaround, you can change the liveness and readiness probes in the chart to point to / instead of /health, and the pod will reach a ready state. Still, the /health endpoint should be restored.

prometheus-community/stackdriver_exporter issue #386