prometheus-community / stackdriver_exporter

Google Stackdriver Prometheus exporter
Apache License 2.0

Aggregate-deltas incompatible with `?collect` parameter in v0.15, and completely broken on `master` #315

Open: xairos opened this issue 8 months ago

xairos commented 8 months ago

My latest change (https://github.com/prometheus-community/stackdriver_exporter/commit/bc18b73dab0e6454285d29150a7bcd99578da8d5) appears to have broken the `--monitoring.aggregate-deltas` feature 😞 My apologies! 🤦🏻


On v0.15.0, `newHandler` is called once at startup. It calls `innerHandler` once, and `innerHandler` in turn creates the `InMemoryCounterStore` and `InMemoryHistogramStore` used for delta metrics aggregation. `innerHandler` then returns an `http.Handler`, which `ServeHTTP` invokes on each request to `/metrics`. Because the stores are created only once, the aggregated delta totals persist across scrapes.
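The v0.15.0 lifecycle can be sketched in Go. This is a minimal sketch under stated assumptions, not the exporter's code: `counterStore` is a hypothetical, heavily simplified stand-in for `InMemoryCounterStore`, the `innerHandler` signature is invented for illustration, `scrape` is a hypothetical in-process test helper, and every scrape is assumed to receive a fixed DELTA of 25. Because the store is created once and captured by the returned handler, the total keeps growing across scrapes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// counterStore is a hypothetical, heavily simplified stand-in for the
// exporter's InMemoryCounterStore: it keeps a running total per series key.
type counterStore struct {
	mu     sync.Mutex
	totals map[string]float64
}

func newCounterStore() *counterStore {
	return &counterStore{totals: make(map[string]float64)}
}

// Add folds one DELTA sample into the running total and returns the new total.
func (s *counterStore) Add(key string, delta float64) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.totals[key] += delta
	return s.totals[key]
}

// innerHandler (hypothetical signature) creates its store once and returns a
// handler that closes over it. The fixed delta stands in for the latest poll.
func innerHandler(delta float64) http.Handler {
	store := newCounterStore()
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%g", store.Add("cpu_usage_time", delta))
	})
}

// scrape performs one in-process GET against h and returns the body.
func scrape(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	h := innerHandler(25) // called once at startup, like newHandler on v0.15.0
	for i := 0; i < 3; i++ {
		fmt.Println(scrape(h, "/metrics")) // 25, then 50, then 75
	}
}
```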

The exception (on v0.15.0) is when call-time metric filters are supplied via `?collect=`. In that case `innerHandler` is invoked on every request, so each such request gets freshly created, empty stores, and delta metrics aggregation does not work.
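Extending the same hypothetical sketch to the `?collect=` path shows why the totals reset: a request carrying filters builds a throwaway `innerHandler`, and with it a brand-new store. As before, `counterStore`, `scrape`, the `handler` struct and the fixed delta of 25 are assumptions for illustration; the real exporter passes the filters through to `innerHandler`, which is omitted here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// counterStore: hypothetical stand-in for InMemoryCounterStore.
type counterStore struct {
	mu     sync.Mutex
	totals map[string]float64
}

func newCounterStore() *counterStore {
	return &counterStore{totals: make(map[string]float64)}
}

func (s *counterStore) Add(key string, delta float64) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.totals[key] += delta
	return s.totals[key]
}

// innerHandler creates a new store every time it is called.
func innerHandler(delta float64) http.Handler {
	store := newCounterStore()
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%g", store.Add("cpu_usage_time", delta))
	})
}

// handler sketches the v0.15.0 dispatch (simplified, hypothetical).
type handler struct {
	defaultHandler http.Handler // built once, at startup
}

func newHandler() *handler {
	return &handler{defaultHandler: innerHandler(25)}
}

func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if len(r.URL.Query()["collect"]) > 0 {
		// Call-time filters: a new innerHandler, hence a new empty store,
		// is built for this request only, so prior totals are not visible.
		innerHandler(25).ServeHTTP(w, r)
		return
	}
	h.defaultHandler.ServeHTTP(w, r)
}

func scrape(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	h := newHandler()
	fmt.Println(scrape(h, "/metrics"))             // 25
	fmt.Println(scrape(h, "/metrics"))             // 50
	fmt.Println(scrape(h, "/metrics?collect=foo")) // 25: state was lost
}
```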

Unfortunately, my change makes `innerHandler` run on every call to `/metrics`, with or without filters (just as it already did on v0.15.0 whenever call-time metric filters were present), so delta aggregation is now broken for every scrape.
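One possible direction, offered only as a hedged sketch and not necessarily how the maintainers will fix it, is to create the stores once in `newHandler` and pass them into `innerHandler`, so that building a handler per request (as `master` now does) no longer discards state. All names and signatures below are hypothetical simplifications, and each scrape again assumes a fixed DELTA of 25.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// counterStore: hypothetical stand-in for InMemoryCounterStore.
type counterStore struct {
	mu     sync.Mutex
	totals map[string]float64
}

func newCounterStore() *counterStore {
	return &counterStore{totals: make(map[string]float64)}
}

func (s *counterStore) Add(key string, delta float64) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.totals[key] += delta
	return s.totals[key]
}

// innerHandler now receives the store instead of creating it.
func innerHandler(store *counterStore, delta float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%g", store.Add("cpu_usage_time", delta))
	})
}

type handler struct {
	store *counterStore // created exactly once, in newHandler
}

func newHandler() *handler {
	return &handler{store: newCounterStore()}
}

func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Building the inner handler per request is now harmless, with or
	// without ?collect=, because the store outlives the request.
	innerHandler(h.store, 25).ServeHTTP(w, r)
}

func scrape(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	h := newHandler()
	fmt.Println(scrape(h, "/metrics"))             // 25
	fmt.Println(scrape(h, "/metrics?collect=foo")) // 50
	fmt.Println(scrape(h, "/metrics"))             // 75
}
```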

Steps to reproduce

On v0.15.0, delta aggregation works

```
docker run -p 9255:9255 prometheuscommunity/stackdriver-exporter-linux-amd64:v0.15.0 \
  --monitoring.aggregate-deltas \
  --monitoring.aggregate-deltas-ttl=30m \
  --google.project-id=my-project \
  --monitoring.metrics-type-prefixes="cloudsql.googleapis.com/database/cpu/usage_time" # DELTA type
```

```
# counter is monotonically increasing
❯ curl -s http://localhost:9255/metrics | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 47.54024539538659 1710348060000

❯ curl -s http://localhost:9255/metrics | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 72.79858832078753 1710348060000

❯ curl -s http://localhost:9255/metrics | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 97.56146611412987 1710348240000
```

But it does not work if `?collect=` is used:

```
# counter has reset
❯ curl -s 'http://localhost:9255/metrics?collect=cloudsql.googleapis.com/database/cpu/usage_time' | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 25.679574115085416 1710348420000
```

On master, delta aggregation doesn't work at all

```
docker run -p 9255:9255 prometheuscommunity/stackdriver-exporter-linux-amd64:master \
  --monitoring.aggregate-deltas \
  --monitoring.aggregate-deltas-ttl=30m \
  --google.project-id=my-project \
  --monitoring.metrics-type-prefixes="cloudsql.googleapis.com/database/cpu/usage_time" # DELTA type
```

```
# counter fluctuates as a gauge, even though --monitoring.aggregate-deltas is enabled
❯ curl -s 'http://localhost:9255/metrics' | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 25.63321346603334 1710348600000

❯ curl -s 'http://localhost:9255/metrics' | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 26.545367503887974 1710348660000

❯ curl -s 'http://localhost:9255/metrics' | grep '^stackdriver_.*_cpu_usage_time'
stackdriver_cloudsql_database_cloudsql_googleapis_com_database_cpu_usage_time{database_id="my-project:mydb-master-6452f9dc",project_id="my-project",region="us-central",unit="s{CPU}"} 23.90490857156692 1710348720000
```
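The symptom in each run can also be checked mechanically: a Prometheus counter should never decrease unless it has been reset. The hypothetical helper `isMonotonic` below applies that rule to the sample values copied from the outputs above, in scrape order; it is only a sanity check on the reported numbers, not part of the exporter.

```go
package main

import "fmt"

// isMonotonic reports whether no sample is smaller than the one before it,
// which is what Prometheus expects of a counter that has not been reset.
func isMonotonic(samples []float64) bool {
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the curl outputs above, in scrape order.
	v015 := []float64{47.54024539538659, 72.79858832078753, 97.56146611412987}
	v015ThenCollect := []float64{47.54024539538659, 72.79858832078753, 97.56146611412987, 25.679574115085416}
	master := []float64{25.63321346603334, 26.545367503887974, 23.90490857156692}

	fmt.Println(isMonotonic(v015))            // true
	fmt.Println(isMonotonic(v015ThenCollect)) // false: the ?collect= scrape looks like a reset
	fmt.Println(isMonotonic(master))          // false: the value moves like a gauge
}
```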