m-lab / prometheus-support

Prometheus configuration for M-Lab running on GKE
Apache License 2.0

Removes the flag --monitoring.metrics-interval #965

Closed nkinkade closed 1 year ago

nkinkade commented 1 year ago

For some reason, with --monitoring.metrics-interval=1m, stackdriver_exporter was not ingesting any metrics for container logs. The flag defaults to 5m, and removing it resolved the alerts firing in staging about container logs missing from Stackdriver for a number of nodes.

This stackdriver_exporter behavior has apparently affected others. Based on what some have said in one particular issue, setting the flag to a higher value shouldn't affect the results:

> You should be able to set the interval to something like 6 hours and it won't break the other services because it only uses the latest metrics, so you'll just be pulling more data than necessary but it shouldn't break the metrics (as far as I know). I'd try that and see, you'll just need to note somewhere that the bigquery ones are from 6 hours ago and they're not current, though they'll look like they're current from Prometheus's point of view.
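To make the quoted reasoning concrete, here is a toy model in Python. It is not stackdriver_exporter code: it only assumes, as the quote suggests, that the exporter keeps the newest point found inside a lookback window of length `interval`. The function `latest_in_window`, the sample series, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical data point: (timestamp, value). Not a stackdriver_exporter type.
Point = Tuple[datetime, float]


def latest_in_window(points: List[Point], now: datetime,
                     interval: timedelta) -> Optional[Point]:
    """Return the newest point with now - interval <= timestamp <= now, or None."""
    start = now - interval
    in_window = [p for p in points if start <= p[0] <= now]
    return max(in_window, key=lambda p: p[0]) if in_window else None


# Arbitrary reference time and an assumed series whose newest sample is 3 minutes old.
now = datetime(2023, 1, 1, 12, 0, 0)
points = [
    (now - timedelta(minutes=13), 10.0),
    (now - timedelta(minutes=8), 11.0),
    (now - timedelta(minutes=3), 12.0),
]

for label, interval in [("1m", timedelta(minutes=1)),
                        ("5m", timedelta(minutes=5)),
                        ("6h", timedelta(hours=6))]:
    print(label, latest_in_window(points, now, interval))
```

In this model the 5m and 6h windows return the same newest point, which matches the claim that a larger interval pulls more data without changing the result. The 1m window returns nothing because the newest sample is older than the window. Whether that is what actually happened with the container log metrics is not established in this PR.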

