m-lab / prometheus-support

Prometheus configuration for M-Lab running on GKE
Apache License 2.0

Add bqx query and alert for missing NDT/scamper1 data #1046

Closed stephen-soltesz closed 1 month ago

stephen-soltesz commented 1 month ago

On 2024-07-22, we discovered that MIG deployments were failing to start scamper1 traces from traceroute-caller because the MIG IP address was unknown. We did not detect this because we have no monitoring that tracks whether sidecar data is collected for a primary measurement service. With this monitoring in place, this type of error will not go unnoticed again. The change also creates a framework for adding other sidecar types if desired.

See: https://prometheus.mlab-sandbox.measurementlab.net/graph?g0.expr=bq_sidecar_scamper1_matching%20%2F%20bq_sidecar_ndt_total%20%3C%200.90&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h
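
For reference, the expression encoded in the link above decodes to `bq_sidecar_scamper1_matching / bq_sidecar_ndt_total < 0.90`. A minimal sketch of how an alert on that expression might look as a Prometheus rule is below. Only the expression comes from the link; the group name, alert name, `for` duration, labels, and annotation text are assumptions for illustration and may not match the rule actually added in this PR.

```yaml
groups:
  - name: sidecar-data.rules                  # hypothetical group name
    rules:
      - alert: BQ_Scamper1SidecarDataMissing  # hypothetical alert name
        # Expression taken from the sandbox Prometheus link: fires when fewer
        # than 90% of NDT measurements have a matching scamper1 trace.
        expr: bq_sidecar_scamper1_matching / bq_sidecar_ndt_total < 0.90
        for: 1h                               # assumed; tune to the bqx query interval
        labels:
          severity: ticket                    # assumed label scheme
        annotations:
          summary: Fewer than 90% of NDT measurements have matching scamper1 data.
```

Expressing the check as a ratio of matching sidecar rows to total primary rows should make it straightforward to add other sidecar types later by swapping the numerator metric.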



stephen-soltesz commented 1 month ago

FYI: @nkinkade @robertodauria