m-lab / prometheus-support

Prometheus configuration for M-Lab running on GKE
Apache License 2.0

scraper-sync alerts #126

Open stephen-soltesz opened 6 years ago

stephen-soltesz commented 6 years ago

Now that delete_logs_safely writes node exporter metrics for the "max raw mtime archived" as seen from scraper-sync, we can create end-to-end alerts on the correct operation of both scraper-sync and delete_logs_safely.

Something like:

(time() - scraper_maxrawfiletimearchived)
    AND ON(machine, experiment, rsync_module)
(time() - delete_logs_safely)
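
One possible reading of the expression above, written as a Prometheus 2.x alerting rule. This is only a sketch: the group name, alert name, two-day threshold, `for` duration, and severity label are assumptions, and `delete_logs_safely` is kept as the placeholder metric name used above rather than the real name exported by the script.

```yaml
groups:
  - name: scraper-sync.rules            # hypothetical group name
    rules:
      - alert: ScraperSyncArchiveLagging  # hypothetical alert name
        # Fires only when BOTH sides report that the newest archived raw
        # file is older than the threshold for the same
        # (machine, experiment, rsync_module) combination.
        expr: |
          (time() - scraper_maxrawfiletimearchived) > 2 * 24 * 60 * 60
            and on(machine, experiment, rsync_module)
          (time() - delete_logs_safely) > 2 * 24 * 60 * 60
        for: 1h                         # assumed; avoids flapping on brief gaps
        labels:
          severity: ticket              # assumed label
        annotations:
          summary: >-
            Archive lag on {{ $labels.machine }}
            {{ $labels.experiment }}/{{ $labels.rsync_module }}
```

Note that `and on(...)` keeps the left-hand series only where a right-hand series with the same three labels exists, so a machine that stops exporting either metric silently drops out of this alert.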

stephen-soltesz commented 6 years ago

We should also alert on absent metrics:

absent(scraper_maxrawfiletimearchived)

This would have caught the scraper-sync crashes.
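
As a rule, that could look roughly like the following sketch. The group name, alert name, `for` duration, and severity label are assumptions, not existing configuration.

```yaml
groups:
  - name: scraper-sync-absent.rules     # hypothetical group name
    rules:
      - alert: ScraperSyncMetricMissing   # hypothetical alert name
        # absent() returns a single series with value 1 only when no
        # series named scraper_maxrawfiletimearchived exists at all.
        expr: absent(scraper_maxrawfiletimearchived)
        for: 30m                        # assumed; tolerates short restarts
        labels:
          severity: page                # assumed label
        annotations:
          summary: scraper-sync is not exporting scraper_maxrawfiletimearchived
```

Because `absent()` without label matchers only fires when every series is gone, this catches a full scraper-sync crash but not a single machine or experiment going missing.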