m-lab / prometheus-support

Prometheus configuration for M-Lab running on GKE
Apache License 2.0

Removes ContainerLogsMissingInStackdriver alert #1007

Closed: nkinkade closed this 1 year ago

nkinkade commented 1 year ago

We recently stopped pushing experiment logs to GCP:

https://github.com/m-lab/k8s-support/pull/849

Vector should still be pushing logs from some containers, such as node-exporter, cAdvisor, and flannel. However, it appears that these containers produce very few logs and therefore trigger this alert.

Vector is also still pushing kernel logs above a certain level. Again, it seems that a good number of machines do not push kernel logs frequently enough to make for a reliable alert.

All this said, we have observed that Vector sometimes gets into an error state with regard to GCP, which until now has only been visible through the alert that this commit removes. We need to figure out a way to alert when Vector cannot push logs to Stackdriver but neither exits nor produces anything other than a log message. Maybe Vector can push its own logs to Stackdriver, and we can search for error messages with a custom log-based metric?
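To make the log-based-metric idea concrete, here is a minimal sketch of the logic such a metric and alert would encode, written in Python rather than as actual Stackdriver or Prometheus configuration. It assumes that Vector's own logs are shipped to Stackdriver and parsed into records with `severity` and `component_type` fields. The `LogEntry` shape, the field names, the 30-minute window, and the threshold are all hypothetical. `gcp_stackdriver_logs` is Vector's sink type for Stackdriver, but whether it appears in a parsed field like this depends on the log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass
class LogEntry:
    # Hypothetical shape of one of Vector's own log records after it lands in
    # Stackdriver; real field names depend on how Vector's logs are parsed.
    timestamp: datetime
    severity: str
    component_type: str
    message: str


def count_sink_errors(entries: Iterable[LogEntry], now: datetime,
                      window: timedelta,
                      sink_type: str = "gcp_stackdriver_logs") -> int:
    """Count ERROR records from the given sink type within [now - window, now].

    This mimics a counter-style log-based metric whose filter matches
    Vector error messages about its Stackdriver sink.
    """
    start = now - window
    return sum(
        1
        for e in entries
        if start <= e.timestamp <= now
        and e.severity == "ERROR"
        and e.component_type == sink_type
    )


def should_alert(entries: Iterable[LogEntry], now: datetime,
                 window: timedelta = timedelta(minutes=30),
                 threshold: int = 1) -> bool:
    # Fire when at least `threshold` sink errors appear in the window
    # (both values are illustrative assumptions, not tuned settings).
    return count_sink_errors(entries, now, window) >= threshold


if __name__ == "__main__":
    now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
    logs = [
        LogEntry(now - timedelta(minutes=5), "ERROR",
                 "gcp_stackdriver_logs", "request failed"),
        LogEntry(now - timedelta(minutes=1), "INFO",
                 "gcp_stackdriver_logs", "events sent"),
    ]
    print(should_alert(logs, now))
```

Unlike ContainerLogsMissingInStackdriver, this condition fires on the presence of error messages rather than on the absence of logs, so containers and machines that rarely log would not trigger it.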

