I recently hit #103 and #166 in production, and it took quite some time to recognize that the problem was in stackdriver_exporter, because nothing was logged to indicate problems gathering metrics. From my perspective, the pod was healthy and online, and I could curl /metrics and get results. Grafana Agent, however, was getting errors when scraping, specifically errors like this:
[from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name"
value:"REDACTED_FOR_SECURITY"} label:{name:"device_type" value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name" value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"} label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"}
counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
To help identify the root cause, I've added the ability to opt into logging errors that come from the handler. Specifically, I've created the struct customPromErrorLogger, which implements the promhttp.Logger interface. There is a new flag, monitoring.enable-promhttp-custom-logger; when it is set to true, we create an instance of customPromErrorLogger and use it as the ErrorLog value in promhttp.HandlerOpts{}. Otherwise, stackdriver_exporter works as it did before and does not log errors encountered while collecting metrics.