Recently the help string for some metrics changed; one example is stackdriver_gce_instance_compute_googleapis_com_instance_uptime_total. This caused the exporter to fail with messages like the one below (newlines added for readability):
34 error(s) occurred:
* [from Gatherer #2] collected metric stackdriver_gce_instance_compute_googleapis_com_instance_uptime_total label:{name:"instance_id" value:"REDACTED"} label:{name:"instance_name" value:"REDACTED"} label:{name:"project_id" value:"REDACTED"} label:{name:"unit" value:"s"} label:{name:"zone" value:"REDACTED"} gauge:{value:11520} timestamp_ms:1724192100000
has help
"Elapsed time since the VM was started, in seconds. After sampling, data is not visible for up to 120 seconds. When VM is Stopped (https://cloud.google.com/compute/docs/instances/stop-start-instance#stop-vm-google-cloud), the time is not calculated. On starting the VM again, the timer will reset to 0 for that VM."
but should have
"Elapsed time since the VM was started, in seconds."
... and many more of the same format
I am not sure what the correct solution is here, but it is a really annoying issue: we essentially have to ignore whole groups of metrics, at least until the help string versions converge or age out. If there is a suggestion on how to fix this, I can try to contribute as well; I just wasn't sure where to start with this one.
If this is something Stackdriver is messing up, I can open a Google support case instead. Either this project or Google support seemed like the right place to handle it, though, since the Prometheus client does not allow help strings to change (the check at https://github.com/prometheus/client_golang/blob/b5361fed217651b4d855961b47481209ac0745a0/prometheus/registry.go#L640 causes the underlying failure).
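To make the failure mode concrete, here is a simplified model of the kind of consistency check that seems to be involved, based only on the shape of the error message above. It is written in Python for brevity and is not the client_golang implementation; `Sample` and `gather` are made-up names, and the second help string is shortened for illustration. The idea is that the first help string seen for a metric name gets pinned, and any later sample for that name with a different help string is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One collected sample (hypothetical type): name, help text, value."""
    name: str
    help: str
    value: float


def gather(samples):
    """Group samples by metric name, pinning the first help string seen.

    Returns (families, errors). A sample whose help text differs from the
    pinned one is dropped and reported, mimicking the message in this issue.
    """
    families = {}  # name -> {"help": str, "values": [float, ...]}
    errors = []
    for s in samples:
        fam = families.setdefault(s.name, {"help": s.help, "values": []})
        if s.help != fam["help"]:
            errors.append(
                f'collected metric {s.name} has help "{s.help}" '
                f'but should have "{fam["help"]}"'
            )
            continue
        fam["values"].append(s.value)
    return families, errors


if __name__ == "__main__":
    name = "stackdriver_gce_instance_compute_googleapis_com_instance_uptime_total"
    old_help = "Elapsed time since the VM was started, in seconds."
    # Shortened stand-in for the newer, longer help text.
    new_help = old_help + " After sampling, data is not visible for up to 120 seconds."
    _, errors = gather([
        Sample(name, old_help, 11520),
        Sample(name, new_help, 11520),
    ])
    for e in errors:
        print(e)
```

If the real check behaves roughly like this, then as long as both versions of the help text are being returned for the same metric name, at least one sample per scrape will be rejected, which would match the "ignore groups of metrics until the versions converge" situation described here.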