pulumi / pulumi-kubernetes-operator

A Kubernetes Operator that automates the deployment of Pulumi Stacks
Apache License 2.0
221 stars 55 forks source link

Clarify type and meaning of stacks_* metrics #402

Closed squaremo closed 1 year ago

squaremo commented 1 year ago

The stacks_failing metric is created as a GaugeVec in the Go code, which represents a set of time series distinguished by labels (in this case, "namespace" and "name"). But each of these time series are of type gauge, so the documentation is misleading in referring to them as gaugevec (which is not a kind of metric).

I've simplified the verbiage a little, in passing.

There was also a fault in the implementation of the stacks_failing count, which I've corrected. From the commit message:

The stacks_failed metric is a set of gauges, each labelled with the namespace and name of a Stack object. The controller sets a gauge to 1 when its Stack object is given a state of "failed", and 0 for "succeeded". A query aggregating over the labels will get the count of failed stacks.

However: once a Stack is deleted, the gauge remains with the last value -- and if it was failing, it will still be included in the count. So, this commit resets the gauge to 0 when a Stack is deleted (if it had a state at all).

Addresses #399.