Closed squeed closed 4 years ago
Can you explain what you ultimately want to use this value for?
Sure. The use-case is kube-proxy.
I have a stochastic stream of events. The rate can realistically be between 10/sec and 1/day. There is a process that applies these changes to the node's iptables. This can also stochastically or persistently fail, thanks to Kernel Fun Times. (For various reasons, a failure to apply is not retried. And, yes, that should be fixed, but it's not critical to this case.)
I'd like to alert on two scenarios:
Iptables is persistently failing. In other words, last_request_time is much greater than last_applied_time. In this case, I don't care about 0 values, because the inequality works out in my favor.
For whatever reason, the loop generating the event stream has broken down. I want to find nodes whose last_request_time is much older than the rest of the cluster. Freshly restarted kube-proxy processes make this awkward, since they report a 0. In this case, I can filter out 0 values on the query side, since a Kubernetes cluster up since 1970 would be... surprising.
So, in my particular case, I can work around this by filtering on 0 in my queries, but that's awkward. It seems like we shouldn't be exposing incorrect data in the first place.
/cc @SuperQ - we were chatting about this elsewhere.
> So, in my particular case, I can work around this by filtering on 0 in my queries, but that's awkward.
What you would propose would leave you blind if there was a persistent failure since the start time, so I'm not sure you're gaining anything here - you need some PromQL logic one way or the other.
> What you would propose would leave you blind if there was a persistent failure since the start time, so I'm not sure you're gaining anything here - you need some PromQL logic one way or the other.
Indeed, that is not queryable regardless of metric value. We can only solve it with a separate signal for "I have established a connection and populated my caches", which is the container going Ready in the world of Kubernetes.
Whatever the outcome of the discussion about whether this is sane at all, it is very easy to make this library act as @squeed requested: create the Gauge as a GaugeVec with no labels:
var gge = promauto.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "just_for_squeed",
		Help: "Perhaps you shouldn't do this…",
	},
	[]string{},
)
This gauge will not show up in your exposition until you call With or WithLabelValues. If you call those at the time you Set the gauge, it will show up from that point on:
gge.With(nil).Set(42)
// Or, if you like that more:
gge.WithLabelValues().Set(42)
If your gauge is already a vector anyway, then it's even more straightforward. It's essentially the inverse of the infamous CounterVec problem, where counters only spring into existence upon their first increment.
I assume @squeed can now do as he pleases. Please follow up here if I'm wrong. I'll close this issue for now.
Indeed, the vector trick is perfect, thanks!
Sorry if this is noise, and feel free to close if so.
TL;DR: Add a GaugeOpt to hide a metric until first observation.
I'm trying to follow what I understand to be best practices around unknown / unobserved Gauge values. Right now, an unobserved Gauge has a value of 0, which is a definitively incorrect value in the case of timestamps. AIUI, the correct behavior in this case is to simply not expose the metric until the process has observed the value.
In my particular case, it's a timestamp of the last time a request was received. (I don't care about rates, only staleness.) If the process restarts, it can be hours or even days until the next request comes in, so that means a long time reading an incorrect value. This isn't just a blip of bad data on startup.
In this case I can filter out 0 on the query side, but I can think of cases where 0 is a legitimate value, so using 0 to mean null would be incorrect.
It doesn't look like there's an ergonomic way to express this in client_golang without writing a custom Collector. If this is indeed a best practice around timestamp values, the feature request is a GaugeOpt that hides unobserved metrics.