hashicorp / go-metrics

A Golang library for exporting performance and runtime metrics to external metrics systems (i.e. statsite, statsd)
MIT License
1.46k stars 177 forks source link

actually describe PrometheusSink descriptors #129

Closed acpana closed 3 years ago

acpana commented 3 years ago

Overview

Stumbled across this as I was trying to instantiate two TestAgents in consul. See https://github.com/hashicorp/consul/issues/11273 for more context around that papercut.

It all comes down to registering more than on PrometheusSink on a registerer.

Today, that's not possible since we don't actually implement .Describe(). send the same descriptor regardless of the sink being used.

This approach runs over all the gauges, summaries and counters and sends .Desc() to the chan.

It also adds a "dummy" value to register "empty" sinks but it relies on a new field: PrometheusSink.name.

This approach introduces a new field, PrometheusSink.name, and creates a descriptor around it in order to avoid "collisions" with other PrometheusSinks.

Testing

Related Issues

acpana commented 3 years ago

We will still have some ephemeral metrics that are not registered when the process starts. When those metrics are emitted our Describe method will start returning different descriptions. I'm not sure what problem that will cause, but it seems like the implementation in this PR might violate that requirement.

Yes, that's spot on -- thanks for the call out!

So funcs like:

...
func (p *PrometheusSink) SetGaugeWithLabels(parts []string, val float32, labels []metrics.Label) {
...
func (p *PrometheusSink) AddSampleWithLabels(parts []string, val float32, labels []metrics.Label) {
...
func (p *PrometheusSink) IncrCounterWithLabels(parts []string, val float32, labels []metrics.Label) {
...

store metrics outside of the respective init* counterparts. If we call .Describe() before any of the funcs above are called (and we do call them over in consul land), the set of the return values will be different than when we call .Describe() after those funcs were called.

I'm interpreting this behavior in two ways:

or


Pragmatic Hat

As far as I can see, we only really care about .Describe() when registering/ unregistering a .Collector, although others may relay more on the func, hence the need to make a change that doesn't break existing behavior.

Maybe then we could go with @markan's suggestion and just rely on the Name:

func (p *PrometheusSink) Describe(c chan<- *prometheus.Desc) {
    prometheus.NewGauge(prometheus.GaugeOpts{Name: p.name, Help: p.name}).Describe(c)
}

or something else that uniquely identifies the Collector and will not change as we continue to store metrics.

banks commented 3 years ago

My read of the Prometheus docs aligns with Daniels. It would seem their expectations around Describe are that different collectors are collecting different sets of metrics and not just the same metrics but for a different instance of a sub-process.

Does the alternate solution posed elsewhere of having a separate Registerer per sub-process work better?

The Registry docs say:

    // Register registers a new Collector to be included in metrics
    // collection. It returns an error if the descriptors provided by the
    // Collector are invalid or if they — in combination with descriptors of
    // already registered Collectors — do not fulfil the consistency and
    // uniqueness criteria described in the documentation of metric.Desc.

Which to me implies that the library expects all collectors within one registry to be "consistent" that is not emit the same metric from multiple places, while multiple custom Registerer implementations may be used in cases where there is a need for multiple sets of collectors emitting equivalent metrics which seems to be our case when we have multiple agents running in test processes.

Since the Registerer can already be passed in to the Prometheus sink , I think it's already possible to solve this in the calling code. If the boilerplate for doing that is significant and we think it's a pattern other users of this library are likely to need too, it may be worth adding a helper here that makes creating a new instance of the sink with a custom registerer simpler?

acpana commented 3 years ago

hey @dnephin @banks @markan thanks all for the engagement here.

updated the PR to only use the PrometheusSink.Name descriptor to have Describe() be idempotent. Also added a quick note to summarize our chats here in code to make it easier for the next person!