prometheus / client_golang

Prometheus instrumentation library for Go applications
https://pkg.go.dev/github.com/prometheus/client_golang
Apache License 2.0

Consider not having channels in the Collector interface #228

Open beorn7 opened 8 years ago

beorn7 commented 8 years ago

This is the current Collector interface:

type Collector interface {
    Describe(chan<- *Desc)
    Collect(chan<- Metric)
}

The main reason for having channels there was that back in 2013, channels were cool, so I was primarily looking for neat ways to make use of them rather than avoiding them in APIs. While the current interface has some merits, the pattern used here is very unconventional, and now that breaking changes will happen anyway, we have the chance to change the interface. In the following, I'll list pros and cons to come to an informed decision.

The new interface would look like:

type Collector interface {
    Describe() []*Desc
    Collect() []Metric
}

Reasons for changing the interface

Sequential collection with old interface

func (c *C) Collect(m chan<- prometheus.Metric) {
    m <- c.constMetric
    c.subCollector.Collect(m)
    c.anotherSubCollector.Collect(m)
}

Sequential collection with new interface

func (c *C) Collect() []prometheus.Metric {
    var m []prometheus.Metric
    m = append(m, c.constMetric)
    m = append(m, c.subCollector.Collect()...)
    m = append(m, c.anotherSubCollector.Collect()...)
    return m
}

or more compact (but arguably also more confusing):

func (c *C) Collect() []prometheus.Metric {
    m := []prometheus.Metric{c.constMetric}
    m = append(m, c.subCollector.Collect()...)
    return append(m, c.anotherSubCollector.Collect()...)
}

Concurrent collection with old interface

func (c *C) Collect(m chan<- prometheus.Metric) {
    var wg sync.WaitGroup
    wg.Add(len(c.colls))
    for _, coll := range c.colls {
        go func(coll prometheus.Collector) {
            coll.Collect(m) // Protobuf encoding already starting here.
            wg.Done()
        }(coll)
    }
    wg.Wait()
}

Concurrent collection with new interface

func (c *C) Collect() []prometheus.Metric {
    ch := make(chan []prometheus.Metric)
    for _, coll := range c.colls {
        go func(coll prometheus.Collector) {
            ch <- coll.Collect()
        }(coll)
    }
    var m []prometheus.Metric
    for range c.colls {
        m = append(m, <-ch...)
    }
    return m // Only now can the registry start encoding.
}
fabxc commented 8 years ago

Cool, I'm very much in favor of this.

Performance/allocation penalty of the slice interface is not overly relevant as registration and collection happen relatively rarely. (An exception would be high-frequency scrape scenarios.)

Haven't measured it, but I am relatively certain that channels are more expensive anyway.

Only "power users" get in touch with the Collector interface, which limits the damage of the unusual API.

Not sure about that. It's very common in exporters and can also be in regular code, where it's the easier/cleaner option than always keeping a regular variable in sync with an instrumentation gauge or counter.

func (c *C) Collect(m chan<- prometheus.Metric) {
    m <- c.constMetric
    c.subCollector.Collect(m)
    c.anotherSubCollector.Collect(m)
}

I always found this relatively hard to read/understand without proper context of the client library. Realistically, most devs don't want instrumentation to be an understanding overhead.

The pain point here, especially for concurrent collections, is that the user has to care about bottom-up propagation. May I suggest an alternative interface:

type Sampler interface {
    Sample(...Metric)
    SampleFrom(...Collector)
}

type Registrator interface {
    Register(...*Desc)
    RegisterFrom(...Collector)
}

type Collector interface {
    Describe(Registrator)
    Collect(Sampler)
}

We are perfectly fine with the functionality being implemented using channels. We just want to keep this as an implementation detail. So passing in a receiving object as an argument still seems fine to me and keeps things top-down. Naming is a bit tricky, of course, as we definitely have a verb overlap across various concerns.
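
To make that concrete, here is a rough sketch (not from the issue; the chanSampler type is made up, and Describe/Registrator are left out for brevity) of how a registry-owned channel could back such a Sampler without collectors ever seeing it:

package sketch

import "github.com/prometheus/client_golang/prometheus"

// Sampler and Collector mirror the proposal above, trimmed to the Collect side.
type Sampler interface {
    Sample(...prometheus.Metric)
    SampleFrom(...Collector)
}

type Collector interface {
    Collect(Sampler)
}

// chanSampler forwards every sampled metric into a channel owned by the
// registry; collectors only ever see the Sampler interface.
type chanSampler struct {
    ch chan<- prometheus.Metric
}

func (s chanSampler) Sample(ms ...prometheus.Metric) {
    for _, m := range ms {
        s.ch <- m
    }
}

func (s chanSampler) SampleFrom(cs ...Collector) {
    for _, c := range cs {
        c.Collect(s)
    }
}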

Sequential collection:

func (c *C) Collect(s Sampler) {
    s.Sample(c.constMetric1, c.constMetric2)
    s.SampleFrom(c.subCollector, c.anotherSubCollector)
}

Concurrent collection:

func (c *C) Collect(s Sampler) {
    var wg sync.WaitGroup
    wg.Add(len(c.colls))
    for _, coll := range c.colls {
        go func(coll prometheus.Collector) {
            s.SampleFrom(coll)
            // equivalent: coll.Collect(s)
            wg.Done()
        }(coll)
    }
    wg.Wait()
}

As concurrent collection may be common in exporters, we can actually make that a bit more convenient for users:

type Sampler interface {
    Sample(...Metric)
    SampleFrom(...Collector)
    SampleAsyncFrom(...Collector)
}

Concurrent collection:

func (c *C) Collect(s Sampler) {
    s.SampleAsyncFrom(c.colls...)
}

The async case of course exposes no control over the degree of concurrency. However, I hardly expect this to ever be relevant, and it can still be solved via:

func (c *C) Collect(s Sampler) {
    s.SampleAsyncFrom(c.colls[:10]...)
    s.SampleAsyncFrom(c.colls[10:]...)
}
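
For completeness, a hypothetical SampleAsyncFrom could live entirely inside the sampler, so collectors would not need their own WaitGroup handling. Building on the made-up chanSampler sketch above (same package, plus a sync import), it might look like:

// SampleAsyncFrom collects from each collector in its own goroutine and only
// returns once all of them are done. Sketch only; names are made up.
func (s chanSampler) SampleAsyncFrom(cs ...Collector) {
    var wg sync.WaitGroup
    wg.Add(len(cs))
    for _, c := range cs {
        go func(c Collector) {
            defer wg.Done()
            c.Collect(s) // Each collector writes into the same hidden channel.
        }(c)
    }
    wg.Wait()
}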

Still, naming is tricky; we could also go with passing anonymous functions. There we would want to stick with a single function, though, and the *From methods would have to be dropped.

beorn7 commented 8 years ago

Thanks, @fabxc, for the inspiring input. Also, paging @peterbourgon as we discussed this over breakfast.

grobie commented 8 years ago

Thanks for starting this discussion.

IMHO, we should have very good reasons to change the interface. While the current channel-based interface might not be the most common one, changing it will likely cause a substantial workload for many developers and exporter maintainers (wave).

I'd also vote to try keeping the interface small and to not expose too many different functions (especially with additional *From variants). That mostly just echoes @fabxc's "naming is tricky" statement though.

beorn7 commented 8 years ago

@grobie :

IMHO, we should have very good reasons to change the interface. While the current channel-based interface might not be the most common one, changing it will likely cause a substantial workload for many developers and exporter maintainers (wave).

Exactly. I wanted to put something like that in my list of "pro stay-as-is". I made it clearer now (edited the original post).

I'd also vote to try keeping the interface small and to not expose too many different functions (especially with additional *From variants). That mostly just echoes @fabxc's "naming is tricky" statement though.

Yeah, especially if we are doing this to avoid friction because of weird or unidiomatic interfaces. Making interfaces as lean as possible is pretty much the golden rule by now. Also, we are investing a lot of effort to reduce the number of top-level identifiers in the client library. Introducing more interfaces there needs a good return on investment.

@fabxc :

Haven't measured it, but I am relatively certain that channels are more expensive anyway.

Yes, we should measure that for an informed decision. I'm more concerned about allocation churn, though, and the channel allows "streaming" collection to avoid that. I'll put together some microbenchmarks once I find the time.
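
Something along these lines would be a starting point for such a microbenchmark (a rough sketch in a _test.go file, not the actual benchmark; it pushes plain values rather than real Metric implementations, so it only bounds the channel overhead, not the allocation churn of the metrics themselves):

package sketch

import "testing"

const numMetrics = 1000

// BenchmarkChannelCollect models the current interface: values are streamed
// through an unbuffered channel and consumed on the other side.
func BenchmarkChannelCollect(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ch := make(chan int)
        go func() {
            for j := 0; j < numMetrics; j++ {
                ch <- j
            }
            close(ch)
        }()
        for range ch {
            // The registry would consume each metric here.
        }
    }
}

// BenchmarkSliceCollect models the proposed interface: values are appended to
// a slice that is handed over as a whole.
func BenchmarkSliceCollect(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m := make([]int, 0, numMetrics)
        for j := 0; j < numMetrics; j++ {
            m = append(m, j)
        }
        _ = m // The registry would consume the slice here.
    }
}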

Only "power users" get in touch with the Collector interface, which limits the damage of the unusual API.

Not sure about that. It's very common in exporters and can also be in regular code, where it's the easier/cleaner option than always keeping a regular variable in sync with an instrumentation gauge or counter.

There are also the GaugeFunc and CounterFunc collectors to collect single values by calling a function. Implementing the Collector interface is usually only required if there is one expensive step to retrieve a multitude of metrics.
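
For illustration, a GaugeFunc looks like this (NewGaugeFunc and MustRegister are the existing client_golang API; the queue type is made up):

package main

import (
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// queue stands in for whatever state the application already maintains.
type queue struct {
    mu    sync.Mutex
    items []string
}

func (q *queue) Len() int {
    q.mu.Lock()
    defer q.mu.Unlock()
    return len(q.items)
}

func main() {
    q := &queue{}

    // GaugeFunc calls the given function at collection time, so there is no
    // second variable to keep in sync with the instrumentation.
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name: "example_queue_length",
            Help: "Current number of items in the queue.",
        },
        func() float64 { return float64(q.Len()) },
    ))
}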

In general, I would broadly estimate there are 100 times more programs using just the stock metrics including GaugeFunc and CounterFunc than those implementing custom Collectors. And among the latter, there are probably 100 times more programs where the performance concerns we are pondering here are not really important. It should still be possible to write well-performing exporters, but it doesn't have to be elegant if that would make things harder for the many other programs using the library.

About the "glue interfaces": This seems to be the "mathematically correct solution". But it includes two more interfaces with two methods each (speaking of the number of top-level identifiers in the package...). And while my familiarity with the current form will certainly cloud my judgement, I would still think the cognitive overhead of understanding the meaning of two two-method receiver interfaces is larger than with the receiving channel. Perhaps we can run this by some test users. And also experts like @peterbourgon.

brian-brazil commented 8 years ago

Overall I wouldn't worry too much about the performance at scrape time (including custom collectors), as that typically only happens every tens of seconds.

beorn7 commented 5 years ago

Note to self: Consider passing a context to collectors so that expensive collections can be canceled.
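
A purely hypothetical variant (not part of client_golang; the interface name and method are made up for illustration) might look like:

package sketch

import (
    "context"

    "github.com/prometheus/client_golang/prometheus"
)

// ContextCollector is a made-up sketch of a context-aware Collector.
type ContextCollector interface {
    Describe(ch chan<- *prometheus.Desc)
    // CollectWithContext should stop early when ctx is canceled, e.g.
    // because the scrape was aborted or timed out.
    CollectWithContext(ctx context.Context, ch chan<- prometheus.Metric)
}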

brian-brazil commented 5 years ago

Something to watch out for is users using the context to pass things down from HTTP etc., beyond just a timeout.

beorn7 commented 5 years ago

Could you explain what you mean? Is that something we would like to discourage? If so, why?

brian-brazil commented 5 years ago

Every so often there are users asking how to get the IP of the scraper, URL parameters etc. in a collector. To me that's a layering violation, and often also a misunderstanding of how our scraping model works.

bboreham commented 5 years ago

On the other hand, when using distributed tracing, this is exactly what I want to do - pass in a trace ID which will be picked up at lower layers.

beorn7 commented 5 years ago

Contexts are flexible and powerful and as such allow users to abuse them for all kinds of weirdness. That wouldn't stop me from using them if they are the best way to enable beneficial use cases.

I'm not sure what you mean by "watch out for". We can document best practices, but given how contexts work in Go, we can hardly babysit users and enforce how they use them.

jacksontj commented 5 years ago

If this is moved to a slice of metrics instead of a channel, do we have any metrics on the memory-usage impact? I have a few custom exporters that spit out tons of metrics, so I have some concerns about the memory impact of moving from channels to slices.

beorn7 commented 5 years ago

The idea of streaming metric exposition was in fact a reason why we chose to have a channel in the interface. However, we never implemented streaming exposition. By now, we have actually made it practically impossible, as the client library is firmly designed around the premise of never creating an invalid exposition. For that, it has to materialize the whole set of metrics anyway to then check it for consistency. Cf. the Gatherer interface, which already has a slice.
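
For reference, that interface is (dto being the github.com/prometheus/client_model/go package):

type Gatherer interface {
    Gather() ([]*dto.MetricFamily, error)
}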

To answer your question: The memory usage impact would be zero (if not negative) if we just changed it right now, as the first thing that the registry does is to put everything into slices anyway.

One of the insights over the years was that it is really hard to cater for all possible use cases without making the library unfeasibly complex. Thus, we try to get most of the use cases covered. The few uncovered cases have to resort to custom implementations. A good example is kube-state-metrics, which exposes a gigantic amount of mostly static metrics. The bottleneck when using this client library was, BTW, not memory consumption but rather the effort of encoding the internal representation of the metrics into the Prometheus text format. Thus, even a streaming exposition would not have helped there.