swift-server / swift-prometheus

Prometheus client library for Swift
https://swiftpackageindex.com/swift-server/swift-prometheus
Apache License 2.0
145 stars 32 forks source link

Do not recreate a new counter if we already have one. #13

Closed Yasumoto closed 5 years ago

Yasumoto commented 5 years ago

Checklist

Motivation and Context

We want to make sure we're reusing metrics, not losing ones we already create.

Description

I discovered when running a service that I'm getting multiple invocations of the same metric appearing when I collect():

# TYPE invocation_register counter
invocation_register 0
invocation_register{tool="simple-curl"} 1
# TYPE invocation_register counter
invocation_register 0
invocation_register{tool="simple-curl"} 1
# TYPE invocation_register counter
invocation_register 0
invocation_register{tool="simple-curl"} 1

I'm calling createCounter on each request based on the good advice I got here from @ktoso.

If this looks roughly right to y'all, I can update the other types to follow this pattern, though definitely open to feedback 🎉

MrLotU commented 5 years ago

This is by design. It's different between swift-metrics and this library, but the idea from the SwiftPrometheus standpoint is that you create a counter once, than increment that during your application lifecycle, passing in different labels as required.

Might have to reconsider this though, seeing how swift-metrics takes a different approach.

CC @tomerd @ktoso

Yasumoto commented 5 years ago

Yah, totally see now that advice only applied to SwiftMetrics, and implementations can decide how to handle it.

I took a look at the golang Prometheus client library, and looks like it'll throw an AlreadyRegisteredError.

package main

import (
    "flag"
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var addr = flag.String("listen-address", ":9001", "The address to listen on for HTTP requests.")

var (
    myCounter = promauto.NewCounter(prometheus.CounterOpts{
        Name: "my_counter",
        Help: "The total number of processed events",
    })
)

func main() {
    flag.Parse()
    go func() {
        for {
            myCounter.Inc()
            time.Sleep(1 * time.Second)
        }
    }()
    myCounterTwo := promauto.NewCounter(prometheus.CounterOpts{
        Name: "my_counter",
        Help: "The total number of processed events",
    })
    go func() {
        for {
            myCounterTwo.Inc()
            time.Sleep(1 * time.Second)
        }
    }()
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(*addr, nil))
}
jmsmith@SFO-M-JMSMITH02 ~/w/s/s/loadtest (jmsmith-prom-test) [1]> go run ./prom_example.go
panic: duplicate metrics collector registration attempted

goroutine 1 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc0000ca320, 0xc0000201c0, 0x1, 0x1)
    /Users/jmsmith/go/pkg/mod/github.com/prometheus/client_golang@v1.1.0/prometheus/registry.go:399 +0xad
github.com/prometheus/client_golang/prometheus.MustRegister(...)
    /Users/jmsmith/go/pkg/mod/github.com/prometheus/client_golang@v1.1.0/prometheus/registry.go:176
github.com/prometheus/client_golang/prometheus/promauto.NewCounter(0x0, 0x0, 0x0, 0x0, 0x1483caf, 0xa, 0x148e8a6, 0x24, 0x0, 0xc0000a0058, ...)
    /Users/jmsmith/go/pkg/mod/github.com/prometheus/client_golang@v1.1.0/prometheus/promauto/auto.go:136 +0x10a
main.main()
    /Users/jmsmith/workspace/slack-github.com/slack/loadtest/prom_example.go:31 +0xd2
exit status 2

But the docs are pretty interesting, since they do suggest you can catch the error and use the existing collector. Testing out the example, it does indeed give you back a reference to the first counter. Note that promauto does not give you a chance to handle that error.

package main

import (
    "flag"
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var addr = flag.String("listen-address", ":9001", "The address to listen on for HTTP requests.")

var (
    myCounter = promauto.NewCounter(prometheus.CounterOpts{
        Name: "my_counter",
        Help: "The total number of processed events",
    })
)

func main() {
    flag.Parse()
    go func() {
        for {
            myCounter.Inc()
            time.Sleep(1 * time.Second)
        }
    }()
    myCounterTwo := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "my_counter",
        Help: "The total number of processed events",
    })
    if err := prometheus.Register(myCounterTwo); err != nil {
        if are, ok := err.(prometheus.AlreadyRegisteredError); ok {
            // A counter for that metric has been registered before.
            // Use the old counter from now on.
            myCounterTwo = are.ExistingCollector.(prometheus.Counter)
        } else {
            // Something else went wrong!
            panic(err)
        }
    }
    go func() {
        for {
            myCounterTwo.Inc()
            time.Sleep(1 * time.Second)
        }
    }()
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(*addr, nil))
}

I think the context at the top of promauto is super enlightening.

So based on this spelunking on what I'd consider the "reference" implementation, I think we should throw an Exception either way. However, that leaves me with two questions:

  1. Do we want to fully panic & bail out of execution?
  2. Should we create some kind of "auto" registration functions that do the panic, while the main ones just throw an error which pulls out the existing metric?
ktoso commented 5 years ago

To clarify: Why are you not using the SwiftMetrics API to call through to SwiftPrometheus in your app? It would do the right thing, as it's implemented as:

    public func makeCounter(label: String, dimensions: [(String, String)]) -> CounterHandler {
        let createHandler = { (counter: PromCounter) -> CounterHandler in
            return MetricsCounterHandler(counter: counter, dimensions: dimensions)
        }
        // vvvvvvv the "right" thing / "semantics you expected"
        if let counter: PromCounter<Int64, DimensionLabels> = self.getMetricInstance(with: label, andType: .counter) {
            return createHandler(counter)
        }
        // ^^^^^^^ 

        return createHandler(self.createCounter(forType: Int64.self, named: label, withLabelType: DimensionLabels.self))
    }

Why not use the metrics API as a fascade there as it was intended?

This library IS an implementation of the metrics SPI, and thus this works:

let c1 = Counter(label: "one")
c1.increment()
let c2 = Counter(label: "one")
c2.increment()

why not use it, rather than bind yourself into the prometheus API directly? (Of course with binding the prometheus as backend to the metrics API).


Do we want to fully panic & bail out of execution?

No, I don't think so. One cannot compare Go's panic to fatal erroring in Swift. Panics are recoverable, Swift faults are not.

Throwing an error sounds good if the same metrics is attempted to be created. Though I'm actually surprised a bit -- I thought the Go impls semantics was only to bail out of the registered type did not match, not a "re-creation of correct type" 🤔

Should we create some kind of "auto" registration functions that do the panic, while the main ones just throw an error which pulls out the existing metric?

By "auto" you mean "return me the same instance if it existed already"? That's what the SwiftMetrics API would do; What am I missing?

// nitpick, let's not use the word panic, mixes up things a bit IMHO... Swift does not do panics, Go does, and they're recoverable.

Yasumoto commented 5 years ago

I think there are valid usecases in an environment where Prometheus is such a standard that using this package directly is fine and has some benefits. For instance, using createCounter means you can specify help text for the metric that'll show up in Prometheus. Totally agreed that in general even in that environment using SwiftMetrics will make an eventual migration much easier.

That said, the goal for this issue is to track down the expected behavior when using SwiftPrometheus itself. I like the `SwiftMetrics implementation, but I can see why in the golang implementation they wanted to explicitly let folks know in case they're clobbering a metric with another library.

Any opposition to throwing an error that also includes a reference to the existing metric? That way callers can surface the collision just in case, but have access to use it if that's acceptable.


No, I don't think so. One cannot compare Go's panic to fatal erroring in Swift. Panics are recoverable, Swift faults are not.

Yikes, you're totally right. I definitely assumed panic meant "game over, execution will stop", but definitely not the case. From the go blog:

Recover is a built-in function that regains control of a panicking goroutine.

I thought the Go impls semantics was only to bail out of the registered type did not match, not a "re-creation of correct type" 🤔

Me too, I was surprised a bit as well.

By "auto" you mean "return me the same instance if it existed already"? That's what the SwiftMetrics API would do; What am I missing?

SwiftMetrics is a-ok 👌 , I'd like SwiftPrometheus to do something similar with its behavior when used directly (see above section for my current preference).

Yasumoto commented 5 years ago

I played around with this a bit, and it "feels" better to follow the SwiftMetrics style and return the metric if it already exists. Open to other preferences though!

Yasumoto commented 5 years ago

@MrLotU @ktoso friendly nudge 😄

tomerd commented 5 years ago

I played around with this a bit, and it "feels" better to follow the SwiftMetrics style and return the metric if it already exists.

in one of the early iteration of swift-metrics, we did that for "free" as part of the API package (caching basically) but we decided to remove it and leave it to the implementation to decide. imo, the expected behavior when type and other arguments align is to return the existing one, but we wanted to leave the door open to other logic

Yasumoto commented 5 years ago

@tomerd ah, that's good context and totally makes sense. Given the behavior of the golang prometheus client library (throw an exception if you try to create a metric that already exists) I see the value in leaving it open.

For now I think keeping this simple is fine, but I'm definitely open to other feedback too.

Yasumoto commented 5 years ago

Looks like this needed a rebase off master 👍

Yasumoto commented 5 years ago

Thanks, folks! 🎊

Yasumoto commented 5 years ago

@MrLotU do you mind adding this to the 1.0 milestone? I would but not sure I have the permissions. 🙇‍♂️

MrLotU commented 5 years ago

Done. Good catch