Closed buxingzhe closed 6 years ago
Hey @buxingzhe – sorry that you're having this problem. This exception occurs when the number of labels and number of values for those labels don't match up. It's likely a bug in the library, but I haven't come across this before. I'll try and work out what's happening ASAP, sorry for the inconvenience.
I think I actually saw this one and forgot to report it too. I believe it was happening when calling `counter` with a string that `extractLabels` won't find a label in. In my case, I think I was calling it with something like `"com.foo.bar"`, where the periods don't mean anything to it, and I think the `extractLabels` function is really expecting something in the format `"event_name"`.
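To illustrate the kind of mismatch being described (a hypothetical sketch, not the library's actual `extractLabels` implementation): a parser that treats the last underscore-separated segment of a metric name as a label value finds no label at all in a dotted name like `"com.foo.bar"`.

```scala
// Hypothetical sketch, NOT finagle-prometheus's real extractLabels:
// treat the last underscore-separated segment of a metric name as a
// label value, e.g. "log_store_create" -> base "log_store", value "create".
object LabelSketch {
  def extractLabel(name: String): (String, Option[String]) = {
    val idx = name.lastIndexOf('_')
    if (idx < 0) (name, None) // dotted names like "com.foo.bar" yield no label
    else (name.take(idx), Some(name.drop(idx + 1)))
  }
}
```

Under this scheme a receiver that declared one label name would be handed zero label values for `"com.foo.bar"`, which is exactly the labels/values count mismatch the exception complains about.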
@cnstoll Interesting – that's useful to know. @buxingzhe: what's the name of the counter that you're using? I'll try and write a test case for this and then fix it, thanks for your patience!
Dug back through git history. The name of the counter I originally had that failed was `log_store.create`, and replacing the `.` with a `_` was the first step to fixing it. But I think the Prometheus stats receiver's version of the counter is also expecting two "labels", and so I think I ultimately landed on:

```scala
statsReceiver.counter("log_store_create", "create").incr()
```

I think the second `"create"` was required for it to work.
I could see a lot of ways to handle this in the library. Clearer error handling might be the simplest. But I was pretty new to Prometheus when we started using this, and I thought it would be interesting to provide some sort of convenience "label maker" class that would create a counter string in the right format for prometheus to parse out the event name and pairs of labels from the string. That seems like what the prometheus stats receiver and extractLabels is trying to do - but you kind of have to read through the implementation to see how it works.
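As a sketch of that "label maker" idea (entirely hypothetical, nothing like this exists in the library): a small builder that sanitises dots and assembles the event name and label values into one string in the underscore-separated shape the receiver expects.

```scala
// Hypothetical "label maker" sketch, not part of finagle-prometheus.
// Builds a single counter-name string from an event name plus label
// values, replacing "." (which means nothing to the parser) with "_".
case class MetricName(event: String, labelValues: Seq[String]) {
  private def clean(s: String): String = s.replace('.', '_')

  // e.g. event "log_store.create" with value "create"
  // renders as "log_store_create_create".
  def render: String =
    (clean(event) +: labelValues.map(clean)).mkString("_")
}
```

Something like `MetricName("log_store.create", Seq("create")).render` would then produce the kind of string that worked in the example above, without the caller having to read the receiver's implementation to learn the format.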
@cnstoll Thanks for taking the time to describe that, it's really useful. Are you using the `PrometheusStatsReceiver` directly? That's only supposed to be there to act as the bridge for existing Finagle metrics. For creating your own, there's the `Telemetry` class. I need to think a bit about how to better convey the difference between these two.
For example, you'd do something like this:
```scala
val counter = telemetry.counter(
  "log_store",
  "Help text",
  Seq("type"))
counter.labels("create").inc()
```
Does that perhaps make more sense?
Interesting. Yeah, I'm using the `PrometheusStatsReceiver` directly in my service. So basically, anywhere I was using a Finagle `StatsReceiver`'s counter I just started using a `PrometheusStatsReceiver`. But the strings representing metrics in my service weren't in a good format for Prometheus – hence the exception.
I don't think I really looked at `Telemetry` at all – but that definitely looks a lot closer to what I actually wanted in practice!
@samstarling, thanks for your quick response. As I haven't added any metrics of my own, it must be coming from Finagle itself. I'll add logging to show the metric names for details. Have a good time!
@cnstoll I'd suggest looking at `Telemetry` – I'm considering how I might make it clearer to people not to use the `PrometheusStatsReceiver` manually, because it's really just there to act as the bridge. I'm trying to work out if there's a case where anyone would reasonably want to subclass it, or whether I might just be able to make it `private` so it can't be imported and used. I'd need to check that works how I intend it, though.
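For the `private` idea, one option in Scala (a sketch with made-up names, not the library's code) is a private constructor plus a factory on the companion object, so the class can still be referenced as a type but can't be instantiated by user code:

```scala
// Sketch: a class that external code cannot construct directly.
// `new BridgeReceiver()` only compiles inside this class and its
// companion object; user code gets a compile error instead.
class BridgeReceiver private () {
  def describe: String = "bridge for existing Finagle metrics"
}

object BridgeReceiver {
  // Called only by the library's own wiring.
  def forInternalUse(): BridgeReceiver = new BridgeReceiver()
}
```

A package-scoped modifier such as `private[prometheus]` would be the finer-grained variant, limiting construction to the library's own package rather than to the companion alone.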
@buxingzhe Do you have any idea which metric is causing the problem here? I haven't been able to replicate this locally yet.
Sorry, after redeploying my application with logging, the issue hasn't been seen again. I'll keep observing it and update here once it happens again.
@buxingzhe Please do let me know if it happens again, and I'll investigate further. For now, I'll close this issue.
Today I saw the following log in my application – any idea about this exception? Thanks. The Finagle version is v18.2.0.