prometheus / client_java

Prometheus instrumentation library for JVM applications
http://prometheus.github.io/client_java/
Apache License 2.0

Duplicate metrics when exposing prometheus metrics via spring boot #130

Closed. ddewaele closed this issue 7 years ago.

ddewaele commented 8 years ago

I've got a number of duplicate Prometheus metrics in my Spring Boot app (e.g. normalized_servo_rest_totaltime):

# HELP gauge_servo_response_assetmgmt_star_star gauge_servo_response_assetmgmt_star_star
# TYPE gauge_servo_response_assetmgmt_star_star gauge
gauge_servo_response_assetmgmt_star_star 16.0
# HELP gauge_servo_response_star_star gauge_servo_response_star_star
# TYPE gauge_servo_response_star_star gauge
gauge_servo_response_star_star 13.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0
# HELP gauge_servo_response_error gauge_servo_response_error
# TYPE gauge_servo_response_error gauge
gauge_servo_response_error 10098.0
# HELP gauge_servo_response_star_star_favicon_ico gauge_servo_response_star_star_favicon_ico
# TYPE gauge_servo_response_star_star_favicon_ico gauge
gauge_servo_response_star_star_favicon_ico 6.0
# HELP gauge_servo_response_prometheus gauge_servo_response_prometheus
# TYPE gauge_servo_response_prometheus gauge
gauge_servo_response_prometheus 0.0
# HELP gauge_servo_response_uaa_star_star gauge_servo_response_uaa_star_star
# TYPE gauge_servo_response_uaa_star_star gauge
gauge_servo_response_uaa_star_star 7.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0

In my Spring Boot metrics endpoint I don't see any duplicates:

"gauge.servo.response.assetmgmt.star-star": 16,
"gauge.servo.response.star-star": 13,
"normalized.servo.restclient.totaltime": 0,
"normalized.servo.restclient.count": 0,
"gauge.servo.restclient.min": 0,
"gauge.servo.restclient.max": 0,
"normalized.servo.rest.totaltime": 0,
"normalized.servo.rest.count": 0,
"gauge.servo.rest.min": 0,
"gauge.servo.rest.max": 0,
"gauge.servo.response.error": 10098,
"gauge.servo.response.star-star.favicon.ico": 6,
"gauge.servo.response.prometheus": 2,

brian-brazil commented 8 years ago

The code looks fine. Can you verify that no duplicates are actually being passed to the Prometheus client code?

ddewaele commented 8 years ago

Will check... I noticed that it doesn't happen immediately: when the Spring Boot app starts there are no duplicates, but after a while they pop up. I'll see if I can debug it a bit more and provide some more info.

ddewaele commented 8 years ago

Here are the duplicates being generated (screenshot not reproduced here):

This is a Spring Boot app with some Spring Cloud dependencies (in this case Netflix Zuul).

The Spring Boot metrics endpoint captures all metrics in a map, thereby avoiding duplicates (but potentially also silently overwriting other values):

https://github.com/spring-projects/spring-boot/blob/master/spring-boot-actuator/src/main/java/org/springframework/boot/actuate/endpoint/MetricsEndpoint.java#L77
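
For illustration, a minimal sketch of what that map-based collection implies (hypothetical code, not the actual MetricsEndpoint implementation): a later metric with the same name silently replaces an earlier one instead of appearing twice.

import java.util.LinkedHashMap;
import java.util.Map;

public class MapDedupSketch {
    public static void main(String[] args) {
        // Two providers report a metric under the same name.
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("gauge.servo.rest.min", 0);   // from provider A
        metrics.put("gauge.servo.rest.min", 42);  // from provider B: overwrites A's value
        System.out.println(metrics);              // prints {gauge.servo.rest.min=42}
    }
}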

brian-brazil commented 8 years ago

Nothing we can do on our end then; this is a bug in spring-boot.

ddewaele commented 8 years ago

Will log an issue on the Spring Boot / Cloud side...

However, I assume the Prometheus Spring Boot client should follow the Spring Boot semantics? (If Spring Boot decides it's not a bug, and that it's the metrics provider's responsibility to ensure uniqueness, then the Prometheus Spring Boot client should follow suit and also skip duplicates?)

brian-brazil commented 8 years ago

It depends on what they say. From the screenshot shown, these are not duplicates; they're insufficiently distinguished.

ddewaele commented 8 years ago

Logged an issue: https://github.com/spring-projects/spring-boot/issues/6404

maust commented 8 years ago

I ran into the same issue (probably caused by Spring Cloud/Eureka). Here's a (dirty) workaround: https://gist.github.com/maust/3166c8b4471f34ee57f1995c79a56c20

maust commented 8 years ago

@ddewaele Seems like Spring Boot is not going to solve it; any progress (with Servo)? @brian-brazil Would you accept a PR with the Spring-style workaround of overriding duplicate names?

brian-brazil commented 8 years ago

If the data model is such that there's no correct mapping to time series, we may have to drop support from the Java client so as to avoid providing broken metrics.

vtatai commented 8 years ago

I'm also facing the duplicate-metrics issue, but I do not use Spring Boot. My use case is that metrics can be added to the default registry at several points in the app, and in some cases the same metric name can be reused. Since the CollectorRegistry class does not expose the current collectors, there is no way of knowing whether a collector is already registered.

I solved this problem by implementing equals and hashCode in each of the collector classes (Counter, Gauge, etc.), so that when the same collector is added to the registry, which is backed by a set, it does not get added twice. Is that a good solution? If so, I can submit a patch for it. Perhaps that solves the issue above as well.
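
(For illustration, a minimal sketch of the name-based identity being proposed; this is hypothetical code, not how simpleclient's collectors are actually implemented:)

import java.util.HashSet;
import java.util.Set;

public class NamedCollector {
    private final String name;

    public NamedCollector(String name) {
        this.name = name;
    }

    // Identity is the metric name, so a set-backed registry would treat
    // two collectors with the same name as the same element.
    @Override
    public boolean equals(Object o) {
        return o instanceof NamedCollector
                && name.equals(((NamedCollector) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    public static void main(String[] args) {
        Set<NamedCollector> registry = new HashSet<>();
        registry.add(new NamedCollector("requests_total"));
        registry.add(new NamedCollector("requests_total")); // no-op: already present
        System.out.println(registry.size()); // prints 1
    }
}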

brian-brazil commented 8 years ago

That'll be solved properly inside the simpleclient at some point, though if you're running into this it likely means you're using instrumentation incorrectly. Metric instances should all be static fields of classes.
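
(For reference, the direct-instrumentation pattern being described, using the simpleclient builder API; the metric name and help text here are illustrative:)

import io.prometheus.client.Counter;

public class RequestHandler {
    // Created and registered exactly once, when the class is loaded.
    private static final Counter REQUESTS = Counter.build()
            .name("requests_total")
            .help("Total number of requests handled.")
            .register();

    public void handle() {
        REQUESTS.inc(); // reuse the same metric instance on every call
    }
}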

vtatai commented 8 years ago

I am bridging hundreds of metrics coming from another framework (Finagle), so I cannot create static fields.

brian-brazil commented 8 years ago

If you're using a custom collector then it's up to you to avoid clashes. Nothing we can do about it in the client library.

vtatai commented 8 years ago

I'm not using a custom collector - I guess this is a question more about the CollectorRegistry. For instance, if at one point I do:

Gauge.build().name(dynamicName).help("...").register(CollectorRegistry.defaultRegistry);

and then later (somewhere else) I do

Gauge.build().name(dynamicName).help("...").register(CollectorRegistry.defaultRegistry);

with the same value for dynamicName, it will fail once Prometheus scrapes the data. From what I can tell, there is no way of knowing which collectors the registry is storing. I could create a custom collector registry, but it feels like this could easily be solved in the CollectorRegistry class itself, or perhaps in SimpleCollector (by implementing equals/hashCode).
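
(For readers bridging dynamic names anyway, one way to avoid double registration is to keep your own map from name to collector; a sketch under the assumption that each name maps to a single gauge, not an endorsement of dynamic names:)

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class GaugeCache {
    private static final ConcurrentMap<String, Gauge> GAUGES = new ConcurrentHashMap<>();

    // Registers each name at most once; later calls return the cached gauge.
    public static Gauge gauge(String name) {
        return GAUGES.computeIfAbsent(name, n -> Gauge.build()
                .name(n)
                .help("Bridged metric " + n) // help text is required by the builder
                .register(CollectorRegistry.defaultRegistry));
    }
}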

brian-brazil commented 8 years ago

You should never use dynamic names with direct instrumentation, nor should you ever provide duplicate names or attempt to dedup on the fly. You're approaching this from the wrong standpoint; make sure the names are unique and static to start with.

vtatai commented 8 years ago

Ok I see, thanks!

pidster commented 7 years ago

@ddewaele please can you share a simple example that reproduces this? If duplicate metrics are being generated, maybe there's a model mismatch or multiple copies of a Spring bean.

From a quick scan of the code, I can see that the SpringBootMetricsCollector class is annotated with @Component. If it was discovered during a classpath scan, and that is combined with @EnableSpringBootMetricsCollector (which creates another bean of the same type), I think it could produce a duplicate bean.
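
(If that theory is right, one hypothetical guard is to define the bean explicitly and only when none exists; the @ConditionalOnMissingBean configuration below is an assumption about a possible fix, not the library's actual setup:)

import io.prometheus.client.spring.boot.SpringBootMetricsCollector;
import java.util.Collection;
import org.springframework.boot.actuate.endpoint.PublicMetrics;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PrometheusCollectorConfig {
    @Bean
    @ConditionalOnMissingBean(SpringBootMetricsCollector.class)
    public SpringBootMetricsCollector springBootMetricsCollector(
            Collection<PublicMetrics> publicMetrics) {
        // Register the collector with the default registry exactly once.
        SpringBootMetricsCollector collector = new SpringBootMetricsCollector(publicMetrics);
        collector.register();
        return collector;
    }
}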

kiview commented 7 years ago

If @maust's implementation is a valid(?) workaround, wouldn't it be okay to integrate this code into @EnableSpringBootMetricsCollector? I'm currently using this workaround as well, and it seems to work.

brian-brazil commented 7 years ago

This is an upstream bug. To work around it in any way would be to introduce a bug in our own code.

We need to wait for upstream to tell us what the intended semantics are.

kiview commented 7 years ago

I'd argue that this is a bug inside SpringBootMetricsCollector. If Spring Boot allows multiple metrics with the same name inside a PublicMetrics instance, the SpringBootMetricsCollector should be able to deal with that.

I don't think we will see any additional work from Spring Boot on this issue; they clearly state that it's up to the metrics sources to determine how they want to write their metrics: https://github.com/spring-projects/spring-boot/issues/6404#issuecomment-233322735

brian-brazil commented 7 years ago

We're still waiting on upstream to clarify whether the labels are merely annotations or actually part of the identity of the metric.

Blindly throwing away data on collision is not acceptable. We need to know what the intended semantics are here.

kiview commented 7 years ago

Okay, I see your point there.

So do you think we should ping the Spring Boot team again so they can clearly state whether this is allowed behavior? Because we won't be able to communicate with every third-party metrics provider about their intended metrics usage.

brian-brazil commented 7 years ago

I had asked a clarifying question (I thought on https://github.com/spring-projects/spring-boot/issues/6404, but I can't seem to find it now) but never got an answer back.

jzampieron commented 7 years ago

Any progress here? This is effectively a show-stopper for anyone using Spring Boot with Prometheus and the Spring Cloud Netflix libraries.

Spring has closed https://github.com/spring-projects/spring-boot/issues/6404

It seems as if the SpringBootMetricsCollector in the plug-in should namespace the values more uniquely.

brian-brazil commented 7 years ago

We're still waiting on Spring Boot to clarify their intended semantics.

jzampieron commented 7 years ago

Is there an active ticket with Spring Boot for them to define the expected semantics of the data?

As noted, 6404 has been closed. Do I need to open a new ticket and drive it to closure?

kiview commented 7 years ago

@jzampieron I think that sounds like a good idea; I assume there won't be any clear answer otherwise.

jzampieron commented 7 years ago

So here's another take on it... there's still a bug here in Prometheus's Spring Boot support.

No matter what input Spring Boot Actuator provides, Prometheus's endpoint should produce valid, consumable output, or otherwise log and return an internal server error (HTTP 500) with detailed information.

Producing invalid output for your consumer is a bug by definition.

I'll work to get a Spring Boot thread started to define/document the proper semantics; if someone here would take an action to at least error-log, handle, and report this in a reasonable manner, that would be good.

Please and thanks.

brian-brazil commented 7 years ago

> No matter what input Spring Boot Actuator provides, Prometheus's endpoint should produce valid, consumable output, or otherwise log and return an internal server error (HTTP 500) with detailed information.

I don't intend to add that level of sanity checking to the Java client. I take the approach that we'll try to protect you if you're doing direct instrumentation, but if you're writing custom collectors it's presumed you know what you're doing.

> Producing invalid output for your consumer is a bug by definition.

GIGO (garbage in, garbage out).

> I'll work to get a Spring Boot thread started to define/document the proper semantics

Looks like our current behaviour is correct; it just needs to be clarified in the upstream docs and the providers fixed.

jzampieron commented 7 years ago

Alright, so it sounds like the correct thing to do is open a ticket against spring-cloud-netflix to fix the provider.

cch0 commented 7 years ago

I ran into a similar (if not the same) issue, and I had to exclude the dependency on spring-cloud-starter-ribbon (which brings in Servo) when adding spring-cloud-starter-consul-discovery to the Spring Boot app.

maust commented 7 years ago

Instead of my previous workaround, disabling Servo metrics might be better:

spring.metrics.servo.enabled: false

brian-brazil commented 7 years ago

We now know these are bugs in the various users of Spring metrics, so file bugs with each of them as needed.

This is not a problem with client_java; we're just one of the consumers of Spring metrics that cares.