The code looks fine. Can you verify that there are actually no duplicates being passed to the Prometheus client code?
Will check ... I noticed that it doesn't happen immediately. When the Spring Boot app starts there are no duplicates; after a while they pop up. I'll see if I can debug it a bit more and provide some more info.
Here are the duplicates being generated (screenshot):
This is a Spring Boot app with some Spring cloud dependencies (in this case Netflix Zuul)
The Spring Boot metrics endpoint captures all metrics in a map, thus avoiding duplicates (but potentially also overwriting other values).
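For illustration only, a minimal sketch (an assumption, not the actual actuator code) of why a name-keyed map hides the problem: the map silently keeps the last value written for a name, whereas the Prometheus text format rejects a repeated metric family outright.

```java
// Minimal sketch (assumption: not the actual actuator implementation).
// A map keyed by metric name silently overwrites on a name clash,
// whereas the Prometheus text format rejects duplicate families outright.
import java.util.LinkedHashMap;
import java.util.Map;

public class MapDedupSketch {
    public static void main(String[] args) {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("gauge.servo.response.example", 12);
        metrics.put("gauge.servo.response.example", 47); // overwrites, no error
        System.out.println(metrics); // {gauge.servo.response.example=47}
    }
}
```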
Nothing we can do on our end then, this is a bug in spring-boot.
Will log an issue at spring boot / cloud side ...
However, I assume that the Prometheus Spring Boot client should follow the Spring Boot semantics? (If Spring Boot decides it's not a bug, and that it's the metrics providers' responsibility to ensure uniqueness, then the Prometheus Spring Boot client should follow suit and also skip duplicates?)
It depends on what they say. From the screenshot shown these are not duplicates, they're insufficiently distinguished.
Logged an issue https://github.com/spring-projects/spring-boot/issues/6404
I ran into the same issue (probably caused by Spring Cloud/Eureka). Here's a (dirty) work around: https://gist.github.com/maust/3166c8b4471f34ee57f1995c79a56c20
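For context, here is a rough sketch of that kind of work-around. It is not the linked gist itself; the wrapper class name and the first-occurrence-wins policy are assumptions for this example. It wraps a PublicMetrics source (Spring Boot 1.x actuator types) and drops colliding names before they reach the collector.

```java
// Rough sketch of a dedup wrapper (assumption: not the linked gist itself).
// Spring Boot 1.x actuator types; keeps the first metric seen for each name.
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

import org.springframework.boot.actuate.endpoint.PublicMetrics;
import org.springframework.boot.actuate.metrics.Metric;

public class DeduplicatingPublicMetrics implements PublicMetrics {

    private final PublicMetrics delegate;

    public DeduplicatingPublicMetrics(PublicMetrics delegate) {
        this.delegate = delegate;
    }

    @Override
    public Collection<Metric<?>> metrics() {
        Set<String> seen = new HashSet<>();
        Collection<Metric<?>> result = new ArrayList<>();
        for (Metric<?> metric : delegate.metrics()) {
            if (seen.add(metric.getName())) { // skip repeats of an already-seen name
                result.add(metric);
            }
        }
        return result;
    }
}
```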
@ddewaele Seems like Spring Boot is not going to solve it; any progress (with Servo)? @brian-brazil Would you accept a PR with the (Spring-way) work-around of overriding duplicate names?
If the data model is such that there's no correct mapping to time series, we may have to drop support from the java client so as to avoid providing broken metrics.
I'm also facing the duplicate metrics issue, but I do not use Spring Boot. My use case is that metrics can be added at several points in the app to the default registry, and in some cases the same metric name can be reused. Since the CollectorRegistry class does not expose the current collectors, there is no way of knowing if the collector is already there or not.
I solved this problem by implementing equals and hashCode inside each of the collector classes, like Counter, Gauge, etc. So when the same collector is added to the registry, since it is a set it does not get added twice. Is that a good solution? If so, I can submit a patch for it. Perhaps that also solves the issue above as well.
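A self-contained sketch of the idea being described (simplified stand-in types, not the actual simpleclient classes): if a collector's identity is its name, a second registration of the same name collapses into the existing Set entry.

```java
// Simplified stand-in for a collector whose identity is its name
// (assumption: not the real Counter/Gauge classes).
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class NameIdentitySketch {

    static class NamedCollector {
        final String name;

        NamedCollector(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedCollector && ((NamedCollector) o).name.equals(this.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    public static void main(String[] args) {
        Set<NamedCollector> registry = new HashSet<>();
        registry.add(new NamedCollector("requests_total"));
        registry.add(new NamedCollector("requests_total")); // silently not added again
        System.out.println(registry.size()); // 1
    }
}
```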
That'll be solved properly inside the simpleclient at some point, though if you're running into this it likely means you're using instrumentation incorrectly. Metric instances should all be static fields of classes.
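A minimal example of the pattern being recommended: the metric is declared once as a static field, so there is a single registration and a single instance reused everywhere.

```java
// Direct instrumentation with a static field: registered exactly once.
import io.prometheus.client.Counter;

public class RequestHandler {

    static final Counter requests = Counter.build()
            .name("requests_total")
            .help("Total requests handled.")
            .register();

    void handle() {
        requests.inc();
        // ... actual request handling ...
    }
}
```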
I am bridging hundreds of metrics coming from another framework (Finagle), so I cannot create static fields.
If you're using a custom collector then it's up to you to avoid clashes. Nothing we can do about it in the client library.
I'm not using a custom collector - I guess this is a question more about the CollectorRegistry. For instance, if at one point I do:
Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)
and then later (somewhere else) I do
Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)
with the same value for dynamicName, then it will fail once Prometheus scrapes the data. From what I can tell there is no way of knowing which collectors the registry is storing. I could create a custom collector registry, but I feel like this could be easily solved in the CollectorRegistry class itself, or perhaps in the SimpleCollector (by implementing equals / hashCode).
You should never be using dynamic names with direct instrumentation, nor should you ever be providing duplicate names or attempting to dedup on the fly. You're approaching this from the wrong standpoint; make sure the names are unique and static to start with.
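For illustration, a sketch of the alternative being suggested (the metric name finagle_stat and the label source are made up for this example): keep one statically named metric and move the varying part into a label value.

```java
// One static, uniquely named Gauge; the previously dynamic name becomes a label value.
// "finagle_stat" and "source" are hypothetical names for this sketch.
import io.prometheus.client.Gauge;

public class FinagleBridge {

    static final Gauge finagleStat = Gauge.build()
            .name("finagle_stat")
            .help("Values bridged from another framework.")
            .labelNames("source")
            .register();

    static void record(String sourceName, double value) {
        finagleStat.labels(sourceName).set(value);
    }
}
```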
Ok I see, thanks!
@ddewaele please can you share a simple example that reproduces this? If there's duplicate metrics being generated, maybe there's a model mismatch or multiple copies of a Spring Bean.
From a quick scan of the code, I can see that the SpringBootMetricsCollector.java class is annotated with @Component. If this was discovered during a classpath scan, and combined with @EnableSpringBootMetricsCollector (which creates another bean of the same type), I think it could produce a duplicate bean.
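If that hypothesis is right, a guard along these lines would avoid the second bean. This is a sketch only (assuming the collector's constructor takes the PublicMetrics beans, as in simpleclient_spring_boot), not a claim about what the library actually does today.

```java
// Sketch: only define the collector bean if component scanning has not already done so.
// Assumes SpringBootMetricsCollector(Collection<PublicMetrics>) from simpleclient_spring_boot.
import java.util.Collection;

import io.prometheus.client.spring.boot.SpringBootMetricsCollector;
import org.springframework.boot.actuate.endpoint.PublicMetrics;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsCollectorConfiguration {

    @Bean
    @ConditionalOnMissingBean(SpringBootMetricsCollector.class)
    SpringBootMetricsCollector springBootMetricsCollector(Collection<PublicMetrics> publicMetrics) {
        SpringBootMetricsCollector collector = new SpringBootMetricsCollector(publicMetrics);
        collector.register();
        return collector;
    }
}
```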
If @maust's implementation is a valid(?) workaround, wouldn't it be okay to integrate this code into @EnableSpringBootMetricsCollector? I'm currently using this workaround as well and it seems to work.
This is an upstream bug. To workaround it in any way would be to introduce a bug in our own code.
We need to wait on upstream to tell us what the intended semantics are.
I'd argue that this is a bug inside of SpringBootMetricsCollector. If Spring Boot allows multiple Metrics with the same name inside a PublicMetrics, then SpringBootMetricsCollector should be able to deal with this.
I don't think we will see any additional work from Spring-Boot on this issue, they clearly state that it's up to the metrics sources to determine how they want to write their metrics: https://github.com/spring-projects/spring-boot/issues/6404#issuecomment-233322735
We're still waiting on upstream to clarify if the labels are merely annotations or actually part of the identity of the metric.
Blindly throwing away data on collision is not acceptable. We need to know what the intended semantics are here.
Okay I see your point there.
So do you think we should trigger the Spring-Boot team again, so they can clearly state if this is allowed behavior? Because we won't be able to communicate with all third party metrics providers about their intended metrics usages.
I had asked a clarifying question (I had thought on https://github.com/spring-projects/spring-boot/issues/6404 but can't seem to find it now) but never got an answer back.
Any progress here? This is effectively a show-stopper for anyone using Spring Boot with Prometheus and the Spring Cloud Netflix libraries.
Spring has closed https://github.com/spring-projects/spring-boot/issues/6404
It seems as if the SpringBootMetricsCollector in the plug-in should more uniquely namespace the values.
We're still waiting on Spring Boot to clarify their intended semantics.
Is there an active ticket with Spring Boot for them to define the expected semantics of the data?
As noted, 6404 has been closed. Do I need to reopen a ticket and drive it to closure?
@jzampieron I think this sounds like a good idea, I assume there won't be any clear answer otherwise.
So here's another take on it... there's still a bug here in Prometheus's spring-boot support.
No matter what input is provided by the Spring Boot Actuator, Prometheus's endpoint should produce valid consumable output, or otherwise return and log an internal server error (500) with detailed information.
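As a rough sketch of the check being proposed here (an assumption, not existing endpoint code): detect duplicate metric families before writing the response, and let the endpoint turn the failure into a 500 with details.

```java
// Sketch of a pre-scrape duplicate check (assumption: not part of the actual endpoint).
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import io.prometheus.client.Collector.MetricFamilySamples;
import io.prometheus.client.CollectorRegistry;

public class ScrapeValidator {

    static void checkForDuplicates(CollectorRegistry registry) {
        Set<String> seen = new HashSet<>();
        for (MetricFamilySamples family : Collections.list(registry.metricFamilySamples())) {
            if (!seen.add(family.name)) {
                // the endpoint would translate this into an HTTP 500 with details
                throw new IllegalStateException("Duplicate metric family: " + family.name);
            }
        }
    }
}
```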
Producing invalid output to be consumed by your consumer is a bug by definition.
I'll work to get a Spring Boot thread started to define/document the proper semantics. If someone here would take an action to at least error-log, handle, and report this in a reasonable manner, that would be good.
Please and thanks.
> No matter what input is provided by the Spring Boot Actuator, Prometheus's endpoint should produce valid consumable output, or otherwise return and log an internal server error (500) with detailed information.
I don't intend to add that level of sanity checking to the java client. I take the approach that we'll try and protect you if you're doing direct instrumentation, but if you're writing custom collectors it's presumed you know what you're up to.
> Producing invalid output to be consumed by your consumer is a bug by definition.
GIGO (garbage in, garbage out).
> I'll work to get a Spring Boot thread started to define/document the proper semantics
Looks like our current behaviour is correct, it just needs to be clarified in upstream docs and the providers fixed.
Alright, so then it sounds like the correct thing to do is open a ticket against spring-cloud-netflix to fix the provider.
I ran into a similar (if not the same) issue, and I had to exclude the dependency on spring-cloud-starter-ribbon (which brings in Servo) when adding spring-cloud-starter-consul-discovery in the Spring Boot app.
Instead of my previous work-around, disabling Servo metrics might be better:
spring.metrics.servo.enabled: false
We know now these are bugs in the various users of spring metrics, so file bugs with each of them as needed.
This is not a problem with client_java; we're just one of the consumers of spring metrics that cares.
I've got a number of duplicate Prometheus metrics in my Spring Boot app (e.g. normalized_servo_rest_totaltime). In my Spring Boot metrics I don't see any duplicates.