Open DanielThomas opened 7 months ago
Experimenting with this locally, this isn't as straight forward as it first appears, because synthetic meters are recursively registered:
java.lang.IllegalStateException: Recursive update
at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:609)
at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:585)
at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:578)
at io.micrometer.core.instrument.MeterRegistry.gauge(MeterRegistry.java:319)
at io.micrometer.core.instrument.Gauge$Builder.register(Gauge.java:195)
at io.micrometer.core.instrument.distribution.HistogramGauges.<init>(HistogramGauges.java:141)
at io.micrometer.core.instrument.distribution.HistogramGauges.register(HistogramGauges.java:89)
at io.micrometer.core.instrument.distribution.HistogramGauges.getHistogramGauges(HistogramGauges.java:53)
at io.micrometer.core.instrument.distribution.HistogramGauges.registerWithCommonFormat(HistogramGauges.java:48)
at io.micrometer.core.instrument.step.StepMeterRegistry.newTimer(StepMeterRegistry.java:83)
at io.micrometer.core.instrument.MeterRegistry.lambda$timer$5(MeterRegistry.java:333)
at io.micrometer.core.instrument.MeterRegistry.lambda$getOrCreateMeter$10(MeterRegistry.java:624)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:609)
at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:585)
at io.micrometer.core.instrument.MeterRegistry.timer(MeterRegistry.java:331)
at io.micrometer.core.instrument.Timer$Builder.register(Timer.java:459)
at io.micrometer.core.instrument.Timer$Builder.register(Timer.java:453)
So these would have to be lifted out to avoid being recursive, and ideally, also avoid a separate locks for synthetics.
While investigating https://github.com/micrometer-metrics/micrometer/issues/4747 I looked closer at the excessive dimensionality of the metrics for this class of service. It turns out that https://github.com/Netflix/dgs-framework/ tags internally can lead to very high numbers of distinct meters, and due to the lack of expiry, essentially cause a memory leak:
gql.resolver: 39829
dgs.req.passport: 78941
gql.query: 6764525
Those dimensions are excessive even without this monitor to deal with, but lock-free atomic registration and removal of meters by key (and expiry!) would close the gap between plain Spectator and Micrometer significantly.
Thanks for taking a look at this. I think there are improvements that can be made here but it's going to take some care due to the very central nature of this code. We've tended toward being overly cautious about correctness over performance. The last relevant change in this code was probably the move off of PCollections to standard Java collections in https://github.com/micrometer-metrics/micrometer/commit/47912661796ff9ae30d9f3326c5921b65bddde5d. I'll have to spend some more time looking at the synthetic meters implementation, and maybe we can add some jcstress tests for the specific scenarios about which we want to guarantee correctness.
This very coarse lock on the
ConcurrentMap
meterMap
can lead to excessive blocking in high RPS environment that create meters at a high rate, due to high dimensionality in their tags:https://github.com/micrometer-metrics/micrometer/blob/d6fc0b8fdcb10bd90707855b67c8d27adf83bee4/micrometer-core/src/main/java/io/micrometer/core/instrument/MeterRegistry.java#L608-L639
From a 30 second profile of one of our applications, at around 200 rps:
Environment
Confirmed this is an issue at current mainline.
To Reproduce
Concurrently create meters with unique Ids. Note the monitor contention.
Expected behavior
The
MeterRegistry
should optimize for highly concurrent environments, and avoid coarse locking, particularly on the meter creation hot path.ConcurrentHashMap.computeIfAbsent
is atomic, and at a glance, the need forsyntheticAssociations
isn't clear to me, as it looks to me that the key on themeterMap
will have the association, so the locking shouldn't be necessary there either?