vert-x3 / vertx-micrometer-metrics

Vert.x metrics implementation for Micrometer.io
Apache License 2.0
51 stars 38 forks source link

Performance degradation when metrics is enabled #91

Closed anatoliyck closed 2 years ago

anatoliyck commented 4 years ago

Vert.x version = 3.8.4

In my case a performance degradation between 25-35%. Under load event loop goes into BLOCKED status.

Vertx options:

      VertxOptions options =  new VertxOptions()
                .setMetricsOptions(
                        new MicrometerMetricsOptions()
                                .addLabels(Label.EB_ADDRESS)
                                .setMicrometerRegistry(meterRegistry)
                                .setEnabled(true)
                )
                .setPreferNativeTransport(true);

with disabled metrics everything is ok...

Case 1

I deploy STOMP server(N instances) as worker verticle.

Result:

"vert.x-eventloop-thread-17" #1424 prio=5 os_prio=0 cpu=1786.52ms elapsed=79.32s tid=0x00007f6a54033000 nid=0x53b8 waiting for monitor entry  [0x00007f69dd2f7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at io.vertx.micrometer.impl.meters.Gauges.get(Gauges.java:59)
        - waiting to lock <0x000000068280c310> (a io.vertx.micrometer.impl.meters.Gauges)
        at io.vertx.micrometer.impl.VertxPoolMetrics$Instance.submitted(VertxPoolMetrics.java:68)
        at io.vertx.micrometer.impl.VertxPoolMetrics$Instance.submitted(VertxPoolMetrics.java:55)
        at io.vertx.core.impl.WorkerContext.wrapTask(WorkerContext.java:29)
        at io.vertx.core.impl.WorkerContext.execute(WorkerContext.java:61)
        at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:232)
        at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:224)
        at io.vertx.core.impl.VertxImpl$InternalTimerHandler.run(VertxImpl.java:913)
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

Case 2

I deploy STOMP server(N instances) and call:

    vertx.executeBlocking(p -> { some code}, null)

Result:

"vert.x-eventloop-thread-10" #1076 prio=5 os_prio=0 cpu=5784.67ms elapsed=130.85s tid=0x00007f546401f800 nid=0x7c18 waiting for monitor entry  [0x00007f5290896000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at io.vertx.micrometer.impl.meters.Gauges.get(Gauges.java:59)
        - locked <0x00000006822b8c40> (a io.vertx.micrometer.impl.meters.Gauges)
        at io.vertx.micrometer.impl.VertxPoolMetrics$Instance.submitted(VertxPoolMetrics.java:68)
        at io.vertx.micrometer.impl.VertxPoolMetrics$Instance.submitted(VertxPoolMetrics.java:55)
        at io.vertx.core.impl.ContextImpl.executeBlocking(ContextImpl.java:302)
        at io.vertx.core.impl.ContextImpl.executeBlocking(ContextImpl.java:286)
        at io.vertx.core.impl.EventLoopContext.executeBlocking(EventLoopContext.java:23)
        at io.vertx.core.impl.VertxImpl.executeBlocking(VertxImpl.java:748)
        at MY CLASS
        at MY CLASS
        at io.vertx.ext.stomp.impl.DefaultStompHandler.handleSend(DefaultStompHandler.java:353)
        at io.vertx.ext.stomp.impl.DefaultStompHandler.handle(DefaultStompHandler.java:241)
        at MY CLASS
        at MY CLASS
        at io.vertx.ext.stomp.impl.StompServerImpl.lambda$null$13(StompServerImpl.java:237)
        at io.vertx.ext.stomp.impl.StompServerImpl$$Lambda$1664/0x0000000840ec6840.handle(Unknown Source)
        at io.vertx.ext.stomp.impl.FrameParser.handleLine(FrameParser.java:164)
        at io.vertx.ext.stomp.impl.FrameParser$$Lambda$1560/0x0000000840e6a840.handle(Unknown Source)
        at io.vertx.core.parsetools.impl.RecordParserImpl.handleParsing(RecordParserImpl.java:214)
        at io.vertx.core.parsetools.impl.RecordParserImpl.handle(RecordParserImpl.java:285)
        at io.vertx.ext.stomp.impl.FrameParser.handle(FrameParser.java:221)
        - locked <0x0000000687910ef0> (a io.vertx.ext.stomp.impl.FrameParser)
        at io.vertx.ext.stomp.impl.FrameParser.handle(FrameParser.java:35)
        at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:237)
        at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:127)
        at io.vertx.core.http.impl.WebSocketImplBase.handleFrame(WebSocketImplBase.java:357)
        - locked <0x0000000685b85e88> (a io.vertx.core.http.impl.Http1xServerConnection)
        at io.vertx.core.http.impl.Http1xConnectionBase.handleWsFrame(Http1xConnectionBase.java:124)
        at io.vertx.core.http.impl.Http1xServerConnection.handleMessage(Http1xServerConnection.java:142)
        at io.vertx.core.net.impl.VertxHandler$$Lambda$1540/0x0000000840e67440.handle(Unknown Source)
        at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
        at io.vertx.core.impl.EventLoopContext.execute(EventLoopContext.java:43)
        at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:232)
        at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:173)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
        at io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
jhalliday commented 2 years ago

Micrometer's Tag manipulation is sadly expensive and there doesn't seem to be much that can be done about that without changes to the micrometer library rather than vertx. However, I have a patch that moves the locking off the Gauge lookup (as distinct from create) path, which reduces metrics usage impact on throughput by around half in my microbenchmarks. You still pay the CPU cost, but at least now it's not blocking other threads.

With reference to the potential fix for #139, I can copy over the existing broken behaviour for this case, i.e. return an incorrect T instance. Currently my patch returns null instead in such circumstances, which breaks things much more visibly. Maintainers: Any preference on behaviour here, assuming we're not going to prevent users sharing registries between vertx instances? Maybe log WARN but keep running with bad metrics? Non-issue for the current implementation, which can't detect the problem so can't warn about it. Once that's decided and #139 is resolved I can PR something for this one.

tsegismont commented 2 years ago

cc @jotak

pierDipi commented 2 years ago

Hi, we're seeing a similar behavior when Prometheus is scraping metrics [1] or when some Vertx Metrics are enabled (we ended up disabling them)

@jhalliday, would you mind sharing your patch (even if it's not entirely completed)? I can take a look.

[1]

May 06, 2022 10:16:01 AM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 15726 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
    at app//io.prometheus.client.exporter.common.TextFormat.writeEscapedLabelValue(TextFormat.java:156)
    at app//io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:119)
    at app//io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:53)
    at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:134)
    at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:130)
    at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:101)
    at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:87)
    at app//io.vertx.micrometer.backends.PrometheusBackendRegistry.handleRequest(PrometheusBackendRegistry.java:90)
    at app//io.vertx.micrometer.backends.PrometheusBackendRegistry$$Lambda$163/0x0000000800281c60.handle(Unknown Source)
    at app//io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:67)
    at app//io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:30)
    at app//io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:50)
    at app//io.vertx.core.impl.DuplicatedContext.emit(DuplicatedContext.java:168)
    at app//io.vertx.core.http.impl.Http1xServerConnection.handleMessage(Http1xServerConnection.java:145)
    at app//io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:156)
    at app//io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at app//io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
    at app//io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:99)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at app//io.vertx.core.http.impl.Http1xUpgradeToH2CHandler.channelRead(Http1xUpgradeToH2CHandler.java:116)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at app//io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
    at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at app//io.vertx.core.http.impl.Http1xOrH2CHandler.end(Http1xOrH2CHandler.java:61)
    at app//io.vertx.core.http.impl.Http1xOrH2CHandler.channelRead(Http1xOrH2CHandler.java:38)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@17.0.3/java.lang.Thread.run(Unknown Source)
jhalliday commented 2 years ago

I could, but I don't think it will help you - that doesn't look like lock contention. I'd profile it first and see where that 15s is actually going.

tsegismont commented 2 years ago

There is a proposed fix in #143

I will probably backport it to 4.2