Closed: anatoliyck closed this issue 2 years ago
Micrometer's Tag manipulation is sadly expensive, and there doesn't seem to be much that can be done about that without changes to the Micrometer library itself rather than to Vert.x. However, I have a patch that moves the locking off the Gauge lookup path (as distinct from the create path), which roughly halves the throughput impact of metrics usage in my microbenchmarks. You still pay the CPU cost, but at least it's no longer blocking other threads.
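To give a rough idea of the shape of the change: lookups stay lock-free and only creation pays for synchronization. This is an illustrative sketch only, not the actual patch; the class and method names are made up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch (not the actual patch): lookups of already-registered gauges
// take a lock-free fast path; only creation goes through the slow path.
final class GaugeCache<K, G> {
  private final ConcurrentHashMap<K, G> gauges = new ConcurrentHashMap<>();

  G getOrCreate(K key, Supplier<G> creator) {
    // Fast path: the common case, no locking when the gauge already exists.
    G existing = gauges.get(key);
    if (existing != null) {
      return existing;
    }
    // Slow path: computeIfAbsent synchronizes internally, so concurrent callers
    // still end up sharing a single instance.
    return gauges.computeIfAbsent(key, k -> creator.get());
  }
}
```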
With reference to the potential fix for #139, I can copy over the existing broken behaviour for this case, i.e. return an incorrect T instance. Currently my patch returns null instead in such circumstances, which breaks things much more visibly. Maintainers: Any preference on behaviour here, assuming we're not going to prevent users sharing registries between vertx instances? Maybe log WARN but keep running with bad metrics? Non-issue for the current implementation, which can't detect the problem so can't warn about it. Once that's decided and #139 is resolved I can PR something for this one.
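For illustration, the behaviours under discussion boil down to something like the following. This is a hypothetical sketch with made-up names, not code from the actual patch.

```java
import java.util.logging.Logger;

// Hypothetical sketch of the options discussed above, for the case where a lookup
// returns a meter created for a different Vert.x instance sharing the same registry.
final class SharedRegistryPolicy {
  private static final Logger LOG = Logger.getLogger(SharedRegistryPolicy.class.getName());

  static <T> T resolve(T found, boolean ownedByThisVertxInstance) {
    if (ownedByThisVertxInstance) {
      return found;
    }
    // Option A (current behaviour): silently return the foreign instance -> bad metrics.
    // Option B (patch as it stands): return null -> fails fast but breaks callers visibly.
    // Option C (possible compromise): warn and keep running with bad metrics, as below.
    LOG.warning("Meter registry appears to be shared between Vert.x instances; metrics may be inaccurate");
    return found;
  }
}
```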
cc @jotak
Hi, we're seeing similar behavior when Prometheus is scraping metrics [1] or when some Vert.x metrics are enabled (we ended up disabling them).
@jhalliday, would you mind sharing your patch (even if it's not entirely complete)? I can take a look.
[1]
May 06, 2022 10:16:01 AM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 15726 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
at app//io.prometheus.client.exporter.common.TextFormat.writeEscapedLabelValue(TextFormat.java:156)
at app//io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:119)
at app//io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:53)
at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:134)
at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:130)
at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:101)
at app//io.micrometer.prometheus.PrometheusMeterRegistry.scrape(PrometheusMeterRegistry.java:87)
at app//io.vertx.micrometer.backends.PrometheusBackendRegistry.handleRequest(PrometheusBackendRegistry.java:90)
at app//io.vertx.micrometer.backends.PrometheusBackendRegistry$$Lambda$163/0x0000000800281c60.handle(Unknown Source)
at app//io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:67)
at app//io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:30)
at app//io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:50)
at app//io.vertx.core.impl.DuplicatedContext.emit(DuplicatedContext.java:168)
at app//io.vertx.core.http.impl.Http1xServerConnection.handleMessage(Http1xServerConnection.java:145)
at app//io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:156)
at app//io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at app//io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
at app//io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:99)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at app//io.vertx.core.http.impl.Http1xUpgradeToH2CHandler.channelRead(Http1xUpgradeToH2CHandler.java:116)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at app//io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at app//io.vertx.core.http.impl.Http1xOrH2CHandler.end(Http1xOrH2CHandler.java:61)
at app//io.vertx.core.http.impl.Http1xOrH2CHandler.channelRead(Http1xOrH2CHandler.java:38)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@17.0.3/java.lang.Thread.run(Unknown Source)
I could, but I don't think it will help you - that doesn't look like lock contention. I'd profile it first and see where that 15s is actually going.
There is a proposed fix in #143
I will probably backport it to 4.2
Vert.x version = 3.8.4
In my case there is a performance degradation of 25-35%. Under load, the event loop goes into BLOCKED status. Vert.x options:
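A minimal sketch of enabling the Prometheus backend through MicrometerMetricsOptions is shown below; it is illustrative only and not necessarily the exact options used here.

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.micrometer.MicrometerMetricsOptions;
import io.vertx.micrometer.VertxPrometheusOptions;

public class MetricsSetup {
  public static void main(String[] args) {
    // Illustrative configuration only: enables the Prometheus backend of
    // vertx-micrometer-metrics; not necessarily the exact options from this report.
    VertxOptions options = new VertxOptions()
        .setMetricsOptions(new MicrometerMetricsOptions()
            .setPrometheusOptions(new VertxPrometheusOptions().setEnabled(true))
            .setEnabled(true));
    Vertx vertx = Vertx.vertx(options);
  }
}
```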
With metrics disabled, everything is OK...
Case 1
I deploy the STOMP server (N instances) as a worker verticle, roughly as in the sketch below.
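Illustrative sketch only, assuming the vertx-stomp StompServer API; not the exact code used:

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.ext.stomp.StompServer;
import io.vertx.ext.stomp.StompServerHandler;

// Illustrative sketch: a STOMP server deployed as N worker verticle instances.
public class StompVerticle extends AbstractVerticle {
  @Override
  public void start() {
    // Each verticle instance starts its own STOMP server; Vert.x shares the
    // listening port between instances bound to the same address.
    StompServer.create(vertx)
        .handler(StompServerHandler.create(vertx))
        .listen();
  }

  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    // Deploy N instances (4 here) on worker threads instead of the event loop.
    vertx.deployVerticle(StompVerticle.class.getName(),
        new DeploymentOptions().setWorker(true).setInstances(4));
  }
}
```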
Result:
Case 2
I deploy the STOMP server (N instances) and call:
Result: