It is hard to say without a full log, but I think:
- The connection resets can probably be ignored, I guess? They most likely just happen as the connections are recreated.
- The thread-blocked warning suggests that some blocking operation is taking longer than expected. In this case it looks like it is decoding/encoding some JSON. I have never seen that myself. I wonder if the 100m CPU has something to do with it (i.e. it is too small). But we do not seem to have anything for it in your files as a default, so TBH I'm not sure what a better value would be. It could also suggest it is being fed some malformed input, but that seems unlikely given the input should come from Kubernetes.
I can post the full log, but the connection resets are pretty random over that interval. Could just be Azure API things.
I haven't seen any CPU throttling with that, but it might be too small to catch on our monitoring. I'll increase it and see if it changes (roughly along the lines of the sketch below). Thanks for the insight.
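For reference, this is roughly the kind of override I'm planning to try; the exact key path depends on the chart's values layout, so treat it as a sketch rather than the chart's documented interface:

```yaml
# Hypothetical values.yaml override; the exact key path depends on the chart.
# The idea is to raise the CPU request/limit above 100m so the JSON
# encode/decode step is less likely to be starved.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```

If throttling really is the culprit, the cAdvisor throttling metrics (e.g. container_cpu_cfs_throttled_seconds_total) should show it once the limit is actually being hit.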
I removed the resource limits entirely and I'm still seeing the log messages, aside from the thread-blocking one. Will continue to monitor.
Yeah, I think that makes sense: removing the limits removed the CPU bottleneck, so the blocking operation now finishes faster.
The connection resets are IMHO not related to CPU. That looks more like the connection to the Kube API being terminated. Recreating the connections is not unusual (re-authentication etc.), and I think that would happen with some randomness at < 10-minute intervals. But it should normally not trigger errors, so I'm not sure why you see them.
Unfortunately I do not see these errors in my own installations, so I have no idea why they show up for you.
Thanks for the insight. I'm going to close this issue and explore the API from the Azure side; I assume there's some difference in timings or proxy shenanigans with how they do things. Still works, though, which is all that matters!
Things to note:
- Using AKS on Kubernetes v1.26.6
- Helm chart version 1.0.0
- Deployed using the Helm chart with the following values: (we are pre-creating the namespace due to annotations for specific nodes)

cert-manager successfully issues certificates and the pods start without issue. They do appear to work, but I wanted to validate whether these errors are expected.
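For context on the pre-created namespace, here is roughly what it looks like; the name and annotation key below are placeholders rather than our exact manifest:

```yaml
# Illustrative sketch of the pre-created namespace.
# The name and annotation key are placeholders, not the exact ones we use.
apiVersion: v1
kind: Namespace
metadata:
  name: my-operator-namespace   # placeholder name
  annotations:
    # Example of a node-targeting annotation (requires the PodNodeSelector
    # admission plugin); our real annotations differ.
    scheduler.alpha.kubernetes.io/node-selector: "agentpool=workload"
```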
Log messages we are seeing:
- Startup logs
- Random connection resets
- Thread block logs
- Watch termination failure logs
Just want to verify I'm not missing anything here. I don't see any errors in the Kubernetes event log, so I'm unsure what's happening. Walking through the bindings, everything seems to align. Happy to provide more info as needed!