Seems that within 8 hours it ends up at a constant ~2 CPU cores and 6 GB of memory, then stops responding with:
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
Container logs
[104.175s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
2019-06-24 04:53:42,463 ERROR pGroup-1-2 o.k.c.ErrorController unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
From what I can see further in the DEBUG/TRACE logs, it's constantly doing Describes on all the topics, offsets, and consumer groups over and over until it runs out of threads/RAM.
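For illustration only (this is not KafkaHQ's actual code), the kind of admin "round" the TRACE logs suggest is being repeated looks roughly like the following; the broker address, topic, and group names are placeholders:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DescribeRound {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // One "round" of the metadata a topic/consumer-group page needs.
            // The logs suggest rounds like this were issued repeatedly for every
            // topic and group until no more native threads could be created.
            admin.describeTopics(List.of("some-topic")).all().get();
            admin.listConsumerGroups().all().get();
            admin.listConsumerGroupOffsets("some-group")
                 .partitionsToOffsetAndMetadata().get();
        }
    }
}
```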
Is there a private place you'd like the debug logs put?
Thanks
Is this constant behavior? It seems really strange, and related to https://github.com/tchiotludo/kafkahq/issues/75 I think. I'm thinking about reducing the timeout, but that won't fix the main issue. What is really strange is that I don't really control the threads; Micronaut does, and as I understand it, the thread pool is limited!
If you want to send private logs, send them here: tchiot.ludo@gmail.com
This is constant, but only happening in one of our clusters, a testing one that is bigger than the others. Same number of topics (140) and schemas etc., just more messages. The clusters have the same topology too. All our Java microservice applications are still running fine on this cluster.
I'll fire the logs through now, thanks for that! Appreciate your time and effort :)
This is constant behavior in this particular cluster, but it has the same topology and number of topics as the others that are still working fine. The only difference is the number of messages within the cluster (many many more).
The java microservices in this cluster are still working fine with Kafka, just appears to be HQ that's having difficulty.
Actually, it seems there's nothing particularly confidential in the logs, so I'll attach them here. Thanks very much indeed. kafkahq.log
I had a quick look; there are a lot of strange things in this log, and a few gut feelings:
Can you tell me more about the topics in the cluster (especially the number of partitions)? Also, can you try to isolate a single query (and only one) on the topic page in the log file?
Thanks @tchiotludo , appreciate your time.
Partition counts vary from 1 to 3 for all topics, except for 2 topics that have 12 partitions. All have replication factor 3.
I've asked others to stop using the UI, did a full restart with trace logging, and the log is attached. All the attached logs are with no one accessing it, not even myself. The schema registry has several versions registered per schema for some topics.
After 4 minutes of running, it was using just over 1 GB of memory and 1.5 CPU cores. Thanks!
Interestingly, if I try out the Landoop kafka-topics-ui, it loads everything fine/quickly and keeps working... but it's really not as lovely and full-featured as KafkaHQ, nor does it properly deserialise all Avro as HQ does.
Could this be an issue with several schema versions being registered for a topic, causing both applications some difficulty?
Thanks!
From what I see in the last log, Avro is not the problem. It's not really easy to understand from a simple log, but it seems that KafkaHQ is doing too many queries on Kafka. I have a kind of internal cache per HTTP request that doesn't work anymore since I introduced pagination. This is the first option I will look at.
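Roughly the idea described above, as a minimal sketch (this is not KafkaHQ's actual implementation): a memo cache that lives for a single HTTP request, so repeated lookups for the same key reuse the first result instead of hitting Kafka again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-request cache: the first caller computes the value for a key,
// later callers with the same key during the same request reuse it.
public class RequestScopedCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public <T> T get(String key, Supplier<T> loader) {
        return (T) cache.computeIfAbsent(key, k -> loader.get());
    }
}
```

One instance of this would be created per request and discarded afterwards; if pagination splits what used to be one request into many, the deduplication benefit is lost, which may be why the cache stopped helping as described above.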
Did you use KafkaHQ before I added pagination on the topic list? And if yes, did you have the bug before? A good test would be to use version 0.7.2 and see if you still have the issue.
Thanks - actually it seems not to have been an issue pre-0.8.0 -- I'll go back to 0.7.2 and see how that goes. Really appreciate your efforts here!
Okay, yeah, 0.7.2 works, and is actually really snappy and doesn't appear to have the same issue - I'll get back to our users to get them to try it for the next 24 hours and let you know.
Thanks @tchiotludo !
Yeah :+1: So my optimization is not an optimization :cry: and the pagination is worse than before!
I'll try to reproduce on my side, but I have a clearer view of the reason now! Thanks for your time on this issue!
@cg-nz Can you try with the dev version: docker pull tchiotludo/kafkahq:dev?
I've tried to make a fix that avoids duplicate calls to the Kafka API for the same query (consumer group, offset, ...).
I've also reduced the timeout on the API.
Does this work better with your cluster?
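For reference only (not necessarily how the dev image does it), the usual way to bound a single admin request with the Kafka client is to set a timeout on the request options and on the returned future, along these lines:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeTopicsOptions;
import org.apache.kafka.clients.admin.TopicDescription;

public class BoundedDescribe {
    // Illustrative helper: cap the broker-side request at 10s and the client-side
    // wait at 15s, so a slow cluster fails fast instead of piling up blocked threads.
    static Map<String, TopicDescription> describe(AdminClient admin, List<String> topics)
            throws Exception {
        return admin.describeTopics(topics, new DescribeTopicsOptions().timeoutMs(10_000))
                    .all()
                    .get(15, TimeUnit.SECONDS);
    }
}
```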
Hey @tchiotludo
Thanks for your time and replying.
Unfortunately the issue returns when using the tchiotludo/kafkahq:dev image. In fact it returns a 504 timeout, then a 500, and this appears straight away:
"unable to create native thread: possibly out of memory or process/resource limits reached"
2019-07-01 08:56:13,273 ERROR pGroup-1-2 o.k.c.ErrorController unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:803)
Once I revert back to the 0.7.2 and bounce the container, it's straight back to being very snappy, responsive and such.
Thanks v.much.
Regards cg
Do you have limits on the container (CPU / mem / ...)? If yes, can you share the config please?
Sorry, forgot to mention - there are no imposed limits from OpenShift. We have other workloads running at 2 vCPU/10 GB RAM with no problems. On this occasion KafkaHQ only hit 0.3 vCPU and 1 GB of memory when the error occurred on the dev image.
Right now, back on 0.7.2, it's using a little less than that, but is absolutely flying along nicely.
Perhaps a 0.8.0 image but without the pagination, or am I the only person observing this currently?
As it is, 0.7.2 is brilliant and is a huge help. Thank you.
@cg-nz have you tried to remove your zookeeper container and then docker-compose up?
@parisian Cheers, but that doesn't really apply to our setup using Kubernetes/OpenShift AMQ Streams with strimzi operator. Thanks for suggesting though!
Thanks
@cg-nz can you resend me a log please? I'm lacking ideas here for now... :cry:
@cg-nz
Just got an idea digging the web: 2 options.
The first seems to say that you've reached a user limit on the number of processes (threads) on your node. Can you try to raise it? (A quick sketch for checking those limits follows below.)
Another option (though as I read it, the JVM message is misleading and it's not a memory problem, so this may not be the solution):
can you try to tune the JVM options to see if it works?
Just add the env variable JAVA_OPTS='-Xmx2g -Xms2g'
That should raise the memory available to the JVM.
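A minimal, Linux-only sketch (an assumption, since the exact environment isn't shown here) for checking which limits the KafkaHQ JVM actually received inside the container, including the max-processes cap that governs native thread creation:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Prints the resource limits of the current process as seen from inside the
// container. "Max processes" is the line that caps native thread creation.
public class ShowLimits {
    public static void main(String[] args) throws Exception {
        Files.readAllLines(Path.of("/proc/self/limits"))
             .forEach(System.out::println);
    }
}
```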
Thanks
I've tried using -Xms512m -Xmx4096m with the same issue. We're not out of threads per process on the nodes, as there are dozens of other containers running with more threads than this. Could it be a Micronaut-imposed limit?
As far as I know, Micronaut doesn't enforce this; I will dig into it to be sure.
There is a Prometheus endpoint on KafkaHQ at /prometheus, can you send me the output?
Especially process_files_open_files (e.g. 200.0)
and process_files_max_files (e.g. 1048576.0).
The full output would be nice, since there is also some information about threads on the executors,
jvm_threads_live_threads for example, that can help.
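If it helps, a small sketch for pulling just those metrics from the endpoint; the base URL is an assumption and should point at wherever KafkaHQ is exposed:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Fetches the Prometheus scrape output and keeps only the file-handle and
// thread metrics mentioned above. Adjust the URL to match your deployment.
public class GrabMetrics {
    public static void main(String[] args) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:8080/prometheus")).build(),
                HttpResponse.BodyHandlers.ofString());
        resp.body().lines()
            .filter(l -> l.startsWith("process_files_") || l.startsWith("jvm_threads_"))
            .forEach(System.out::println);
    }
}
```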
Thanks
closing in favor of #137
Hey There
Thanks again for a great app, it really is superb for what you've done so far.
We seem to be starting to get a number of 504 timeouts caused by the below error:
I've tried changing some of the consumer properties (clients-defaults.consumer.properties.session.timeout.ms | heartbeat.interval.ms), which showed some benefit.
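For context, the two properties mentioned are the standard Kafka consumer session/heartbeat settings; a minimal plain-Java sketch of the same knobs (the broker address, group id, and values are placeholders, not the ones actually used):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class TimeoutTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafkahq-example");       // placeholder
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");       // session.timeout.ms
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");     // heartbeat.interval.ms, keep well below the session timeout
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            // The consumer would normally subscribe and poll here; this only shows the config wiring.
        }
    }
}
```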
This is running against docker image confluentinc/cp-schema-registry:5.2.2, which I've just updated to from the 5.0.1 we were using previously. The Kafka clusters (multiple) each have 3 brokers, running AMQ Streams by Red Hat, Kafka version 2.0.0.
The base image is openjdk:11, with the other files (.jar, kafkahq script, etc.) added in manually.
We have about a dozen microservices that send/receive messages to/from this Kafka, both inside and outside OpenShift, without timeouts etc.
Upping the debugging doesn't seem to show much more in the logs, but I'm more than happy to provide info as needed.
I do note that it appears to start faltering when more than one person is using it concurrently from different PCs.
Application.yml
Thanks