handle case when Broker: Leader not available

omnipresent07 commented 4 years ago

I'm using the latest version of kafkahq. While displaying the topics, we're getting the following error:

Caused by: java.lang.RuntimeException: Error for Describe Topics Offsets {}
    at org.kafkahq.utils.Lock.call(Lock.java:29)
    at org.kafkahq.modules.KafkaWrapper.describeTopicsOffsets(KafkaWrapper.java:96)
    at org.kafkahq.repositories.TopicRepository.findByName(TopicRepository.java:100)
    at org.kafkahq.repositories.TopicRepository.findByName(TopicRepository.java:91)
    at org.kafkahq.repositories.TopicRepository.lambda$list$0(TopicRepository.java:52)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    ... 1 more

Entire stacktrace is here: https://gist.github.com/omnipresent07/db223be60a831b6e359bfbca13fb5688

My application.conf file looks like this:

kafkahq:
  connections:
    somename:
      properties:
        bootstrap.servers: "someip"

tchiotludo commented 4 years ago

To avoid this error, raise the kafkahq.clients-defaults.consumer.properties.default.api.timeout.ms setting like this:

kafkahq:
  connections:
    somename:
      properties:
        bootstrap.servers: "someip"
  clients-defaults:
    consumer:
      properties:
        default.api.timeout.ms: 60000

It seems that your cluster is taking too long to respond with the topic offsets.

This change should give you a working kafkahq page, but I think the app will be slow (#55). Tell me how it works!

Related to: https://github.com/tchiotludo/kafkahq/issues/94 https://github.com/tchiotludo/kafkahq/issues/83

omnipresent07 commented 4 years ago

@tchiotludo I'm now getting the same error, just with a different timeout:

Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 60002ms

I can view /groups but I can't view /topic. I even tried bumping the value from 6000 to 60000, but I get the same error after a 1 minute delay.

Could this be caused by an imbalance in our Kafka cluster?

tchiotludo commented 4 years ago

Is your cluster unbalanced? If so, I've heard that KafkaHQ doesn't work in this case.

Handling this case will need some development work.

omnipresent07 commented 4 years ago

We're not sure whether the cluster is unbalanced. Is there a quick way we can validate that via the command line tools available in bin/ or kafkahq? We can view the /nodes and /group using kafkahq just not /topics

tchiotludo commented 4 years ago

Maybe kafkacat -L or bin/kafka-topics can help? But there is a chance that you have an unbalanced cluster; as far as I know, the Kafka API waits for topic metadata in this case.
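As a rough illustration of that check: the output of bin/kafka-topics --describe can be scanned for partitions that report no leader. The sketch below is a minimal example, not part of KafkaHQ. It assumes the usual "Topic: ... Partition: ... Leader: ..." line layout and assumes a missing leader is printed as -1 or none (this may vary by Kafka version). The helper name leaderless_partitions and the sample text are hypothetical. The --unavailable-partitions flag of kafka-topics applies a similar filter on the CLI side.

```python
import re

# Matches partition lines as printed by `kafka-topics --describe`.
# Assumed layout: "Topic: t  Partition: 0  Leader: 1  Replicas: 1  Isr: 1".
# The summary line ("PartitionCount: ...") does not match because the
# pattern requires a colon directly after "Partition".
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+"
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)"
)

# Assumption: a partition without a leader is shown as "-1" or "none".
NO_LEADER_VALUES = ("-1", "none")


def leaderless_partitions(describe_output):
    """Return (topic, partition) pairs whose leader is reported as missing."""
    missing = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match and match.group("leader") in NO_LEADER_VALUES:
            missing.append((match.group("topic"), int(match.group("partition"))))
    return missing


# Hypothetical sample output, not taken from the reporter's cluster.
SAMPLE = (
    "Topic: demo\tPartitionCount: 2\tReplicationFactor: 1\n"
    "\tTopic: demo\tPartition: 0\tLeader: 1\tReplicas: 1\tIsr: 1\n"
    "\tTopic: demo\tPartition: 1\tLeader: -1\tReplicas: 2\tIsr: 2\n"
)

print(leaderless_partitions(SAMPLE))
```

Any pair returned here points at a partition whose offsets a client would likely block on until the timeout expires.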

omnipresent07 commented 4 years ago

Ok, I'll try kafkacat. Btw, we can get to the Live Tail page as well, which shows a list of all topics. Since /topics shows the list of topics plus metadata, maybe that is the issue. Will report back soon.

omnipresent07 commented 4 years ago

I can get a list of topics via kafkacat. Would it be ok if I email you the output?

tchiotludo commented 4 years ago

As we saw by email, the cluster is unbalanced. Handling this case will need a lot of async work when fetching the topic offsets. It could also help a lot with other issues: #83 & #55

pmpetit commented 4 years ago

Hello, I can see this issue is closed, but I can't find any solution. Did I miss something? Thanks (I have the same issue ... Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 120000ms)

tchiotludo commented 4 years ago

No, the issue is not closed :smile: This issue will need a big refactor, as I explained here: https://github.com/tchiotludo/kafkahq/pull/134#issuecomment-548974495

We fetch some topic data on the list page, and if the cluster is unbalanced, it will crash the app. I need to move every call that can crash the app (typically the topic offsets) to async to avoid this issue.

But it's a lot of work and I need to find many hours to do it.
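To make the idea concrete, here is a minimal sketch of the pattern, not the actual KafkaHQ code (KafkaHQ is a Java application; Python is used here only for illustration). Each topic's offsets are fetched in its own worker with a shared deadline, so one stuck topic yields an empty result instead of failing the whole list. The names fetch_offsets_isolated and fake_fetch are hypothetical, and fake_fetch only simulates a slow broker.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_offsets_isolated(topics, fetch_offsets, timeout_s):
    """Fetch offsets per topic in parallel; a slow or failing topic maps to None."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(topics)))
    futures = {topic: pool.submit(fetch_offsets, topic) for topic in topics}
    deadline = time.monotonic() + timeout_s
    results = {}
    for topic, future in futures.items():
        # All topics share one deadline, so the page waits at most ~timeout_s.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[topic] = future.result(timeout=remaining)
        except Exception:
            # Covers both the timeout and any error raised by the fetch itself.
            results[topic] = None
    # Do not block on workers that are still stuck.
    pool.shutdown(wait=False)
    return results


def fake_fetch(topic):
    # Hypothetical stand-in for a real Kafka offsets call.
    if topic == "stuck":
        time.sleep(1.0)  # simulates a partition whose leader is unavailable
    return {"first": 0, "last": 42}


result = fetch_offsets_isolated(["ok", "stuck"], fake_fetch, timeout_s=0.2)
print(result)
```

With this shape, the topic list can still render the healthy topics and show the unavailable ones without offsets. The stuck worker keeps running in the background until its own call returns, so a real implementation would still want a bounded client-side timeout.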

apellegr06 commented 4 years ago

Hi,

Do you plan to do this refactoring to fix this use case? Because it's really annoying.

Thanks Regards

tchiotludo commented 4 years ago

It's a big refactor, like I said. It's in progress, but don't expect it for a few months. Open source is a second or third job :smile:

s7an-it commented 4 years ago

On 0.14.1 I set the 60000 value in the config and the secret, and on most calls I get an infinite loop after I click on a topic. Restarting the broker that holds the leader partition fixes it.

rkettelerij commented 4 years ago

This issue happens when we open AKHQ during a rolling upgrade of Kafka.

alwibrm commented 3 years ago

We're seeing this occasionally on 0.15.0.

tchiotludo commented 3 years ago

@alwibrm for information, we don't support anything other than the latest version (open source is not staffed to support multiple versions). Please update to the latest version before reporting if you want feedback :)

xakassi commented 3 years ago

Hi, @tchiotludo ! I have steps to reproduce an issue with Error for Describe Topics Offsets. I use docker-compose with 1 Zk instance and 3 Kafka instances.

1) At first only 1 Kafka instance is started (for other instances docker stop is called). Create via AKHQ UI topic test-1 (1 partition, 1 RF), topic test-2 (2 partitions, 1 RF).
2) Start the 2nd Kafka instance (docker start). Create via AKHQ UI topic test-3 (3 partitions, 2 RF), topic test-4 (4 partitions, 2 RF).

The error Error for Describe Topics Offsets appears for both test-3 and test-4, but they are created!

Of course, the error Error for Describe Topics Offsets also appears when you stop some brokers. But strangely, it also appears for the steps above! And everything continues to work after that.

danielhass commented 9 months ago

Hey @tchiotludo - I think we are running into the same issue. We see the following error message when one broker is not available in the cluster:

DescribeTopicOffsetError

However, I don't know if the issue title and description really cover our case. This issue says "[...] Leader not available". However, all of our topics have replicas and a leader in case one of our brokers goes down.

However, we noticed the following error with the Kafka CLI too, which we can't really explain:

./kafka-consumer-groups --bootstrap-server <bootstrap-server-list> --all-groups --describe
[2024-01-04 07:41:32,504] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (<node-hostname>:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Error: Executing consumer group command failed due to org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1704350498106, tries=51, nextAllowedTryMs=1704350498207) timed out at 1704350498107 after 51 attempt(s)
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1704350498106, tries=51, nextAllowedTryMs=1704350498207) timed out at 1704350498107 after 51 attempt(s)
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:646)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:412)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$9(ConsumerGroupCommand.scala:593)
        at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:117)
        at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:104)
        at scala.collection.mutable.HashMap.flatMap(HashMap.scala:35)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$2(ConsumerGroupCommand.scala:585)
        at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
        at scala.collection.mutable.Growable.addAll(Growable.scala:62)
        at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
        at scala.collection.mutable.HashMap.addAll(HashMap.scala:117)
        at scala.collection.mutable.HashMap$.from(HashMap.scala:589)
        at scala.collection.mutable.HashMap$.from(HashMap.scala:582)
        at scala.collection.MapOps$WithFilter.map(Map.scala:348)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:567)
        at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:368)
        at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:73)
        at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:60)
        at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=metadata, deadlineMs=1704350498106, tries=51, nextAllowedTryMs=1704350498207) timed out at 1704350498107 after 51 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: metadata

Is this problem covered by this issue? Or shall I open a new one?

One thing I would like to mention is that we are still able to view the messages of a specific topic if we craft the URL ourselves:

TopicViewWorking

tchiotludo / akhq

handle case when Broker: Leader not available #137