tchiotludo / akhq

Kafka GUI for Apache Kafka to manage topics, topic data, consumer groups, schema registry, Connect and more...
https://akhq.io/
Apache License 2.0

Error: UnknownTopicOrPartitionException / Kafka doesn't update consumer groups if a topic is deleted #216

Open danielpetisme opened 4 years ago

danielpetisme commented 4 years ago

Note: I thought the following troubleshooting was worth sharing. I hope it helps other people! Thank you for your work, it's awesome!

Context: I had KafkaHQ running pretty well. Then I had a cluster issue which required some not-recommended actions (like removing topics directly from the filesystem). After the restart, KafkaHQ no longer worked and failed with the following error:

org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
    at org.kafkahq.modules.AbstractKafkaWrapper.describeTopics(AbstractKafkaWrapper.java:78)
    at org.kafkahq.modules.AbstractKafkaWrapper.describeTopicsOffsets(AbstractKafkaWrapper.java:133)
    at org.kafkahq.modules.$KafkaWrapperRequestScopeDefinition$$exec6.invokeInternal(Unknown Source)
    at io.micronaut.context.AbstractExecutableMethod.invoke(AbstractExecutableMethod.java:146)
    at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:60)
    at org.kafkahq.modules.$KafkaWrapperRequestScopeDefinition$Intercepted.describeTopicsOffsets(Unknown Source)
    at org.kafkahq.repositories.ConsumerGroupRepository.findByName(ConsumerGroupRepository.java:74)
    at org.kafkahq.repositories.ConsumerGroupRepository.findByTopic(ConsumerGroupRepository.java:90)
    at org.kafkahq.repositories.TopicRepository.findByName(TopicRepository.java:120)
    at org.kafkahq.repositories.TopicRepository.lambda$list$0(TopicRepository.java:64)
    at org.kafkahq.repositories.TopicRepository$$Lambda$750/0000000000000000.apply(Unknown Source)
    at org.kafkahq.utils.PagedList.of(PagedList.java:70)
    at org.kafkahq.repositories.TopicRepository.list(TopicRepository.java:64)
    at org.kafkahq.controllers.TopicController.list(TopicController.java:101)
    at org.kafkahq.controllers.$TopicControllerDefinition$$exec7.invokeInternal(Unknown Source)
    at io.micronaut.context.AbstractExecutableMethod.invoke(AbstractExecutableMethod.java:146)

Troubleshooting: I started instrumenting the code to understand what was happening. First, as the stack trace indicates, I looked at the topic description: https://github.com/tchiotludo/kafkahq/blob/dev/src/main/java/org/kafkahq/modules/AbstractKafkaWrapper.java#L65

I replaced the stream with a plain loop so that I could invoke describeTopics for each topic in the collection individually and find the problematic one.
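The per-topic loop described above can be sketched in Java roughly as follows. This is a minimal sketch, not KafkaHQ's actual code: FindFailingTopics, TopicDescriber and findFailing are hypothetical names, and the single-topic describe call is injected as a function so the loop can be exercised without a broker. The comment shows how it could be wired to the Kafka AdminClient of that era, which is an assumption about your client version.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

class FindFailingTopics {
    // Hypothetical hook that describes ONE topic and throws if the broker rejects it.
    // With an older Kafka Java AdminClient it could be, for example:
    //   topic -> admin.describeTopics(List.of(topic)).all().get()
    @FunctionalInterface
    interface TopicDescriber {
        void describe(String topic) throws ExecutionException, InterruptedException;
    }

    // Describes topics one at a time instead of in a single batched call, so a
    // failure can be attributed to a specific topic. Returns topic -> underlying cause.
    static Map<String, Throwable> findFailing(Collection<String> topics, TopicDescriber describer) {
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (String topic : topics) {
            try {
                describer.describe(topic);
            } catch (ExecutionException e) {
                // Kafka futures wrap the real error (e.g. UnknownTopicOrPartitionException).
                failures.put(topic, e.getCause());
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop scanning.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return failures;
    }
}
```

Running this over the same topic collection that was passed to describeTopics narrows the failure down to the offending topic name(s), at the cost of one request per topic.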

Once I found the guilty topic, I tried to get more details directly from the cluster with kafka-topics --zookeeper ZK --describe --topic TOPIC, but it wasn't listed...

I took a step back to find out how this missing topic could have ended up in the AbstractKafkaWrapper#describeTopics arguments in the first place.

After some research, I found that the method was called here: https://github.com/tchiotludo/kafkahq/blob/dev/src/main/java/org/kafkahq/repositories/ConsumerGroupRepository.java#L74

This means the topic was passed in because a consumer group still had an active subscription to it. Back on the cluster, I ran the command kafka-consumer-groups --bootstrap-server BROKERS --describe --all-groups, and one consumer group was indeed generating the same error:

[2020-02-08 23:10:09,788] WARN [Consumer clientId=consumer-2, groupId=MY_GROUP_ID] Error while fetching metadata with correlation id 49 : {THE_MISSING_TOPIC=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
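To spot this situation without reading kafka-consumer-groups output group by group, one could compare the topics each group has committed offsets for with the topics that actually exist. The sketch below assumes both inputs have already been fetched; with the Java AdminClient they could come from listConsumerGroupOffsets(groupId) and listTopics().names(), but that wiring is an assumption and is not shown. StaleGroupFinder and findStale are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class StaleGroupFinder {
    // Hypothetical helper: for each consumer group, returns the topics it still
    // references (via committed offsets) that no longer exist in the cluster.
    //   groupTopics:    group id -> topics with committed offsets (assumed pre-fetched)
    //   existingTopics: topics currently present in the cluster (assumed pre-fetched)
    static Map<String, Set<String>> findStale(Map<String, Set<String>> groupTopics,
                                              Set<String> existingTopics) {
        Map<String, Set<String>> stale = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Set<String>> entry : groupTopics.entrySet()) {
            Set<String> missing = new TreeSet<>(entry.getValue());
            missing.removeAll(existingTopics); // keep only topics absent from the cluster
            if (!missing.isEmpty()) {
                stale.put(entry.getKey(), missing);
            }
        }
        return stale;
    }
}
```

Any group returned by findStale is a candidate for cleanup, but only once you have confirmed it is no longer used.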

Fix: In my case the solution was easy: the consumer group was no longer used, so I simply deleted it with the command kafka-consumer-groups --bootstrap-server BROKERS --delete --group MY_GROUP_ID

tchiotludo commented 4 years ago

Hello @danielpetisme

Thanks for sharing this. In a future version, I will handle this case in KafkaHQ so that it doesn't crash. I have already had similar reports; it just needs a big refactor to be handled correctly!

danielpetisme commented 4 years ago

Great!

I propose closing the issue.

tchiotludo commented 4 years ago

Reopening it to make KafkaHQ more resilient to this.