tchiotludo / akhq

Kafka GUI for Apache Kafka to manage topics, topic data, consumer groups, schema registry, connect and more...
https://akhq.io/
Apache License 2.0

Unable to display the consumer groups of a topic #1863

Open thibthibus opened 3 months ago

thibthibus commented 3 months ago

Hi,

We used to be able to display the consumer groups of a given topic (by clicking the "Consumer Groups" tab), but this no longer works since we migrated our Kafka clusters from Strimzi/OpenShift to Confluent Cloud.

The UI displays an "Internal Error" pop-up. [screenshot]

The logs show that these requests take quite a long time but appear to succeed, since they return a 200. [screenshot]

I have no idea why there are two requests from two different threads, though.

I tried raising request.timeout.ms (to 1 min, then 2 min), but it didn't help.

Any idea what is causing this issue?

Thanks

AlexisSouquiere commented 3 months ago

Is it the same when you click "Consumer groups" in the left menu? I think I noticed something wrong with the Consumer groups tab on the topic screen, but I haven't had time to investigate yet.

thibthibus commented 3 months ago

Hi @AlexisSouquiere, yes, sorry, I forgot to mention that it works fine when clicking "Consumer Groups" in the left menu. But I guess the logic is different there: it just lists all the consumer groups in the cluster, whereas the tab on the topic screen only lists the consumer groups that read this topic...

AlexisSouquiere commented 3 months ago

OK, then I think it's the same issue I saw a while ago. I think it comes from the number of consumer groups and the way AKHQ currently filters the consumer groups for the topic.

thibthibus commented 3 months ago

Yes, that's my understanding too. I guess that with Confluent Cloud (or simply because the number of consumer groups on our cluster increased significantly last year), this filtering operation takes too long and hits some timeout (but I don't understand on which side, AKHQ or Kafka...).

AlexisSouquiere commented 3 months ago

I'd say it's on the AKHQ side, precisely here:

https://github.com/tchiotludo/akhq/blob/cf173a61e241ff6dd6c9bd14819c7487e216fdf6/src/main/java/org/akhq/repositories/ConsumerGroupRepository.java#L106-L108

First it retrieves all the consumer group names, then it sends one request per consumer group to get its details, and only after that does it filter on the active topics. If you have 10K consumer groups, it makes 1 + 10K requests, which is obviously what causes the issue.
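The cost of this list-then-describe-each pattern can be made concrete with a small simulation. This is a minimal sketch, not AKHQ's code: `GroupAdmin`, `listGroupIds`, `describeGroup` and `NaiveTopicGroupScan` are hypothetical stand-ins for the real Kafka admin calls, reduced to counting round trips. It lists every group once, describes each group individually, and only then filters on the topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a Kafka admin client; it only counts round trips.
final class GroupAdmin {
    private final Map<String, Set<String>> topicsByGroup; // group id -> topics the group reads
    private int requests = 0;

    GroupAdmin(Map<String, Set<String>> topicsByGroup) {
        this.topicsByGroup = topicsByGroup;
    }

    // One request that returns every group id in the cluster.
    List<String> listGroupIds() {
        requests++;
        return new ArrayList<>(topicsByGroup.keySet());
    }

    // One request per group to fetch its details (here reduced to the topics it reads).
    Set<String> describeGroup(String groupId) {
        requests++;
        return topicsByGroup.get(groupId);
    }

    int requests() {
        return requests;
    }
}

final class NaiveTopicGroupScan {
    // List all groups, describe each one individually, and only then filter on the topic.
    static List<String> groupsForTopic(GroupAdmin admin, String topic) {
        List<String> matches = new ArrayList<>();
        for (String groupId : admin.listGroupIds()) {
            if (admin.describeGroup(groupId).contains(topic)) {
                matches.add(groupId);
            }
        }
        return matches;
    }
}
```

With 10,000 groups of which only one reads the topic, this scan issues 10,001 requests to return a single group. If those round trips were sequential and each took an assumed 20 ms, that would add up to roughly 200 s.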

I haven't checked in detail yet (I don't know whether it can be improved), but at least we know the issue comes from here!

thibthibus commented 3 months ago

Hi @AlexisSouquiere, yes, that's more or less our situation, as we have lots of consumer groups. We deploy one AKHQ instance per Kafka namespace (basically a prefix for Kafka topics and consumer groups), so the Kafka user configured in AKHQ has limited permissions (ACLs) on the Kafka cluster. However, we have always had to give these AKHQ users an ALLOW DESCRIBE * LITERAL ConsumerGroup ACL; otherwise it does not work at all (HTTP 409). Is this expected behavior (and a requirement for the ACLs of AKHQ users), or do you think it's a bug? I was thinking about it again because, if we could restrict the scope of the consumer groups accessed, this feature might work, since only a limited number of consumer groups would be considered...
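The idea of shrinking the number of groups considered can be sketched as follows. This assumes, hypothetically, that group ids carry a namespace prefix such as `team-a.` and that the scope filter is applied to group ids before any per-group describe request; the thread does not say where AKHQ applies such a filter. `GroupScope` and its methods are illustrative names only.

```java
import java.util.List;
import java.util.stream.Collectors;

final class GroupScope {
    // Keep only the group ids of one namespace, i.e. those starting with its prefix.
    // This filter looks at ids only, so it costs no per-group request.
    static List<String> inNamespace(List<String> groupIds, String prefix) {
        return groupIds.stream()
                .filter(id -> id.startsWith(prefix))
                .collect(Collectors.toList());
    }

    // Requests made by a list-then-describe-each scan over the given number of groups.
    static int scanRequests(int describedGroups) {
        return 1 + describedGroups;
    }
}
```

Under these assumptions, the describe phase scales with the size of one namespace rather than with the whole cluster.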

AlexisSouquiere commented 3 months ago

You can still restrict the scope of consumer groups in AKHQ with the new RBAC. At my company we decided not to restrict access to consumer groups, because it would be too complicated to handle access to consumer groups managed by other teams.

I have completed my PR to fix the issue. The problem with the current implementation was the empty groups we were trying to load. With the new HIDE_EMPTY option, we will be able to see only the active consumer groups (those with active members) and skip the empty ones. With this, I can display the consumer groups for my topic quickly (which I can't with the current version).
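The effect of a HIDE_EMPTY-style option can be sketched like this. It is a minimal illustration, not the code from the PR: `GroupListing` and `HideEmptyScan` are hypothetical names, and the sketch assumes the group listing already reports whether each group has active members, so empty groups can be dropped before any per-group request is made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry from a group listing that already reports whether the group is empty.
final class GroupListing {
    final String id;
    final boolean empty; // true when the group has no active members

    GroupListing(String id, boolean empty) {
        this.id = id;
        this.empty = empty;
    }
}

final class HideEmptyScan {
    // Ids of the groups that still need a describe request after a HIDE_EMPTY-style filter.
    static List<String> groupsToDescribe(List<GroupListing> listings, boolean hideEmpty) {
        List<String> ids = new ArrayList<>();
        for (GroupListing listing : listings) {
            if (hideEmpty && listing.empty) {
                continue; // no active members: skip loading this group's details
            }
            ids.add(listing.id);
        }
        return ids;
    }
}
```

On a cluster where most groups are empty, this reduces the N in the 1 + N requests to the number of active groups.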