redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

AdminClient/listConsumerGroups TimeoutException when Broker Configuration Set #1865

Closed d-t-w closed 2 years ago

d-t-w commented 3 years ago

Redpanda Version: v21.7.4 (current latest)

Background: encountered when running kPow with a Redpanda cluster with modified broker configuration.


Report:

AdminClient/listConsumerGroups throws a TimeoutException:

Error listing groups on localhost:9092 (id: 0 rack: null): Call(callName=listConsumerGroups, deadlineMs=1626870066072, tries=570, nextAllowedTryMs=1626870066173) timed out at 1626870066073 after 570 attempt(s)

When:

The following broker configuration is set:

  default_topic_partitions: 18
  default_topic_replications: 3

And:

At least one ConsumerGroup is connected to the cluster.
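To read the numbers in the error message above, here is a minimal sketch that pulls out the fields and compares the timestamps. `parse_admin_timeout` is a hypothetical helper written for this illustration, not part of any Kafka client library, and the regular expressions assume the message has exactly the shape shown in the report.

```python
import re

# Error text copied verbatim from the report above.
MESSAGE = (
    "Error listing groups on localhost:9092 (id: 0 rack: null): "
    "Call(callName=listConsumerGroups, deadlineMs=1626870066072, tries=570, "
    "nextAllowedTryMs=1626870066173) timed out at 1626870066073 "
    "after 570 attempt(s)"
)


def parse_admin_timeout(message):
    """Hypothetical helper: extract the retry fields from a timeout message."""
    call = re.search(
        r"Call\(callName=(\w+), deadlineMs=(\d+), tries=(\d+), "
        r"nextAllowedTryMs=(\d+)\)",
        message,
    )
    timed_out = re.search(r"timed out at (\d+) after (\d+) attempt", message)
    if call is None or timed_out is None:
        raise ValueError("not in the expected AdminClient timeout format")

    deadline_ms = int(call.group(2))
    next_try_ms = int(call.group(4))
    timed_out_ms = int(timed_out.group(1))
    return {
        "call_name": call.group(1),
        "tries": int(call.group(3)),
        # How far past the client-side deadline the call was abandoned.
        "ms_past_deadline": timed_out_ms - deadline_ms,
        # How long after the deadline the next retry would have been allowed.
        "next_retry_after_deadline_ms": next_try_ms - deadline_ms,
    }


info = parse_admin_timeout(MESSAGE)
print(info)
```

For this message the call was abandoned 1 ms after its deadline, after 570 tries, with the next retry only allowed 101 ms later. That pattern suggests the client kept retrying until the deadline passed rather than failing on a single slow request, though the message alone does not say what each attempt returned.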

Reproducer

See kpow/redpanda-reproducer-1 for Docker Compose configuration and Clojure reproducer.

emaxerrno commented 3 years ago

Thank you @d-t-w! I assume 21.6.6 didn’t have this issue? We’ll fix.

d-t-w commented 3 years ago

I'm not 100% sure when it popped up @senior7515 - it's a bit of an odd one. In the linked repro it's pretty easy to bump the rp version back to any point.

I should add this was picked up very recently by a user who I believe is relatively new to redpanda, so I don't have much of a grip on when it started to occur.

esteban commented 3 years ago

@NyaliaLui is there something pending to do on https://github.com/vectorizedio/redpanda/pull/1966?

NyaliaLui commented 3 years ago

@esteban Yes, after a conversation with Noah, we determined that the issue is deeper than simply changing a boolean value. I'm investigating now.

he-la commented 2 years ago

Hitting what appears to be this exact issue on v22.1.6 (rev ea9c411). Any updates on a probable cause or fix?

NyaliaLui commented 2 years ago

> Hitting what appears to be this exact issue on v22.1.6 (rev ea9c411). Any updates on a probable cause or fix?

Hi @he-la, thanks for raising this. The original fix was in https://github.com/redpanda-data/redpanda/pull/2210; it looks like we forgot to flag it.

Could you share your config file and the steps that led to the failure? Thank you.

jrkinley commented 2 years ago

Hi @he-la, I'm not sure if this is relevant for listConsumerGroups but please check the partition count for the __consumer_offsets topic. If it has a single partition, then try to increase the partition count using rpk topic add-partitions: https://docs.redpanda.com/docs/reference/rpk-commands/#rpk-topic-add-partitions.
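As a rough sketch of the suggestion above, the snippet below builds (but does not run) the rpk command for growing a topic to a desired partition count. `add_partitions_command` is a hypothetical helper, the target of 3 partitions is an arbitrary value chosen for illustration, and the sketch assumes that `--num` takes the number of partitions to add rather than the new total; check `rpk topic add-partitions --help` for your rpk version before relying on it.

```python
import shlex


def add_partitions_command(topic, current_partitions, target_partitions):
    """Hypothetical helper: return the rpk argv to run, or None if no change is needed."""
    missing = target_partitions - current_partitions
    if missing <= 0:
        return None
    # Assumption: --num is the count of partitions to ADD, not the final total.
    return ["rpk", "topic", "add-partitions", topic, "--num", str(missing)]


# Example: __consumer_offsets currently has a single partition.
cmd = add_partitions_command("__consumer_offsets", 1, 3)
print(shlex.join(cmd))
```

The current partition count has to be looked up first (for example with `rpk topic describe __consumer_offsets`); the helper only does the arithmetic and leaves running the command to the operator.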

he-la commented 2 years ago

Thanks @jrkinley - I can no longer reproduce the issue, not sure if this is due to added partitions or something else though.

NyaliaLui commented 2 years ago

Closing since the problem was resolved. @he-la Feel free to re-open if you run into the problem again.