redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Group Coordinator not found errors #5279

Open patrickangeles opened 2 years ago

patrickangeles commented 2 years ago

Version & Environment

Redpanda version: 22.1

What went wrong?

On a fresh cluster, the __consumer_offsets internal topic does not exist. When a consumer starts with manual partition assignment (i.e., no call to subscribe), it triggers the creation of this topic, but it also tries to interact with a group. Since the internal topic doesn’t exist yet, the consumer gets a coordinator error ("Not coordinator" in the output below).

How to reproduce the issue?

This was initially observed on librdkafka, but could also manifest with other clients.

  1. Instantiate a new cluster (a single node is enough).
  2. Start a consumer that uses manual partition assignment (the following example was tested).

https://github.com/edenhill/librdkafka/blob/83bddc031ddd366208268e5cdd2cacb045f9fe2d/examples/rdkafka_complex_consumer_example.c

$ ~/src/librdkafka/examples/foo/consumer -b 127.0.0.1:39913 my_topic2:0
% Assigning 1 partitions
% Consumer error: Broker: Not coordinator: Failed to fetch committed offsets for 0 partition(s) in group "rdkafka_consumer_example": Broker: Not coordinator

JIRA Link: CORE-959

patrickangeles commented 2 years ago

Possible solution: pre-create the __consumer_offsets topic.

tmgstevens commented 2 years ago

+1. I've observed this using the Java client as well. I'm just not 100% sure under what circumstances the __consumer_offsets topic doesn't get created.

patrickangeles commented 2 years ago

> I'm just not 100% sure under what circumstances the __consumer_offsets topic doesn't get created.

It doesn't get created by default; it gets auto-created on first use.

This wouldn't normally be a big issue in long-running environments, but it's not great in a CI/CD context where clusters are spun up and down constantly.

jcsp commented 2 years ago

It feels pretty reasonable to auto-create this one, given that we expect almost all users to be using consumer groups. It does, however, remove the window during which the user could otherwise set the __consumer_offsets partition count before the topic is created: if we create it eagerly, they have to get this right before creating the cluster.

Pre-creating the topic doesn't in itself guarantee apps deterministically see a coordinator when they connect, if they connect in the first few seconds while the system is still bootstrapping itself. That would require also making a change to block requests until the topic is ready. That's probably reasonable too.
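
In the meantime, a CI job could tolerate that bootstrap window on the client side by retrying the first group operation with backoff. The following is only a minimal sketch of that idea, not something Redpanda or librdkafka provides: `CoordinatorNotReady` and `retry_until_coordinator` are hypothetical names, and the caller is assumed to translate its client's "Not coordinator" error into that exception.

```python
import time


class CoordinatorNotReady(Exception):
    """Hypothetical marker exception: raise it when the client reports
    a 'Not coordinator' style error for a group operation."""


def retry_until_coordinator(operation, attempts=10, initial_delay=0.1,
                            max_delay=2.0, sleep=time.sleep):
    """Call operation() until it stops raising CoordinatorNotReady.

    The delay doubles after each failure, capped at max_delay.
    The last failure is re-raised once the attempts are used up.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except CoordinatorNotReady:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

A test harness would wrap its first offset fetch or commit in `retry_until_coordinator(...)`. This only papers over the race in the client; it does not replace pre-creating the topic or blocking requests on the broker.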