tulios / kafkajs

A modern Apache Kafka client for node.js
https://kafka.js.org
MIT License
3.7k stars, 521 forks

Failed to find group coordinator #1630

Open dasaripravin-developer opened 11 months ago

dasaripravin-developer commented 11 months ago

I have a 3-node consumer group and am hitting the "Failed to find group coordinator" error, after which the consumer stops. Can anyone help me figure out this issue? Details are below.

Error message from the console:
{"level":"ERROR","timestamp":"2023-10-06T12:00:50.029Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSGroupCoordinatorNotFound: Failed to find group coordinator","groupId":"NODEKAFKA","stack":"KafkaJSGroupCoordinatorNotFound: Failed to find group coordinator\n at Cluster.findGroupCoordinatorMetadata"}

Expected behavior: The group member should find the group coordinator.

Observed behavior: The consumer fails to find the group coordinator and stops.

Environment:

Additional context: The application runs in a Kubernetes pod.

Please let me know if you need more details.

ghost commented 10 months ago

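I've seen this happen in new clusters when offsets.topic.replication.factor (default: 3) is greater than the actual number of available brokers. In that case the internal __consumer_offsets topic cannot be created, so no group coordinator can be assigned.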
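
One way to check for this mismatch from the client side is to compare the number of brokers the client can see with the configured replication factor. This is only a sketch: it assumes a connected KafkaJS admin client (`admin.describeCluster()` is part of the KafkaJS admin API), the helper names are made up for illustration, and the replication factor is passed in by hand because the real value lives in the broker configuration.

```javascript
// Hypothetical helper (not part of KafkaJS): the internal __consumer_offsets
// topic can only be created when at least `replicationFactor` brokers are live.
function canCreateOffsetsTopic(brokerCount, replicationFactor) {
  return brokerCount >= replicationFactor
}

// `admin` is assumed to be a connected KafkaJS admin client, i.e. the result
// of kafka.admin() after admin.connect(). `replicationFactor` should match the
// broker's offsets.topic.replication.factor; 3 is the Kafka default.
async function checkOffsetsReplication(admin, replicationFactor = 3) {
  const { brokers } = await admin.describeCluster()
  return {
    brokerCount: brokers.length,
    replicationFactor,
    ok: canCreateOffsetsTopic(brokers.length, replicationFactor),
  }
}
```

If `ok` comes back false on a fresh cluster, lowering offsets.topic.replication.factor on the brokers (or adding brokers) is the thing to look at first.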

arupsarkar-sfdc commented 8 months ago

I am getting this error when starting my consumer; the producer works fine and publishes messages. How can I consume messages? Can someone please help me resolve this?

FYI, my topic's settings are: replication factor 3, partitions 2.

Error Msg 1:
{"level":"ERROR","timestamp":"2023-12-20T19:16:12.330Z","logger":"kafkajs","message":"[Connection] Response GroupCoordinator(key: 10, version: 2)","broker":"ec2-54-90-120-75.compute-1.amazonaws.com:9096","clientId":"my-app","error":"Not authorized to access group: Group authorization failed","correlationId":0,"size":49}

Error Msg 2:
{"level":"ERROR","timestamp":"2023-12-20T19:16:12.330Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSGroupCoordinatorNotFound: Failed to find group coordinator","groupId":"my-app","stack":"KafkaJSGroupCoordinatorNotFound: Failed to find group coordinator
    at Cluster.findGroupCoordinatorMetadata (/app/node_modules/kafkajs/src/cluster/index.js:420:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /app/node_modules/kafkajs/src/cluster/index.js:346:33
    at async [private:ConsumerGroup:join] (/app/node_modules/kafkajs/src/consumer/consumerGroup.js:167:24)
    at async /app/node_modules/kafkajs/src/consumer/consumerGroup.js:335:9
    at async Runner.start (/app/node_modules/kafkajs/src/consumer/runner.js:84:7)
    at async start (/app/node_modules/kafkajs/src/consumer/index.js:243:7)
    at async Object.run (/app/node_modules/kafkajs/src/consumer/index.js:304:5)
    at async Object.startConsumer (/app/server/kafka-server.js:47:9)
    at async /app/server/app.js:477:5"}
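
Both entries above carry the same timestamp, and the first one shows the broker rejecting the GroupCoordinator request with "Group authorization failed". That suggests the crash is a consequence of the client not being allowed to access the group, rather than a cluster fault. A rough triage sketch for log lines like these follows; the function names and category labels are made up for illustration and are not KafkaJS output.

```javascript
// Hypothetical triage helper for KafkaJS JSON log lines.
// Returns a coarse category based on the `message` and `error` fields.
function classify(logLine) {
  const entry = JSON.parse(logLine)
  const text = `${entry.message || ''} ${entry.error || ''}`
  if (text.includes('Group authorization failed')) return 'group-authorization'
  if (text.includes('KafkaJSGroupCoordinatorNotFound')) return 'coordinator-not-found'
  return 'unknown'
}

// Prefer the specific cause over the generic crash that follows it.
function rootCause(logLines) {
  const kinds = logLines.map(classify)
  if (kinds.includes('group-authorization')) return 'group-authorization'
  if (kinds.includes('coordinator-not-found')) return 'coordinator-not-found'
  return 'unknown'
}
```

When the result is 'group-authorization', the access rules for the consumer group on the broker side are the first place to check.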

FUNJABIVRUSH commented 3 months ago

@arupsarkar-sfdc did you find a solution?

florian-besser commented 2 months ago

This issue also happened to me; worst part was that the consumer did not recover. Normally KafkaJS will just retry whatever failed and eventually get back to healthy again, but in this case I had to restart the entire app.

florian-besser commented 2 months ago

I can add some more details, since the issue reappeared. Here are our logs:

Jul 3, 2024 @ 14:16:37.355
NS: Connection, label: ERROR, message: Connection error: getaddrinfo ENOTFOUND kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local

This is a correct error message; we indeed had a network wobble in our Kafka cluster. KafkaJS detected this correctly and followed up with:

Jul 3, 2024 @ 14:16:37.367
NS: Consumer, label: ERROR, message: Crash: KafkaJSNumberOfRetriesExceeded: Connection error: getaddrinfo ENOTFOUND kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local

So far so good, KafkaJS tries to recover:

Jul 3, 2024 @ 14:16:37.374
NS: Consumer, label: ERROR, message: Restarting the consumer in 10942ms
Jul 3, 2024 @ 14:16:48.322
NS: Consumer, label: INFO, message: Starting

So we're seeing a fresh connection attempt, as expected. Unfortunately the cluster was still down, so we got:

Jul 3, 2024 @ 14:17:11.120
NS: Consumer, label: ERROR, message: Crash: KafkaJSGroupCoordinatorNotFound: Failed to find group coordinator

Expected outcome:

The consumer stops once more, then tries again after a few seconds.

Actual outcome:

The consumer stops after KafkaJSGroupCoordinatorNotFound and does not recover.

From then on the app ran without a consumer (but was reporting healthy) and had to be restarted manually. After a restart, KafkaJS was working correctly once more.
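
A possible workaround for the "healthy but not consuming" state is to listen for the consumer's crash event and react when KafkaJS reports that it will not restart. The sketch below assumes a KafkaJS consumer, which exposes `consumer.on` and `consumer.events.CRASH`, and assumes the event payload includes a `restart` flag (present in recent KafkaJS versions as far as I know; on versions without it, this sketch would treat every crash as fatal). `watchForFatalCrash` is a made-up helper name.

```javascript
// Hypothetical helper: calls `onFatal` when the consumer crashes and KafkaJS
// signals that it will NOT restart the consumer on its own.
// `consumer` is assumed to be a KafkaJS consumer instance.
function watchForFatalCrash(consumer, onFatal) {
  // consumer.on returns a function that removes the listener again.
  return consumer.on(consumer.events.CRASH, event => {
    const { error, restart } = event.payload
    if (!restart) onFatal(error)
  })
}
```

In a Kubernetes pod, `onFatal` could mark the health check as failing or call `process.exit(1)` so that the pod is restarted instead of idling without a consumer.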

belchior commented 2 weeks ago

I got this error in a local environment and fixed it by setting the Docker IP address in the docker-compose file:

environment:
  KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://DOCKER_IP:29092
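
With two advertised listeners like this, the client has to bootstrap against the one it can actually reach, because the broker hands back the advertised address of that listener for all follow-up connections, including the one to the group coordinator. A small sketch of picking the broker list accordingly; `brokerList` and its parameters are made up for illustration, and the host names and ports are taken from the config above.

```javascript
// Hypothetical helper: choose the bootstrap address matching where the client
// runs. Inside the compose network the PLAINTEXT listener (broker:9092) is
// reachable; from the host, the PLAINTEXT_HOST listener (DOCKER_IP:29092) is.
function brokerList({ insideDockerNetwork, dockerIp }) {
  return insideDockerNetwork ? ['broker:9092'] : [`${dockerIp}:29092`]
}

// Possible usage with KafkaJS (clientId is just an example value):
//   const kafka = new Kafka({
//     clientId: 'my-app',
//     brokers: brokerList({ insideDockerNetwork: false, dockerIp: process.env.DOCKER_IP }),
//   })
```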