Open arushi315 opened 3 months ago
The issue appears to be intermittent: I have not been able to reproduce it when upgrading Kafka again. During the upgrade the metric does stop for a short time, but once the upgrade has completed it shows up again without Burrow having to be restarted.
For the Kafka cluster where we originally noticed the issue, we have 9 brokers and observed EOF errors from all 9 brokers:
{"level":"error","ts":1720926917.8395112,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":1}
{"level":"error","ts":1720927037.8437417,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":2}
{"level":"error","ts":1720927147.84239,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":3}
{"level":"error","ts":1720927257.8391266,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":4}
{"level":"error","ts":1720927377.8412945,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":5}
{"level":"error","ts":1720927507.8451424,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":6}
{"level":"error","ts":1720927697.84136,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":7}
{"level":"error","ts":1720927887.8561947,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":8}
{"level":"error","ts":1720928077.8389344,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":9}
....
{"level":"error","ts":1720928077.8438833,"msg":"failed to get the list of available consumer groups","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","error":"dial tcp 10.104.7.186:9092: connect: connection refused"}
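To make the timing of these entries easier to read, here is a minimal sketch that extracts the broker ID and timestamp from each "failed to fetch offsets from broker" line and computes the gap between consecutive errors. It assumes the log is available as one JSON object per line, as shown above; the function names `parse_fetch_errors` and `gaps_between` are hypothetical helpers, not part of Burrow. The two sample lines are copied from the log above.

```python
import json
from datetime import datetime, timezone

# Two entries copied from the log above; in practice, read the lines from the Burrow log file.
sample = [
    '{"level":"error","ts":1720926917.8395112,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":1}',
    '{"level":"error","ts":1720927037.8437417,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":2}',
]


def parse_fetch_errors(lines):
    """Return (broker, ts) pairs for offset-fetch failures, in log order."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and elisions such as "...."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if entry.get("msg") == "failed to fetch offsets from broker":
            events.append((entry["broker"], entry["ts"]))
    return events


def gaps_between(events):
    """Seconds between consecutive events, rounded to whole seconds."""
    return [round(later[1] - earlier[1]) for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    events = parse_fetch_errors(sample)
    for broker, ts in events:
        when = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        print(f"broker {broker}: EOF at {when}")
    print("gaps (s):", gaps_between(events))
```

Applied to the nine entries above, the EOF errors are spaced roughly two to three minutes apart, which looks consistent with brokers being restarted one at a time during the rolling upgrade.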
Burrow Version: 1.8.0
Issue: After upgrading Kafka from version 3.6.x to 3.7.x, we observed that the Burrow service stopped emitting the consumer lag metric. Restarting the Burrow service temporarily resolved the issue.
Logs: The following warnings and errors were observed in the Burrow logs:
The Kafka upgrade was performed in a rolling fashion, one broker at a time. While communication disruptions with the broker being upgraded were expected, the other brokers should have remained available.
Burrow Configuration: Here is the configuration we are using:
Note: The kafka-version is set to 3.6.1, but as mentioned earlier, Burrow works fine with Kafka 3.7.x after a restart, so this does not seem to be the root cause.
Request:
Please let me know if additional information is required. Thank you!