linkedin / Burrow

Kafka Consumer Lag Checking
Apache License 2.0

Burrow not picking up some consumer groups #377

Open skarjoko opened 6 years ago

skarjoko commented 6 years ago

We have a Kafka 1.0.0 cluster that has been running for a few months now. We upgraded this cluster from 0.10 to 0.11, and then to 1.0.0 in January.

When running Burrow built from master, the API lists most of our consumer groups, but a few are missing (let's call one of them foo).

This group is shown when we run the kafka-consumer-groups admin tool to list the consumer groups, and it's present in the __consumer_offsets topic. It is still not available through the Burrow API.
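For anyone who wants to reproduce the comparison, here is a minimal Python sketch that diffs the group names printed by the admin tool against the ones Burrow returns. It assumes the Burrow consumer-list response is JSON with an `error` flag and a `consumers` array, and that the admin tool prints one group per line, possibly with a leading `Note:` line. The function names are hypothetical, and both format assumptions should be checked against your own output.

```python
import json


def burrow_consumers(response_body):
    """Extract group names from a Burrow consumer-list response body.

    Assumes a JSON object with an "error" flag and a "consumers" array.
    """
    payload = json.loads(response_body)
    if payload.get("error"):
        raise ValueError(payload.get("message", "Burrow returned an error"))
    return set(payload.get("consumers", []))


def cli_groups(cli_output):
    """Parse group names from admin-tool list output, one group per line.

    Lines starting with "Note:" are skipped (assumed to be informational).
    """
    groups = set()
    for line in cli_output.splitlines():
        name = line.strip()
        if name and not name.startswith("Note:"):
            groups.add(name)
    return groups


def missing_from_burrow(cli_output, response_body):
    """Return the groups the admin tool lists that Burrow does not, sorted."""
    return sorted(cli_groups(cli_output) - burrow_consumers(response_body))
```

Feeding it the saved output of both commands gives the exact list of groups Burrow is dropping, which is handy when more than one group is affected.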

The consumer application is built off of Kafka 1.0.0.

We are also running an older version of Burrow, from September 2017, prior to the 1.0.0 release. That version DOES pick up the foo consumer group. Only when we move to a newer version of Burrow, which uses the .toml file, do we miss the foo consumer group.

Here is our config (with stubbed out hostnames):

pidfile="burrow.pid"
stdout-logfile="burrow.out"
access-control-allow-origin="mysite.example.com"

[logging]
filename="logs/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true

[zookeeper]
servers=[ "zookeeper:2181" ]
root-path="/burrowtest"
timeout=6

[client-profile.test]
client-id="burrow-stg"
kafka-version="1.0.0"

[cluster.test-stg]
class-name="kafka"
servers=[ "stg-broker:9092" ]
client-profile="test"
topic-refresh=30
offset-refresh=5

[consumer.test-stg]
class-name="kafka"
cluster="test-stg"
servers=[ "stg-broker:9092" ]
offsets-topic="__consumer_offsets"

[cluster.test-prd]
class-name="kafka"
servers=[ "prod-broker:9092" ]
client-profile="test"
topic-refresh=30
offset-refresh=5

[consumer.test-prd]
class-name="kafka"
cluster="test-prd"
servers=[ "prod-broker:9092" ]
offsets-topic="__consumer_offsets"

[httpserver.default]
address=":3000"

toddpalino commented 6 years ago

Can you run Burrow with debug logging enabled (either via the config or using the HTTP call to change the log level) and look through the output for the group name you're having a problem with? That will give you more info on why the offsets are being dropped.
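As a rough illustration of the HTTP route, the Python sketch below builds and sends the log-level request. It assumes the endpoint is `POST /v3/admin/loglevel` with a JSON body of the form `{"level": "debug"}`, and that Burrow listens on port 3000 as in the config above. Please verify the path and body against the Burrow HTTP documentation for your build. The function names are hypothetical.

```python
import json
import urllib.request


def build_loglevel_request(base_url, level="debug"):
    """Build (but do not send) a request that changes Burrow's log level.

    Assumption: the endpoint is POST /v3/admin/loglevel with {"level": ...}.
    """
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v3/admin/loglevel",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_loglevel(base_url, level="debug"):
    """Send the request and return the decoded JSON response."""
    request = build_loglevel_request(base_url, level)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `set_loglevel("http://localhost:3000")` would switch a local instance to debug without a restart. The config alternative is to set `level="debug"` in the `[logging]` section and restart Burrow.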