linkedin / Burrow

Kafka Consumer Lag Checking
Apache License 2.0

Intermittent jumps/increases in consumer lag #431

Open smlgbl opened 6 years ago

smlgbl commented 6 years ago

Hi,

We're using the latest master build (at the time of writing: https://github.com/linkedin/Burrow/commit/12e681a3a8a61f84f17677996dc3e6a2b79fac41). Our Kafka brokers are running 1.1.0. We recently switched from https://github.com/Morningstar/kafka-offset-monitor to Burrow because we're adding authorization to our clusters.

According to Burrow, most of our consumer lags are 0 most of the time, whereas on kafka-offset-monitor they were around 1K to 100K most of the time (both are OK from our point of view). For reasons unknown to us, the consumer lag "jumps", e.g. from 0 to 1.4 billion(!) from one minute to the next, and back again a minute later. We have about 20 consumers on our main topic, and all of their lags jump, but by different amounts. Some "only" jump from 1K to 1M, others from 0 to the billions described above.
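For reference, the lag figures above follow the usual definition: per partition, the broker's log-end offset minus the group's last committed offset, summed over partitions. Below is a minimal sketch of that arithmetic. The function names and offset values are hypothetical and purely illustrative; this is not Burrow's implementation and not a diagnosis of the jumps.

```python
from typing import Dict


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag for one partition, clamped at 0 so a commit that is
    observed ahead of the sampled log-end offset does not go negative."""
    return max(0, log_end_offset - committed_offset)


def total_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum of per-partition lag; both dicts map partition id -> offset
    and are assumed to cover the same partitions."""
    return sum(partition_lag(end, committed[p]) for p, end in log_end.items())


# Hypothetical offsets for a two-partition topic.
log_end = {0: 1500, 1: 2000}
committed = {0: 1500, 1: 1990}
print(total_lag(log_end, committed))
```

Because lag is a difference of two independently sampled offsets, it is only as accurate as both readings.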

Is anybody else seeing this? Is there a known reason, or do we have to adjust our config? We didn't change anything in the default config for evaluation or notifications.

We use https://github.com/rgannu/burrow-graphite to report to Graphite, and our alerting system is based on those metrics.
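Since the alerts are driven by these metrics, one possible stopgap is to alert only when lag stays above a threshold for several consecutive samples, so that a single one-minute jump is ignored. The sketch below assumes one lag sample per minute. The function name, threshold, and sample values are hypothetical and are not part of Burrow or burrow-graphite.

```python
from typing import Iterable, List


def sustained_breaches(samples: Iterable[int], threshold: int,
                       min_consecutive: int) -> List[int]:
    """Return the sample indices at which lag has exceeded `threshold`
    for exactly `min_consecutive` samples in a row (one alert per run)."""
    alerts: List[int] = []
    run = 0  # length of the current run of samples above the threshold
    for i, lag in enumerate(samples):
        run = run + 1 if lag > threshold else 0
        if run == min_consecutive:
            alerts.append(i)
    return alerts


# A single one-minute spike does not trigger an alert ...
spike = [0, 1_400_000_000, 0, 0]
# ... but lag that stays high for two samples in a row does.
sustained = [0, 2_000_000, 3_000_000, 4_000_000, 0]

print(sustained_breaches(spike, threshold=1_000_000, min_consecutive=2))
print(sustained_breaches(sustained, threshold=1_000_000, min_consecutive=2))
```

This only hides the symptom on the alerting side; it does not explain why the reported lag jumps.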

Any help is appreciated.

smlgbl commented 6 years ago

This should have been a Stack Overflow question:

https://stackoverflow.com/questions/51534532/kafka-consumer-lag-monitoring-with-linkedid-burrow-jumps-intermittently