yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters
Apache License 2.0

Multiple clusters configuration cause incorrect computation of consumer lags #819

Open dchubarov opened 4 years ago

dchubarov commented 4 years ago

I have two clusters added to Kafka-Manager (3.0.0.5). The first is a large production cluster running the rather old Kafka 1.0.0; the second is experimental and currently disabled in CMAK. This state is shown in the screenshot:

[Screenshot: clusters]

In the consumer group monitoring view I can see that the consumers are actually consuming messages from the topics (the total lag counters change as I refresh the page, showing small positive or negative values around zero, which is fine):

[Screenshot: consumer-group-1]
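For context, consumer lag is conventionally computed per partition as the log end offset minus the group's committed offset, and the total lag is the sum over partitions. The following is only a minimal sketch of that arithmetic, not CMAK's actual implementation: the function name `total_lag`, the dictionary layout, and the sample offsets are all hypothetical. It also illustrates why slightly negative values can show up when the two offsets are sampled at different moments.

```python
from typing import Dict


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum of (log end offset - committed offset) over all partitions.

    Assumption for this sketch: a partition with no committed offset
    is treated as committed at 0, so its whole log counts as lag.
    """
    return sum(end_offsets[p] - committed.get(p, 0) for p in end_offsets)


# Hypothetical sample: the end offsets were read slightly earlier than the
# committed offsets, so partition 0 appears to be "ahead" of the log end.
end_offsets = {0: 1000, 1: 2000}
committed = {0: 1002, 1: 1999}

print(total_lag(end_offsets, committed))  # (1000 - 1002) + (2000 - 1999) = -1
```

With a healthy consumer the result hovers around zero, as described above. A total that only grows would mean one of the two inputs is no longer being refreshed in step with the other.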

Now I enable the second cluster, named "vesta", in the cluster view and go back to the same consumer group monitoring page for the first cluster. The total lag counters no longer decrease and show a growing number of unprocessed messages, although the consumers are working, as I can see from the application metrics!

[Screenshot: consumer-group-2]

Fortunately, kafka-manager does not affect the real consumers, so if I disable the second cluster again, the total lag returns to normal almost instantly. I conclude that there seems to be a bug affecting the consumer group offset calculation when multiple clusters are configured.

The cluster configurations have only the "Poll consumer information" and "Enable Active OffsetCache" checkboxes turned on (JMX polling is off).

zlorb commented 3 years ago

Seeing the same issue with two clusters running v2.4.0, and the workaround behaves the same as well. This seems to be a regression in 3.0.0.5; it was working fine with 3.0.0.4.