redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Consumer Group metrics #7392

Open jason-da-redpanda opened 1 year ago

jason-da-redpanda commented 1 year ago

Who is this for and what problem do they have today?

We do not have good metrics for Consumer Group basics, e.g. lag, rebalancing/rejoin, heartbeats, latency.

The kind of thing we are typically interested in knowing is

This is for Redpanda admins and Support staff trying to troubleshoot Consumer Group issues.

What are the success criteria?

Metrics exposed for things like: join-rate*, heartbeats, lag, latencies
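To make the "lag" item concrete, here is a minimal sketch of how per-partition consumer group lag is commonly derived: the partition's log end offset minus the group's committed offset. The function name `consumer_group_lag` and the input dictionaries are hypothetical and not part of any Redpanda API. Treating a partition with no committed offset as lagging from offset 0 is a simplifying assumption.

```python
from typing import Dict, Tuple

# (topic, partition) pair; a hypothetical alias used only in this sketch.
TopicPartition = Tuple[str, int]


def consumer_group_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag as log end offset minus committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no commit yet lags from offset 0.
        committed = committed_offsets.get(tp, 0)
        # Clamp at zero in case the commit is ahead of a stale end offset.
        lag[tp] = max(end - committed, 0)
    return lag


if __name__ == "__main__":
    ends = {("orders", 0): 120, ("orders", 1): 80}
    commits = {("orders", 0): 100}
    for tp, value in consumer_group_lag(ends, commits).items():
        print(tp, value)
```

Summing the values gives a single group-level lag figure, which is the kind of number an alert could be defined on.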

Why is solving this problem impactful?

Because it helps us troubleshoot issues with CGs. Currently, if we want to see some of this, for example "Handling join request/PreparingRebalance", we have to turn on TRACE logging for "kafka", which is very noisy.

Additionally, with metrics, customers can define alerts for things such as CGs with a high amount of rebalancing.

Additional notes

For inspiration: Consumer Group Metric

JIRA Link: CORE-1092

jcsp commented 1 year ago

Can you please list out the metrics you would like to see?

jason-da-redpanda commented 1 year ago

Hi @jcsp, I updated the description with a bit more detail on the kind of things we are looking for, albeit at a high level. Do you want the actual suggested metric names listed for each? This sort of thing: join-time-avg, join-time-max, join-total (just an example).
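As a rough illustration of what those three example names could mean, here is a small sketch that derives them from a list of observed join durations. Everything in it is hypothetical: the `JoinStats` type, the `summarize_joins` function, and the choice of milliseconds as the unit are assumptions for illustration, not an existing Redpanda interface.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class JoinStats:
    join_total: int       # number of completed joins observed
    join_time_avg: float  # mean join duration (ms, assumed unit)
    join_time_max: float  # longest join duration (ms, assumed unit)


def summarize_joins(durations_ms: Iterable[float]) -> JoinStats:
    """Aggregate join durations into total, average, and max."""
    durations = list(durations_ms)
    if not durations:
        # No joins observed: report zeros rather than dividing by zero.
        return JoinStats(0, 0.0, 0.0)
    return JoinStats(
        join_total=len(durations),
        join_time_avg=sum(durations) / len(durations),
        join_time_max=float(max(durations)),
    )


if __name__ == "__main__":
    print(summarize_joins([10.0, 20.0, 30.0]))
```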