Open hackery opened 2 years ago
Hi @hackery, it sounds like exposing these kafka stats through inputs.internal would be a helpful tool to shed light on kafka behavior. Are you able to put together a PR to add this functionality?
I'm not sure these metrics are available through the kafka consumer library telegraf uses (https://github.com/Shopify/sarama). There is a recent feature request in that project to add more consumer metrics, including lag (https://github.com/Shopify/sarama/issues/2235). Are you familiar enough with sarama to confirm whether it can provide the metrics you're interested in?
I would love to work on this, although yes, it may need that Sarama work completing first - I shall have a look at whether I could take that on as well.
Do you know if there is any progress on this topic?
Feature Request
Proposal:
Add the lag of the consumer group specified in [[inputs.kafka_consumer]] into the telegraf [[inputs.internal]] metrics.
Current behavior:
The input can lag with no indication of this exposed.
Desired behavior:
When [[inputs.internal]] is enabled, the plugin adds selfstat items for the consumer group lag (other metrics might also be useful to add at this point). Sample output:
Use case:
When a kafka consumer drops behind, it can be hard to diagnose. Kafka's own API does not expose consumer group offset metrics (they're stored in the offsets topics) and one might resort to the CLI tools, e.g.
While calls to the above could be wrapped in a script and called from Telegraf, the consumer input itself is in a better position to collect these metrics in context, apply tags etc.
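For reference, the lag figure discussed here is usually computed per partition as the log end offset minus the group's committed offset, which is how the kafka-consumer-groups tool derives its LAG column. Below is a minimal sketch of that arithmetic only. It is not Telegraf or sarama code: the function names, the (topic, partition) keys, and the sample offsets are all hypothetical, and skipping partitions with no committed offset is an assumption made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical key type: (topic name, partition number).
TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return lag per partition as log end offset minus committed offset.

    Assumption: partitions with no committed offset are skipped, because
    their effective lag depends on the consumer's offset reset policy.
    """
    lag = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            continue
        # Clamp at zero in case the two offsets were sampled at different times.
        lag[tp] = max(end - committed, 0)
    return lag


def total_lag(lag: Dict[TopicPartition, int]) -> int:
    """Sum the per-partition lag into one figure for the consumer group."""
    return sum(lag.values())


# Made-up sample offsets for a topic with two partitions.
end = {("telegraf", 0): 120, ("telegraf", 1): 75}
committed = {("telegraf", 0): 100, ("telegraf", 1): 75}

lag = partition_lag(end, committed)
print(lag)
print(total_lag(lag))
```

A per-partition value like this could be tagged with topic, partition, and consumer group, which is the kind of context the consumer input already has at hand.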