redpanda-data / kminion

KMinion is a feature-rich Prometheus exporter for Apache Kafka written in Go. It is lightweight and highly configurable so that it will meet your requirements.
MIT License
622 stars 123 forks

Feature Request: Support Azure EventHub #267

ElfoLiNk closed this issue 3 weeks ago

ElfoLiNk commented 4 months ago

It would be good to support Azure EventHub, since it exposes a Kafka-compatible API.

Right now I see the following warning in the KMinion logs:

"consumer group has committed offsets on a topic we don't have watermarks for"

The kafka-consumer-groups CLI is able to display lag information from Azure EventHub:

kafka-consumer-groups --bootstrap-server eventhub.servicebus.windows.net:9093 --command-config eventhub.properties --describe --group consumer-group-name

GROUP               TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                                                                            HOST            CLIENT-ID
consumer-group-name topic-name 0          6885            6885            0               eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-0213430d04c24eb2a187104afe083ec1 0.0.0.0         consumer-consumer-group-name-1
consumer-group-name topic-name 1          18204           18204           0               eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-0213430d04c24eb2a187104afe083ec1 0.0.0.0         consumer-consumer-group-name-1
consumer-group-name topic-name 2          7136            7146            10              eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-7fbe49f4f66a4973b676eedb151f74b7 0.0.0.0         consumer-consumer-group-name-1
consumer-group-name topic-name 3          6991            7078            87              eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-7fbe49f4f66a4973b676eedb151f74b7 0.0.0.0         consumer-consumer-group-name-1
consumer-group-name topic-name 4          7047            7047            0               eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-f871f476b2f242c8804f52a3dcc27fa1 0.0.0.0         consumer-consumer-group-name-1
consumer-group-name topic-name 5          7170            7170            0               eventhub.servicebus.windows.net:c:consumer-group-name:I:consumer-consumer-group-name-1-fb60f9ebfe0241e59692e55bdd33fb11 0.0.0.0         consumer-consumer-group-name-1
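The LAG column above is simply the log end offset minus the group's current (committed) offset for each partition. A minimal Python sketch that recomputes it from the rows shown; the offsets are copied from the output, and the function name partition_lag is illustrative only:

```python
# (partition, current_offset, log_end_offset) copied from the
# kafka-consumer-groups output above.
ROWS = [
    (0, 6885, 6885),
    (1, 18204, 18204),
    (2, 7136, 7146),
    (3, 6991, 7078),
    (4, 7047, 7047),
    (5, 7170, 7170),
]


def partition_lag(current_offset: int, log_end_offset: int) -> int:
    """Lag = log end offset (high watermark) minus committed offset."""
    return log_end_offset - current_offset


lags = {p: partition_lag(cur, end) for p, cur, end in ROWS}
total_lag = sum(lags.values())

for p, lag in lags.items():
    print(f"partition {p}: lag {lag}")
print(f"total lag: {total_lag}")  # 0 + 0 + 10 + 87 + 0 + 0 = 97
```

Only partitions 2 (7146 - 7136 = 10) and 3 (7078 - 6991 = 87) are behind, matching the LAG column.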

weeco commented 4 months ago

@ElfoLiNk You can change the scrape mode to the Kafka API (adminApi). If I'm not mistaken, this warning should only occur with the offsets topic scrape mode. KMinion collects the group offsets, but retrieving the log end offset of each partition in these topics requires a different API call. Both the log end offset and the group's committed offset are needed to calculate the lag. The warning itself shouldn't stop KMinion from working in general, but KMinion won't be able to report group lag for this specific topic.
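To illustrate why the warning means missing lag rather than a broken exporter: lag can only be computed for partitions where both a committed group offset and a log end offset (high watermark) are known. Below is a minimal Python sketch of that join, assuming plain dictionaries keyed by (topic, partition). It is not KMinion's actual code (KMinion is written in Go); the function name and the topics "orders" and "payments" are hypothetical, and only the warning text is borrowed from the log line quoted above.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("lag-sketch")

TopicPartition = tuple[str, int]


def compute_group_lag(
    committed: dict[TopicPartition, int],
    high_watermarks: dict[TopicPartition, int],
) -> dict[TopicPartition, int]:
    """Join committed group offsets with partition high watermarks.

    Partitions without a known high watermark are skipped (one warning per
    topic), so no lag is reported for them.
    """
    lags: dict[TopicPartition, int] = {}
    warned_topics: set[str] = set()
    for (topic, partition), offset in committed.items():
        end = high_watermarks.get((topic, partition))
        if end is None:
            if topic not in warned_topics:
                log.warning("consumer group has committed offsets on a topic "
                            "we don't have watermarks for (topic=%s)", topic)
                warned_topics.add(topic)
            continue
        lags[(topic, partition)] = end - offset
    return lags


# Hypothetical example: "payments" has committed offsets but no watermark.
lags = compute_group_lag(
    committed={("orders", 0): 90, ("payments", 0): 5},
    high_watermarks={("orders", 0): 100},
)
print(lags)  # {('orders', 0): 10}; "payments" only triggers the warning
```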

Let me know if that helps

ElfoLiNk commented 4 months ago

Hi @weeco, thank you for the quick response. I'm already using the adminApi scrape mode:

    minion:
      consumerGroups:
        enabled: true
        scrapeMode: adminApi
        granularity: partition

weeco commented 4 months ago

OK, that's pretty odd. Besides the warning, could you explain what is not working? Are metrics missing from the output? If so, are they consistently missing or only occasionally?

ElfoLiNk commented 4 months ago

I think the issue is that EventHub lowercases topic names. For example, my metrics contain both ENV_topicName and env_topicname:

kminion_kafka_consumer_group_topic_lag -> topic_name label = ENV_topicName

kminion_kafka_topic_partition_high_water_mark -> topic_name label = env_topicname
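If group offsets come back as ENV_topicName while watermarks come back as env_topicname, a join that matches topic names exactly finds no watermark for that topic, which would explain both the warning and the missing lag. A minimal per-topic Python sketch of the mismatch and of one possible workaround (comparing names with str.casefold()); the function names and offset values are hypothetical, and the workaround is an assumption, not something KMinion does. On a regular Kafka cluster topic names are case-sensitive, so this normalization could wrongly merge two distinct topics that differ only by case.

```python
# Topic names from the comment above; offsets are made-up example values.
COMMITTED = {"ENV_topicName": 90}    # name as seen on the group-lag metric
WATERMARKS = {"env_topicname": 100}  # name as seen on the watermark metric


def join_exact(committed: dict[str, int], watermarks: dict[str, int]) -> dict[str, int]:
    """Per-topic lag, matching topic names exactly (case-sensitive)."""
    return {t: watermarks[t] - off for t, off in committed.items() if t in watermarks}


def join_casefolded(committed: dict[str, int], watermarks: dict[str, int]) -> dict[str, int]:
    """Hypothetical workaround: match topic names case-insensitively.

    The result keeps the topic name as reported for the committed offsets.
    """
    folded = {t.casefold(): end for t, end in watermarks.items()}
    return {
        t: folded[t.casefold()] - off
        for t, off in committed.items()
        if t.casefold() in folded
    }


print(join_exact(COMMITTED, WATERMARKS))       # {}
print(join_casefolded(COMMITTED, WATERMARKS))  # {'ENV_topicName': 10}
```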

d-rk commented 1 month ago

I also tried using KMinion with Eventhubs. In general everything works, but I think the main problem is how DescribeConsumerGroups behaves in the Kafka API of Eventhub.

The DescribeConsumerGroups call only reports offsets as long as there is at least one consumer in the group. When the last consumer stops consuming, the call stops reporting offsets a few seconds later.

So if, for example, the last consumer in a group dies and therefore stops consuming, the lag keeps rising, but we cannot see it because DescribeConsumerGroups no longer reports the group.

I think the only way to fix this would be to cache DescribeConsumerGroups responses and fall back to the last valid one when an empty response is returned. But I don't know whether this is something you'd want in KMinion, @weeco.
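The caching workaround proposed above could look roughly like this. A minimal Python sketch, assuming a hypothetical fetch function that returns a group's committed offsets as a dict keyed by (topic, partition) and returns an empty dict once Eventhub stops reporting the group. The class name, the max_age_s parameter, and the injectable clock are illustrative assumptions; the age limit is added so that offsets of a group that is really gone are not reported forever. Because an empty group commits no new offsets, reusing its last committed offsets while the log end offsets keep growing produces exactly the rising lag described above.

```python
import time
from typing import Callable

Offsets = dict[tuple[str, int], int]


class LastValidOffsetsCache:
    """Falls back to the last non-empty offsets response for a group.

    Hypothetical helper illustrating the proposed workaround; not part of KMinion.
    """

    def __init__(self, fetch: Callable[[str], Offsets], max_age_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age_s = max_age_s
        self._clock = clock
        self._last: dict[str, tuple[float, Offsets]] = {}

    def offsets(self, group: str) -> Offsets:
        fresh = self._fetch(group)
        now = self._clock()
        if fresh:
            # Non-empty response: remember it together with its timestamp.
            self._last[group] = (now, fresh)
            return fresh
        cached = self._last.get(group)
        if cached is not None and now - cached[0] <= self._max_age_s:
            # Empty response: reuse the last valid one while it is recent enough.
            return cached[1]
        # Nothing usable cached: drop any stale entry and report no offsets.
        self._last.pop(group, None)
        return {}


# Simulated scrapes: the second response is empty (group no longer reported).
responses = [{("orders", 0): 90}, {}]
fake_now = [0.0]
cache = LastValidOffsetsCache(lambda g: responses.pop(0), max_age_s=60,
                              clock=lambda: fake_now[0])
print(cache.offsets("my-group"))  # {('orders', 0): 90}
fake_now[0] = 30.0
print(cache.offsets("my-group"))  # {('orders', 0): 90}, served from the cache
```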

weeco commented 1 month ago

I think this is a bug or protocol violation in Eventhub. We cannot make changes to accommodate invalid implementations of the Kafka protocol, given how many different implementations exist. I recommend raising this with the Eventhub team; it should affect a lot of administrative tools.

d-rk commented 1 month ago

I also think it is a protocol violation in Eventhub. I could not find any documentation about it, but with this behavior no Kafka metrics exporter will work correctly with Eventhubs.