Closed pirgeo closed 6 months ago
The KafkaConsumerMetrics
has been deprecated and doesn't seem to be possible to produce "The total number of bytes consumed for a topic" as description, for example.
They seem to have been produced from the KafkaClientMetrics
and have come from org.apache.kafka.common.MetricName.description()
.
Looking into the KafkaClientMetrics
, the pairs won't exist at the same time as when a more specific meter is registered, a less specific one will be unregistered. So with this arrangement, it seems that Prometheus would be okay, but Dynatrace could be affected.
See https://kafka.apache.org/documentation/#consumer_fetch_monitoring
So you are saying that there should not be a situation where two meters with different descriptions are registered at the same time? I assume you are referring to checkAndBindMetrics. I've dug through that method, and I think I understand how it works:
Get metrics currently registered in Kafka, and remove the ones that are in Kafka but no longer in MM: https://github.com/micrometer-metrics/micrometer/blob/73f88795f427407e81edc17ff2fa0da8bba47dbd/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaMetrics.java#L187-L201
Split metrics stored in MM by metric name: https://github.com/micrometer-metrics/micrometer/blob/73f88795f427407e81edc17ff2fa0da8bba47dbd/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaMetrics.java#L203-L207
Then, iterate the metrics coming from Kafka and compare the number of tags on the metric in Kafka, and the metric in Micrometer. https://github.com/micrometer-metrics/micrometer/blob/73f88795f427407e81edc17ff2fa0da8bba47dbd/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaMetrics.java#L209-L241
break
).If I understand correctly, that means that it is possible to register two metrics with the same name and same number of tags, but only if the tags are not the same to the ones already there. I might be reading it wrong though. Not sure if that could be the reason for two Meters with the same name but different descriptions.
As for the Dynatrace registry: as soon as the meter is removed from the registry, it will no longer be exported. Checking for conflicting descriptions happens per-export, so I am pretty sure that at the time of exporting, there are two meters with the same name but different descriptions. I'll try to reproduce it, but I don't know when I'll get around to it. Since I am not really familiar with Kafka, I am not even sure where to start.
As for the Dynatrace registry: as soon as the meter is removed from the registry, it will no longer be exported. Checking for conflicting descriptions happens per-export, so I am pretty sure that at the time of exporting, there are two meters with the same name but different descriptions.
@pirgeo Sorry, I thought Dynatrace required the same description for the same meter during an application's lifetime.
I thought two different descriptions came from a meter without/with "topic" dimension, for example, and the latter would survive in the end. However, based on your comment, I might miss something as the two descriptions seem to exist at the same time for some reason.
@izeye Thank you for your response!
I thought Dynatrace required the same description for the same meter during an application's lifetime.
The discrepancy is currently checked per-export and per-metric-name. This is all on the client side, though, and the description will be checked against the one already stored for the metric name upon ingestion.
Either way, the Kafka instrumentation produces multiple descriptions for the same metric key. I guess I understand the rationale in this case, although it almost seems to me as if kafka.consumer.fetch.manager.bytes.consumed.total
and kafka.consumer.fetch.manager.bytes.consumed.total
with a topic
dimension should be two separate metrics. This is the first time I have ever come across a situation where the meaning of a metric changes during the runtime while the name stays the same. But I assume that ship has sailed, I am sure the folks on the Kafka side have their reasons for doing it that way.
As far as I can tell, there must be two separate Meters registered with the MeterRegistry at the same time, one with the topic dimension and one without. I am assuming that based on the fact that I get two different description texts in the same export ("The total number of bytes" vs. "The total number of bytes consumed for a topic"). In the Dynatrace exporter, we collect which meter names map to which descriptions (per export), and then warn if there is two conflicting descriptions.
I think the intention of the logic to remove the meter with less tags is to avoid the situation you seem to be running into. There may be a bug or a concurrency issue. Kafka metrics are unique in the way they work and we rely on whatever the Kafka client is providing in terms of metrics. It'd be good to get the name and tags of the overlapping meters when this happens as a first step. Then maybe we can figure out how to reproduce it and what is happening.
I think the intention of the logic to remove the meter with less tags is to avoid the situation you seem to be running into.
Yeah, that's what I assume, too. Unfortunately, I don't have access to the tags, and I couldn't reproduce it, so I am left guessing, but I'll try to find out!
Until we figure out which metrics are causing the issue, I'm not sure we'll be able to make a change we're sure fixes things. If we do get that info, please share here.
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.
Thanks for considering this. I currently simply don't have the time to look into this more thoroughly, but I guess that there might be multiple registered metrics (probably for multiple registered Kafka consumers in different states) that are producing metrics with different descriptions. I'll reopen this once I have more info.
I guess that there might be multiple registered metrics (probably for multiple registered Kafka consumers in different states) that are producing metrics with different descriptions.
That sounds likely to be the issue. I'm not sure what we can do about it, though, since we are taking the descriptions provided by the Kafka client. To solve it on the Micrometer side, it seems like we would need to maintain some kind of mapping of known metrics that need to use a different description. That sounds quite fragile and we don't own the descriptions - the Kafka client does. They may be different in different versions of the client and that would be a nightmare to manage on our side. I'm wondering if the Kafka client would be willing to change the descriptions to avoid this issue.
Please describe the feature request. The Kafka instrumentation produces dynamic descriptions, meaning that descriptions are not always the same for one metric key. Here is a list of some descriptions that I have come across in the past:
kafka.consumer.fetch.manager.bytes.consumed.total
kafka.consumer.fetch.manager.bytes.consumed.total
kafka.consumer.fetch.manager.records.consumed.rate
kafka.consumer.fetch.manager.records.consumed.rate
kafka.consumer.fetch.manager.records.lag.max
kafka.consumer.fetch.manager.records.lag.max
kafka.consumer.fetch.manager.fetch.size.max
kafka.consumer.fetch.manager.fetch.size.max
kafka.consumer.fetch.manager.records.lead.min
kafka.consumer.fetch.manager.records.lead.min
Additional context There are prior examples using a static description per metric key, see #3201 and more recently #4864. I am not entirely sure how these metrics are created, but I think they are originating from this class: https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaConsumerMetrics.java
CC @izeye, since you worked on #4864, do you think this is a reasonable change?