micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

Support meters with the same name but different set of tag keys in PrometheusMeterRegistry #877

Open j0xaf opened 6 years ago

j0xaf commented 6 years ago

In connection with PrometheusMeterRegistry and the new KafkaConsumerMetrics (thanks to @jkschneider), I have a problem that metrics won't register correctly because PrometheusMeterRegistry complains about metrics with the same name but a different set of tag keys. It claims that metrics of the same name are required by Prometheus to have the same set of tags.

I believe the error thrown at https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheus/PrometheusMeterRegistry.java#L357 is wrong. Prometheus perfectly accepts metrics of the same name with disjoint sets of tags (the metric name itself is just the __name__ label). In our Prometheus production instance we have metrics like up{application="xyz", job="abc"} and up{application="abc", job="123", host="prometheus"} coexisting without any issue whatsoever.

Example of exception text, stacktrace below:

Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter containing tag keys [application, client_id]. The meter you are attempting to register has keys [application, client_id, topic].

Context of variables

id = MeterId{name='kafka.consumer.records.lag.max', tags=[ImmutableTag{key='application', value='heimdall'}, ImmutableTag{key='client.id', value='consumer-1'}, ImmutableTag{key='topic', value='probe_t'}]}
existingCollector = MeterId{name='kafka.consumer.records.lag.max', tags=[ImmutableTag{key='application', value='heimdall'}, ImmutableTag{key='client.id', value='consumer-1'}]}

Stacktrace:

collectorByName:348, PrometheusMeterRegistry (io.micrometer.prometheus)
newGauge:225, PrometheusMeterRegistry (io.micrometer.prometheus)
lambda$gauge$1:244, MeterRegistry (io.micrometer.core.instrument)
apply:-1, 4552198 (io.micrometer.core.instrument.MeterRegistry$$Lambda$373)
lambda$registerMeterIfNecessary$5:514, MeterRegistry (io.micrometer.core.instrument)
apply:-1, 491928400 (io.micrometer.core.instrument.MeterRegistry$$Lambda$338)
getOrCreateMeter:563, MeterRegistry (io.micrometer.core.instrument)
registerMeterIfNecessary:525, MeterRegistry (io.micrometer.core.instrument)
registerMeterIfNecessary:514, MeterRegistry (io.micrometer.core.instrument)
gauge:244, MeterRegistry (io.micrometer.core.instrument)
register:128, Gauge$Builder (io.micrometer.core.instrument)
registerGaugeForObject:149, KafkaConsumerMetrics (io.micrometer.core.instrument.binder.kafka)
registerGaugeForObject:153, KafkaConsumerMetrics (io.micrometer.core.instrument.binder.kafka)
lambda$bindTo$0:79, KafkaConsumerMetrics (io.micrometer.core.instrument.binder.kafka)
accept:-1, 1260340889 (io.micrometer.core.instrument.binder.kafka.KafkaConsumerMetrics$$Lambda$328)
lambda$registerMetricsEventually$11:205, KafkaConsumerMetrics (io.micrometer.core.instrument.binder.kafka)
handleNotification:-1, 820539250 (io.micrometer.core.instrument.binder.kafka.KafkaConsumerMetrics$$Lambda$329)
handleNotification:1754, DefaultMBeanServerInterceptor$ListenerWrapper (com.sun.jmx.interceptor)
handleNotification:275, NotificationBroadcasterSupport (javax.management)
run:352, NotificationBroadcasterSupport$SendNotifJob (javax.management)
execute:337, NotificationBroadcasterSupport$1 (javax.management)
sendNotification:248, NotificationBroadcasterSupport (javax.management)
sendNotification:209, MBeanServerDelegate (javax.management)
sendNotification:1498, DefaultMBeanServerInterceptor (com.sun.jmx.interceptor)
registerWithRepository:1911, DefaultMBeanServerInterceptor (com.sun.jmx.interceptor)
registerDynamicMBean:966, DefaultMBeanServerInterceptor (com.sun.jmx.interceptor)
registerObject:900, DefaultMBeanServerInterceptor (com.sun.jmx.interceptor)
registerMBean:324, DefaultMBeanServerInterceptor (com.sun.jmx.interceptor)
registerMBean:522, JmxMBeanServer (com.sun.jmx.mbeanserver)
reregister:167, JmxReporter (org.apache.kafka.common.metrics)
metricChange:85, JmxReporter (org.apache.kafka.common.metrics)
registerMetric:545, Metrics (org.apache.kafka.common.metrics)
add:256, Sensor (org.apache.kafka.common.metrics)
add:241, Sensor (org.apache.kafka.common.metrics)
recordTopicFetchMetrics:1291, Fetcher$FetchManagerMetrics (org.apache.kafka.clients.consumer.internals)
access$3200:1246, Fetcher$FetchManagerMetrics (org.apache.kafka.clients.consumer.internals)
record:1230, Fetcher$FetchResponseMetricAggregator (org.apache.kafka.clients.consumer.internals)
drain:982, Fetcher$PartitionRecords (org.apache.kafka.clients.consumer.internals)
nextFetchedRecord:1033, Fetcher$PartitionRecords (org.apache.kafka.clients.consumer.internals)
fetchRecords:1095, Fetcher$PartitionRecords (org.apache.kafka.clients.consumer.internals)
access$1200:949, Fetcher$PartitionRecords (org.apache.kafka.clients.consumer.internals)
fetchRecords:570, Fetcher (org.apache.kafka.clients.consumer.internals)
fetchedRecords:531, Fetcher (org.apache.kafka.clients.consumer.internals)
pollOnce:1178, KafkaConsumer (org.apache.kafka.clients.consumer)
poll:1111, KafkaConsumer (org.apache.kafka.clients.consumer)
run:699, KafkaMessageListenerContainer$ListenerConsumer (org.springframework.kafka.listener)
call:511, Executors$RunnableAdapter (java.util.concurrent)
run:266, FutureTask (java.util.concurrent)
run:748, Thread (java.lang)
jkschneider commented 6 years ago

This actually is enforced by the underlying Prometheus client that Micrometer is delegating to. It's possible for distinct processes to send metrics with different sets of tag keys, but Prometheus does not allow it from a single process.

All Micrometer is doing is giving you a more readable error message about it before the Prometheus client blows up.

flozano commented 5 years ago

So basically this is expected? Kafka metrics as configured by KafkaConsumerMetrics are incompatible with a Prometheus-based registry? I'm getting

java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'kafka_consumer_records_lag_records' containing tag keys [client_id, topic]. The meter you are attempting to register has keys [client_id].
izeye commented 5 years ago

@flozano Try the latest version if you haven't already, as there were recent updates to the Kafka binder.

jdbranham commented 4 years ago

If the meter tags are different, could we unregister and then re-register the meter with a combined list of tags?
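
Something like this, maybe (a rough sketch against the public MeterRegistry API; hypothetical meter name and tags, and not an existing Micrometer feature):

import io.micrometer.core.instrument.*;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.atomic.AtomicLong;

class ReRegisterWithCombinedTags {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        AtomicLong lag = new AtomicLong();

        // First registration: client-level tags only.
        Gauge.builder("kafka.consumer.records.lag.max", lag, AtomicLong::get)
                .tags("client.id", "consumer-1")
                .register(registry);

        // Later the same metric shows up with an extra "topic" tag:
        // remove the old meter and re-register with the combined tag list.
        Meter existing = registry.find("kafka.consumer.records.lag.max").meter();
        if (existing != null) {
            registry.remove(existing);
            Gauge.builder("kafka.consumer.records.lag.max", lag, AtomicLong::get)
                    .tags(Tags.concat(existing.getId().getTags(), "topic", "probe_t"))
                    .register(registry);
        }
    }
}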

Also, am I looking at the right prometheus client implementation? https://github.com/prometheus/client_java/blob/master/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L49

cschuyle commented 4 years ago

Hi @jdbranham, have you been able to fix or work around this? I have a similar problem that PrometheusMeterRegistry is in conflict with something else, but instead of KafkaConsumerMetrics, it's because I'm using Spring Boot actuator.

If I annotate the controller method with just @Timed() with no parameters, there is no error. However, if I use @Timed with a name argument, like this:

    @Timed("some-operation")
    public void someOperation() {}

... then it errors when someOperation() is called. Hypothesis: both Actuator and Prometheus are trying to use the annotation to create a metric (each with their own set of tags), but Prometheus requires the name be unique.

Stack trace:

java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'some_operation_seconds' containing tag keys [class, exception, method]. The meter you are attempting to register has keys [exception, method, outcome, status, uri].
    at io.micrometer.prometheus.PrometheusMeterRegistry.lambda$collectorByName$9(PrometheusMeterRegistry.java:382) ~[micrometer-registry-prometheus-1.3.5.jar:1.3.5]
...
org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.getTimer(WebMvcMetricsFilter.java:185) ~[spring-boot-actuator-2.2.5.RELEASE.jar:2.2.5.RELEASE]

I've also tried to upgrade to a newer version of micrometer (1.4.1), but the result is the same.

KeithWoods commented 4 years ago

@flozano @izeye I think this issue still exists.

I think this is because some metrics exist at multiple levels with different tags; for example, kafka_consumer_fetch_manager_fetch_size_max exists at the client level but also at the topic/partition level. At the client level there is no associated topic tag; at the topic/partition level there is. One of the two will get registered first, and the second will then cause an exception because the tag list is different.
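
For illustration (label values elided), the two shapes the same metric takes:

kafka_consumer_fetch_manager_fetch_size_max{client_id="...", kafka_version="...", spring_id="..."}              <- client level, no topic tag
kafka_consumer_fetch_manager_fetch_size_max{client_id="...", kafka_version="...", spring_id="...", topic="..."} <- topic/partition level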

Here is the above metric in JConsole:

[JConsole screenshot: kafkaConsumerMetricIssue]

Sometimes this causes an uncaught exception originating here (note: 1.5 branch); other times it seems silent. Sometimes the metrics in question show up on my Prometheus endpoint, sometimes they don't. This is a bit vague because I don't fully understand why; I'm guessing the one that wins the above race shows up and the other doesn't, and that may differ depending on the order in which metrics get registered.

When I do see an exception, this is it:

Caused by: java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'kafka_consumer_fetch_manager_fetch_size_max' containing tag keys [client_id, kafka_version, spring_id, topic]. The meter you are attempting to register has keys [client_id, kafka_version, spring_id].
    at io.micrometer.prometheus.PrometheusMeterRegistry.lambda$applyToCollector$17(PrometheusMeterRegistry.java:429)
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1877)
    at io.micrometer.prometheus.PrometheusMeterRegistry.applyToCollector(PrometheusMeterRegistry.java:413)
    at io.micrometer.prometheus.PrometheusMeterRegistry.newGauge(PrometheusMeterRegistry.java:207)
    at io.micrometer.core.instrument.MeterRegistry.lambda$gauge$1(MeterRegistry.java:295)
    at io.micrometer.core.instrument.MeterRegistry.lambda$registerMeterIfNecessary$5(MeterRegistry.java:559)
    at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:612)
    at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:566)
    at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:559)
    at io.micrometer.core.instrument.MeterRegistry.gauge(MeterRegistry.java:295)
    at io.micrometer.core.instrument.Gauge$Builder.register(Gauge.java:190)
    at io.micrometer.core.instrument.composite.CompositeGauge.registerNewMeter(CompositeGauge.java:58)
    at io.micrometer.core.instrument.composite.CompositeGauge.registerNewMeter(CompositeGauge.java:27)
    at io.micrometer.core.instrument.composite.AbstractCompositeMeter.add(AbstractCompositeMeter.java:66)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at java.util.Collections$SetFromMap.forEach(Collections.java:5476)
    at io.micrometer.core.instrument.composite.CompositeMeterRegistry.lambda$new$0(CompositeMeterRegistry.java:65)
    at io.micrometer.core.instrument.composite.CompositeMeterRegistry.lock(CompositeMeterRegistry.java:184)
    at io.micrometer.core.instrument.composite.CompositeMeterRegistry.lambda$new$1(CompositeMeterRegistry.java:65)
    at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:622)
    at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:566)
    at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:559)
    at io.micrometer.core.instrument.MeterRegistry.gauge(MeterRegistry.java:295)
    at io.micrometer.core.instrument.Gauge$Builder.register(Gauge.java:190)
    at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.registerGauge(KafkaMetrics.java:182)
    at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.bindMeter(KafkaMetrics.java:175)
    at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.lambda$checkAndBindMetrics$1(KafkaMetrics.java:161)
    at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
    at java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505)
    at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.checkAndBindMetrics(KafkaMetrics.java:137)
    at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.bindTo(KafkaMetrics.java:93)
    at io.micrometer.core.instrument.binder.kafka.KafkaClientMetrics.bindTo(KafkaClientMetrics.java:39)
    at org.springframework.kafka.core.MicrometerConsumerListener.consumerAdded(MicrometerConsumerListener.java:74)
    at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:301)
    at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumerWithAdjustedProperties(DefaultKafkaConsumerFactory.java:271)
    at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:245)
    at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumer(DefaultKafkaConsumerFactory.java:219)
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.<init>(KafkaMessageListenerContainer.java:592)
    at org.springframework.kafka.listener.KafkaMessageListenerContainer.doStart(KafkaMessageListenerContainer.java:294)
    at org.springframework.kafka.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:338)

I'm not sure what to do for now. As @jkschneider said, it's not really a Micrometer problem; it's more that the metrics are not really compatible with Prometheus, or perhaps they are not compatible when auto-registered as they are now. If anyone has ideas, let me know.

I'm using this via Spring Kafka's KafkaMessageListenerContainer, which internally uses KafkaClientMetrics. I noticed there is a deprecated KafkaConsumerMetrics in the Micrometer codebase that takes a more manual approach to registering metrics. I'll probably do something similar to register just the metrics I care about and see how that goes.

KeithWoods commented 4 years ago

A bit more information: after looking at tweaking KafkaMetrics to solve my issue, I noticed it has some logic to remove and re-add existing meters if the tags are different (as was suggested in this thread); however, it only does this if the clients are the same. In my case above the clients are different, as I have multiple consumers with differing client IDs, so the remove/add logic wasn't run. The logic in question: https://github.com/micrometer-metrics/micrometer/blob/1.5.x/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaMetrics.java#L152-L158

If I remove if (differentClient(tags)) break; everything appears to be fine. I would like to understand the impact of this.

fieldju commented 4 years ago

Okay, I am running into this issue as well.

I copied and pasted the registry and modified the offending code so that it creates a new collector instead.

I have written a functional test that proves that https://github.com/micrometer-metrics/micrometer/issues/877#issuecomment-425102772 is no longer true.

https://github.com/armory-plugins/armory-observability-plugin/blob/2819e3f6b4e2c6a8ad88fbd01511a52281c4f957/common/src/test/java/io/armory/plugin/observability/prometheus/PrometheusScrapeControllerFunctionalTest.java#L87-L117

I have also written an integration test that proves that Prometheus can handle scraping metrics that have the same name but varying labels.

https://github.com/armory-plugins/armory-observability-plugin/blob/2819e3f6b4e2c6a8ad88fbd01511a52281c4f957/common/src/test/java/io/armory/plugin/observability/prometheus/PrometheusScrapeControllerIntegrationTest.java#L70-L169

I feel like we could easily replace

https://github.com/micrometer-metrics/micrometer/blob/c9ae058383cbdb5f2ed838fabb180901ab1cf4db/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheus/PrometheusMeterRegistry.java#L404-L424

with

  private MicrometerCollector collectorByName(Meter.Id id) {
    return collectorMap.compute(
        getConventionName(id),
        (name, existingCollector) -> {
          List<String> tagKeys = getConventionTags(id).stream().map(Tag::getKey).collect(toList());
          if (existingCollector != null && existingCollector.getTagKeys().equals(tagKeys)) {
            return existingCollector;
          }
          return new MicrometerCollector(id, config().namingConvention(), prometheusConfig)
              .register(registry);
        });
  }

And eliminate the source of this developer friction when using this registry.

schlangz commented 4 years ago

@fieldju thank you

dnowak-wrldrmt commented 4 years ago

Hi, I have encountered this issue because I add some tags (like the error name) based on the result of an operation.

This actually is enforced by the underlying Prometheus client that Micrometer is delegating to. It's possible for distinct processes to send metrics with different sets of tag keys, but Prometheus does not allow it from a single process.

All Micrometer is doing is giving you a more readable error message about it before the Prometheus client blows up.

I have checked the Prometheus documentation and there is no information about such a requirement; it looks like a metric name can be associated with a varying set of tags/labels:

Labels enable Prometheus's dimensional data model: any given combination of labels for the same metric name identifies a particular dimensional instantiation of that metric (for example: all HTTP requests that used the method POST to the /api/tracks handler). The query language allows filtering and aggregation based on these dimensions. Changing any label value, including adding or removing a label, will create a new time series.

Source: https://prometheus.io/docs/concepts/data_model/

Looks like Micrometer's integration is too strict.

checketts commented 4 years ago

I have checked the Prometheus documentation and there is no information about such a requirement; it looks like a metric name can be associated with a varying set of tags/labels:

Thanks for checking the documentation!

Unfortunately, the underlying Prometheus client isn't implemented to that specification. See https://github.com/prometheus/client_java/blob/master/simpleclient/src/main/java/io/prometheus/client/SimpleCollector.java#L63

So you could open a ticket with that repo to try to get it improved, or if you can think of a good way to work around that limitation that would also be welcome.

vladak commented 4 years ago

I got hit by this too and was baffled because I saw that JvmMemoryMetrics in Micrometer produces tagged metrics like:

jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.8733824E7
jvm_memory_used_bytes{area="heap",id="G1 Survivor Space",} 2.9360128E7
jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.1652344E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 5.4593912E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1369344.0
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 2.78921216E8
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 6206336.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 5494784.0

The trick seems to be in using the builder: https://github.com/micrometer-metrics/micrometer/blob/0dc111d1f6d3aa288dda24ac0912cbb483f1da1e/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/JvmMemoryMetrics.java#L81-L89

I tried that with Timer and Counter and it works fine.
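
Something like this seems to work (a rough sketch against a SimpleMeterRegistry that mirrors, rather than copies, the linked JvmMemoryMetrics code):

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

class BuilderPerTagCombination {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String area = pool.getType() == MemoryType.HEAP ? "heap" : "nonheap";
            // Every registration of "jvm.memory.used" carries the same tag keys
            // (id, area); only the tag values differ, so Prometheus is happy.
            Gauge.builder("jvm.memory.used", pool, p -> p.getUsage().getUsed())
                    .tags("id", pool.getName(), "area", area)
                    .baseUnit("bytes")
                    .register(registry);
        }
    }
}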

dnowak-wrldrmt commented 4 years ago

@vladak This is not a solution for the problem, as the problem is that you change the set of labels assigned to a meter. It happens to me in two cases:

  • common tags are added to the meter registry on application initialization (Spring Boot): registry.config().commonTags(tags)
  • dynamic tag calculation: I only add an error tag with the type of the error when an operation fails, so every time I register a new meter with a set of tags that are based on the execution result.

vladak commented 4 years ago

I see, it just shares the same symptom.


borowis commented 4 years ago

I feel like there is a bug in Micrometer related to removing meters with fewer tags:

                for (Meter other : registry.find(meterName).meters()) {
                    List<Tag> tags = other.getId().getTags();
                    // Only consider meters from the same client before filtering
                    if (differentClient(tags)) break;
                    if (tags.size() < meterTagsWithCommonTags.size()) registry.remove(other);
                    // Check if already exists
                    else if (tags.size() == meterTagsWithCommonTags.size())
                        if (tags.equals(meterTagsWithCommonTags)) return;
                        else break;
                    else hasLessTags = true;
                }

Above is the logic to remove the meter with fewer tags, but it's conditional on:

if (differentClient(tags)) break;

I'm using ConcurrentKafkaListenerContainerFactory from Spring Kafka, which has 5 clientIds, so if the first clientId happens not to match the clientId being registered, the removal rule doesn't run and I get this instead:

java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. 
There is already an existing meter named 'kafka_consumer_fetch_manager_fetch_size_max'  containing tag keys [client_id, kafka_version, spring_id]. The meter you are attempting to register has keys [client_id, kafka_version, spring_id, topic].

:grey_question: My question is: should break be replaced with continue, like this:

if (differentClient(tags)) continue;

(please also see what @KeithWoods says above).

I don't fully understand the logic, but I feel like that would allow correctly replacing the meters with fewer tags. Curious what you think, @jkschneider?

borowis commented 4 years ago

My point above has been addressed in https://github.com/micrometer-metrics/micrometer/pull/2312, thanks @shakuzen!

shakuzen commented 3 years ago

Hi, I have encountered this issue because I add some tags (like the error name) based on the result of an operation.

@dnowak-wrldrmt The way we dealt with this in, for example, Spring Web MVC metrics is to always have an exception tag; the value is none if there was no exception.
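
A rough sketch of that pattern (the registry and meter name here are just for illustration):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

class ExceptionTagAlwaysPresent {
    static final MeterRegistry registry = new SimpleMeterRegistry();

    static void recordOperation(Runnable operation) {
        Timer.Sample sample = Timer.start(registry);
        String exception = "none"; // default when nothing goes wrong
        try {
            operation.run();
        }
        catch (RuntimeException e) {
            exception = e.getClass().getSimpleName();
            throw e;
        }
        finally {
            // The "exception" tag key is present on every code path, so all
            // registrations of this timer share the same set of tag keys.
            sample.stop(Timer.builder("some.operation")
                    .tag("exception", exception)
                    .register(registry));
        }
    }
}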

I'm reopening this for some more consideration as this keeps coming up and we've made recent changes to not throw an exception anymore by default in these cases. The current state is still that the scrape will not contain metrics with different labels, though. I'm reopening to see if we can make that happen somehow.

After some investigation into this, I will echo what @checketts said in https://github.com/micrometer-metrics/micrometer/issues/877#issuecomment-655598760 that the Prometheus Java client does not seem to support this by design. That being said, it is indeed true that the Prometheus server can and will scrape metrics with different labels, with no problem as far as I can tell.

@fieldju thank you for the tests. That was useful to look at. The modifications, however, are causing meters to end up as different metrics with the same name, which violates the Prometheus scrape format, even though the Prometheus server seems to have no problem scraping this.

Specifically, your modifications result in a scrape like:

# HELP foo_total  
# TYPE foo_total counter
foo_total{hostname="localhost",optionalExtraMetadata="my-cool-value",} 1.0
# HELP foo_total  
# TYPE foo_total counter
foo_total{hostname="localhost",} 1.0

The above violates the following snippets from the Prometheus scrape format:

Only one HELP line may exist for any given metric name.

All lines for a given metric must be provided as one single group, with the optional HELP and TYPE lines first (in no particular order).

So my understanding is that the scrape should be like:

# HELP foo_total  
# TYPE foo_total counter
foo_total{hostname="localhost",optionalExtraMetadata="my-cool-value",} 1.0
foo_total{hostname="localhost",} 1.0

Trying to register things with the Prometheus client so that this happens results in the aforementioned exception. We need to find a way to properly register them, which may not currently exist. In that case, we would need to see if the Prometheus Java client is willing to make changes to support this, as suggested by @checketts. For example, here is a pure Prometheus Java client unit test that fails trying to do what this issue asks us to support:

@Test
void promClientSameMetricNameDifferentLabels() throws IOException {
    CollectorRegistry registry = new CollectorRegistry();
    Counter counter = Counter.build().name("my_counter").labelNames("hi").help("unhelpful string")
            .register(registry);
    Counter counter1 = Counter.build().name("my_counter").labelNames("hello").help("unhelpful string")
            .register(registry); // fails with "Collector already registered that provides name: my_counter"

    counter.labels("world").inc(2);
    counter1.labels("world").inc(3);

    StringWriter writer = new StringWriter();
    TextFormat.write004(writer, registry.metricFamilySamples());
    System.out.println(writer.toString());
}
checketts commented 3 years ago

I finally ran into this same problem in my own application code. Spring was configuring caches through the cacheManager, so half of my caches were being registered with a cacheManager tag, while several other caches were being registered via a different mechanism. I worked around the Prometheus limitation by adding a filter that adds an extra empty tag, like so:

registry.config().meterFilter(new MeterFilter() {
    @NotNull
    @Override
    public Meter.Id map(@NotNull Meter.Id id) {
        if (id.getName().startsWith("cache") && id.getTag("cacheManager") == null) {
            return id.withTags(Tags.of("cacheManager", ""));
        }
        return id;
    }
});
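
To be clear, the empty-string value is only there to keep the tag-key set identical across all the cache meters; the values can still differ (or stay blank) per meter, which is all the Prometheus side cares about.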
jeeftor commented 3 years ago

I have 7 apps I've instrumented and I finally ran into this error on only one of them... specifically with

jvm_gc_pause_seconds

jeeftor commented 3 years ago

So I came up with a workaround.

I was registering an application tag on all my meters (or so I thought), but apparently the GC meters got instantiated prior to me actually doing this... so the fix I found was to use a MeterRegistryCustomizer bean (Java and Kotlin versions below):

    @Value("${spring.application.name}")
    String appName;

    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags("application", appName);
    }
    @Value("\${spring.application.name}")
    val appName: String? = null

    @Bean
    fun metricsCommonTags(): MeterRegistryCustomizer<MeterRegistry> {
        return MeterRegistryCustomizer { registry: MeterRegistry ->
            registry.config().commonTags("application", appName)
        }
    }

This got my errors to go away.

angry-cellophane commented 3 years ago

@shakuzen I have an idea for how to address this issue and I'm curious what you think about it.

As you mentioned before, it is indeed not possible to register several metrics with the same name in Prometheus' io.prometheus.client.CollectorRegistry. It checks whether the name is already registered:

  public void register(Collector m) {
    List<String> names = collectorNames(m);
    synchronized (namesCollectorsLock) {
      for (String name : names) {
        if (namesToCollectors.containsKey(name)) {
          throw new IllegalArgumentException("Collector already registered that provides name: " + name);
        }
      }
      for (String name : names) {
        namesToCollectors.put(name, m);
      }
      collectorsToNames.put(m, names);
    }
  }

So it is not possible to register multiple collectors with the same metric name, but why not change the collector itself, then? The main goal of a collector is to return a list of samples:

    @Override
    public List<MetricFamilySamples> collect() {
       /// ...
    }

Each sample has label keys and values, so it is possible to modify the collector to support dynamic label keys and values. Currently it takes the keys in the constructor.

My idea is to not store label keys in the collector, but instead add them dynamically along with the label values, the same way label values are handled now.
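
For illustration, a stripped-down sketch of the idea (hypothetical class, not the PoC code linked below):

import io.prometheus.client.Collector;
import java.util.Arrays;
import java.util.List;

// At the Collector level each Sample carries its own labelNames/labelValues
// lists, so two samples of the same metric can expose different label sets.
class DynamicLabelCollector extends Collector {
    @Override
    public List<MetricFamilySamples> collect() {
        MetricFamilySamples.Sample success = new MetricFamilySamples.Sample(
                "my_metric", Arrays.asList("type", "version"),
                Arrays.asList("success", "1"), 1.0);
        MetricFamilySamples.Sample error = new MetricFamilySamples.Sample(
                "my_metric", Arrays.asList("code", "type", "version"), // extra "code" label
                Arrays.asList("500", "error", "1"), 1.0);
        return Arrays.asList(new MetricFamilySamples(
                "my_metric", Type.GAUGE, "example help text", Arrays.asList(success, error)));
    }
}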

So I've built a simple PoC to verify that it makes sense. I didn't want to fork the Micrometer repo, so I created my own. I copied MicrometerCollector and PrometheusMeterRegistry, made some small changes to them, and wrote simple tests to verify that it works.

The repo: https://github.com/angry-cellophane/micormeter-prometheus-fix
My new collector that stores label keys and values dynamically: https://github.com/angry-cellophane/micormeter-prometheus-fix/blob/main/src/main/java/io/micrometer/prometheus/MultiTagMicrometerCollector.java#L66

Prometheus output looks like this

# HELP counter_total  
# TYPE counter_total counter
counter_total{type="success",version="1",} 1.0
counter_total{code="500",type="error",version="1",} 1.0

Let me know what you think about it, whether there are any issues I've missed, and whether this idea makes sense to you.

MSDehghan commented 3 years ago

@angry-cellophane Thanks, good idea. This limitation is really annoying. Hoping to see it eliminated.

checketts commented 3 years ago

@angry-cellophane I agree this is a frustration, and I would love for it to be something that 'just works' for users. However, I have a continuing concern around the PromQL implications. I'm by no means a PromQL master, but my understanding is that, unlike SQL, Prometheus automatically joins like time series, as long as all tags match.

That means the opposite is also true: Prometheus will drop metrics that don't all match. So I'm worried about how Prometheus will treat the mismatched tags when doing graphs and aggregations.

Since that is a fundamental way that Prometheus works, they have rejected suggestions on how to work around it in the underlying data model and instead have insisted that clients publish matching tags.

I don't have a specific recommendation currently, just wanted to raise the data model concerns in case others have investigated that aspect further and can offer insight.

angry-cellophane commented 3 years ago

@checketts

Prometheus will drop metrics that don't all match.

Prometheus (the database + scraper) doesn't drop metrics; it works fine with different sets of label names and values by creating a new time series for each unique combination of label key-value pairs. The only downside is disk space. The metrics schema is not defined upfront; labels can be added or dropped as required without impacting existing data (a new time series is simply created).

The Prometheus Java SDK lets clients aggregate metrics in memory and return the current values when Prometheus scrapes data.

The Prometheus Java SDK provides:

  1. Collectors - an abstraction that returns metric samples when called. It's like a java.util.function.Supplier that returns a list of metric samples. Each sample has a list of label names and values associated with it.
  2. CollectorRegistry - a collector aggregator. It doesn't store any metrics; instead it stores other collectors and returns their results when called. It's like an iterator of iterators.
  3. Counters, gauges, histograms, summaries - pre-baked collectors that keep the current state in memory and return metric samples when called.

#2 has no restriction on label names and values.

#3 has a restriction: label names are immutable and are defined when the object (e.g. a Counter) is built. Label names cannot change after the object is built. This only applies to Prometheus objects (io.prometheus.client.Counter, io.prometheus.client.Gauge, etc.).

Micrometer doesn't really use #3, only #1 and #2. Every Micrometer meter (e.g. io.micrometer.core.instrument.Counter) is registered as a collector (io.micrometer.prometheus.MicrometerCollector) in CollectorRegistry. When CollectorRegistry is called, it goes through all registered collectors, calls them, and returns the results. It doesn't check label names and values; it simply returns what the collectors return. There is no restriction on the Prometheus side besides #3 above, which Micrometer doesn't use anyway. Micrometer doesn't support different label names because of the way io.micrometer.prometheus.MicrometerCollector is implemented: it makes label names immutable and label values dynamic. I raised a PR (linked to this issue) to make label names dynamic too.

If this issue is critical for anyone, you can copy the changed classes from my PR and put them in your project. I did the same about a month ago and have been getting all metrics as expected since then.

checketts commented 3 years ago

@angry-cellophane Sorry, using the word 'drop' was confusing; you are correct that the SDK, scraping, and Prometheus storage won't drop the unmatched metrics.

I was referring to how PromQL queries work. I have a feeling (though my understanding is incomplete) that PromQL queries will exclude some metrics from calculations when labels don't match, requiring a user to use without or by to get a complete result.

If you have done any work in regards to ensuring that PromQL queries make sense when the underlying meter labels aren't all aligned, I would love to hear about your methods and approach.

angry-cellophane commented 3 years ago

@checketts It depends on the query and how it groups data, but generally there is no issue. Prometheus creates a separate time series for each unique combination of label key-value pairs. The user writing the PromQL queries needs to keep in mind that there is more than one set of label names, and that's it.

Some simple examples:

Metrics:

# HELP app_responses
# TYPE app_responses counter
app_responses{uri="/", outcome="SUCCESS", clientId="123"} 10
app_responses{uri="/", outcome="SUCCESS", clientId="456"} 5
app_responses{uri="/", outcome="FAILURE", error="INVALID_REQUEST", clientId="123"} 2
app_responses{uri="/", outcome="FAILURE", error="DATA_NOT_FOUND", clientId="456"} 3

  1. Percent of failed requests in the past 10 mins: sum(increase(app_responses{outcome="FAILURE"}[10m])) / sum(increase(app_responses[10m])) returns 0.25 as expected.

  2. Percent of all requests for each error in the past 10 mins: sum(increase(app_responses{outcome="FAILURE"}[10m])) by (error) / ignoring(error) group_left sum(increase(app_responses[10m])) returns 0.1 and 0.15 as expected.

angry-cellophane commented 3 years ago

Seems like my PR with the fix got stuck. For anyone who's eager to try the fix, I created an example of how it can be used; you'll need to copy two classes into your project: https://github.com/angry-cellophane/micormeter-prometheus-fix

shakuzen commented 3 years ago

I've opened https://github.com/prometheus/client_java/issues/696 in the repository for the official Java client for Prometheus to get some input from the Prometheus experts on this.

stolsvik commented 3 years ago

Just want to chime in that the result of effectively just "dropping" later attempts at registering any meter with the same name but different tags (when using PrometheusMeterRegistry), without any warning or anything in the logs, has been very frustrating, costing me several hours of debugging and then googling before finding these issues.

All the documentation says, abundantly clearly, that meters are unique as long as the combination of name and tags is different. It quickly became obvious that changing the name of the second set of meters with different tags "fixed" the problem, but this is not what the docs state.

stolsvik commented 3 years ago

Note that the observed behavior is that when adding new meters with the same name but differing in the keys (number or names) of the tags, the later registrations are ignored. When only the values of the tags change, it works correctly. Therefore, a "fix" is to give both sets of meters (having the same name) the same set of tag keys, where the irrelevant tags of each subset are set to a fixed dummy value (or, maybe even better, the empty string).
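
A rough sketch of that workaround (meter name and tags are made up):

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

class PadWithEmptyTagValues {
    public static void main(String[] args) {
        MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // Both registrations carry the full union of tag keys (outcome, error);
        // the key that is irrelevant on a given path gets the empty string.
        Counter.builder("my.operation")
                .tags("outcome", "SUCCESS", "error", "")
                .register(registry)
                .increment();

        Counter.builder("my.operation")
                .tags("outcome", "FAILURE", "error", "INVALID_REQUEST")
                .register(registry)
                .increment();
    }
}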

flozano commented 1 year ago

This issue keeps coming back, not only with Kafka but also, for example, with the Micrometer Apache client interceptor (async instrumentation) + executor (sync instrumentation): if you use both, you get slightly different sets of tags (the sync version includes "outcome", the async version does not), and your async metrics are gone.

rdehuyss commented 1 year ago

Thanks @stolsvik! With your input I at least have a workaround.

wyp900917 commented 6 months ago

(quoting @cschuyle's earlier comment about @Timed("some-operation") and the resulting 'some_operation_seconds' tag-key conflict with Spring Boot Actuator)

Have you resolved this problem?

iRitiLopes commented 3 months ago

Thanks @stolsvik! With your input I at least have a workaround.

How? By forcing all meters to have the same set of tags?