michaldo opened 5 days ago
Could you please verify that the two outputs are the same and that you don't have four times more time series in the case of Prometheus 1.x?
Approximately how many time series (or lines) do you have in your Prometheus output?
Also, do the second and subsequent scrapes consume an equal amount of memory, or are they more lightweight?
What tool did you use for the flame graph?
Will you get similar results if you look at a heap dump?
Also, Micrometer's overhead on the flame graphs seems pretty minimal to me. Depending on the answers to the questions above, maybe this should be a Prometheus Client question in the end?
Hi Jonatan, I checked memory usage on a test cloud cluster where only micrometer-registry-prometheus was changed. So the answer to questions 1 and 3 is that the environment is stable and memory consumption is stable.
It may be important that my application heavily uses Kafka and the Prometheus output contains 4703 Kafka rows out of 4850 total. The flame graph was generated with IntelliJ.
After inspecting the profiler output, the suspected fragments of code are io.micrometer.core.instrument.config.NamingConvention#toSnakeCase and io.prometheus.metrics.model.snapshots.PrometheusNaming#replaceIllegalCharsInMetricName. They could be refactored to require less memory.
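To illustrate the kind of refactoring meant here, below is a minimal sketch of a camelCase-to-snake_case conversion that builds its result in a single `StringBuilder`, so each call allocates one builder and one result `String` rather than intermediate strings per character. This is illustrative only, not the actual Micrometer code:

```java
public class SnakeCase {
    // Converts camelCase to snake_case with a single StringBuilder allocation,
    // sized up front to avoid internal array growth in the common case.
    static String toSnakeCase(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 4);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("someMetricName")); // prints some_metric_name
    }
}
```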
> After inspecting the profiler output, the suspected fragments of code are io.micrometer.core.instrument.config.NamingConvention#toSnakeCase and io.prometheus.metrics.model.snapshots.PrometheusNaming#replaceIllegalCharsInMetricName. They could be refactored to require less memory.
This conflicts with your flame graphs; I don't see any of these classes on them.
NamingConvention has not been changed for about two years. PrometheusNamingConvention did change in 1.13 (Boot 3.3), but now it is doing less work and delegating to io.prometheus.metrics.model.snapshots.PrometheusNaming.
If you really suspect that io.prometheus.metrics.model.snapshots.PrometheusNaming is the root cause, you might want to create an issue for Prometheus Client.
Indeed, the suspected code I found has a small impact on the flame graph.
It is hard to determine the root cause of the memory consumption. I see that the high-level code changed while the memory consumption shows up in the low-level code. Is the root cause the recently changed high-level code, or the low-level code that has not changed for years?
Anyway, I maintain my opinion that the methods I pointed out are inefficient and that micrometer-registry-prometheus 1.13.0 uses more memory than micrometer-registry-prometheus-simpleclient 1.13.0.
> It is hard to determine the root cause of the memory consumption.
Analyzing a heap dump might help.
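A heap dump can be captured programmatically from inside the application right after a scrape, using the HotSpot-specific diagnostic MXBean (this assumes a HotSpot-based JVM); the resulting .hprof file can then be opened in Eclipse MAT, VisualVM, or IntelliJ to see which classes retain the most memory:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDump {
    // Dumps the current JVM's heap to an .hprof file.
    static void dump(String path) throws Exception {
        // dumpHeap refuses to overwrite an existing file, so remove it first.
        new File(path).delete();
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        dump("scrape-heap.hprof");
        System.out.println("heap dump written");
    }
}
```

Alternatively, `jcmd <pid> GC.heap_dump <file>` captures the same dump from outside the process.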
> I see that the high-level code changed while the memory consumption shows up in the low-level code. Is the root cause the recently changed high-level code, or the low-level code that has not changed for years?
I don't understand this question.
> Anyway, I maintain my opinion that the methods I pointed out are inefficient and that micrometer-registry-prometheus 1.13.0 uses more memory than micrometer-registry-prometheus-simpleclient 1.13.0.
I never stated the opposite; I was trying to point out that you might have opened the issue in the wrong repo. Micrometer is not the same as Prometheus Client (which is where io.prometheus.metrics.model.snapshots.PrometheusNaming comes from).
I observed that the Micrometer registry consumes more memory when I switch Spring Boot from 3.2.6 to 3.3.1.
Environment
I attached a profiler report collected by async-profiler.
Green is the memory consumption for micrometer-registry-prometheus: scrape() takes 446 MB. Red is the memory consumption for micrometer-registry-prometheus-simpleclient (I keep Spring Boot 3.3.1 but switch to the simple client): 121 MB.
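To compare numbers like these without a full profiler run, per-call allocation can also be measured directly with the HotSpot-specific ThreadMXBean (assumes a HotSpot-based JVM; wrap your registry's scrape() call in the task instead of the placeholder work shown here):

```java
import java.lang.management.ManagementFactory;

public class AllocMeter {
    // Returns the number of bytes allocated by the current thread while
    // running the given task, e.g. a single scrape() of a meter registry.
    static long allocatedBy(Runnable task) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(tid);
        task.run();
        return bean.getThreadAllocatedBytes(tid) - before;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a scrape() call.
        long bytes = allocatedBy(() -> new StringBuilder("x".repeat(1_000_000)).toString());
        System.out.println(bytes + " bytes allocated");
    }
}
```

Running this against both registry versions on the same workload would give a per-scrape allocation figure to compare against the 446 MB vs. 121 MB flame-graph totals.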