micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

Descriptions of system.cpu.usage and process.cpu.usage cause an issue when scraping with NewRelic Prometheus scraper #3242

Closed MGathier closed 2 years ago

MGathier commented 2 years ago

Describe the bug

The descriptions for system.cpu.usage and process.cpu.usage contain a double quote, which is invalid according to the NewRelic Prometheus scraper. Processing gives the following error:

time="2022-06-22T12:49:15Z" level=warning msg="fetching Prometheus metrics: http://10.124.8.5:8024/actuator/prometheus (axonserver-3-0)" component=Fetcher error="text format parsing error in line 1336: invalid escape sequence '\\"'"

Environment

To Reproduce

How to reproduce the bug: Create a Spring Boot application with the actuator/prometheus dependency and check the output of the /actuator/prometheus endpoint. The output contains double quotes in the HELP lines for the CPU usage metrics.

Expected behavior

HELP lines without double quotes

Additional context

jonatan-ivanov commented 2 years ago

There are a couple of things worth noting here:

# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{application="resourceater",} 0.017699115044247787
# HELP system_cpu_usage The "recent cpu usage" of the system the application is running in
# TYPE system_cpu_usage gauge
system_cpu_usage{application="resourceater",} 0.21929824561403508

# TYPE process_cpu_usage gauge
# HELP process_cpu_usage The \"recent cpu usage\" for the Java Virtual Machine process
process_cpu_usage{application="resourceater"} 9.763718023823473E-4
# TYPE system_cpu_usage gauge
# HELP system_cpu_usage The \"recent cpu usage\" of the system the application is running in
system_cpu_usage{application="resourceater"} 0.17314073784891665

Both of these examples are produced by a Spring Boot app using Micrometer (using the Prometheus Java Client).

Based on this, I think the issue is not with Micrometer itself. Also, as far as I know, the examples above are valid. That seems a fair assumption, given that they are produced by the Prometheus Java Client, which follows the formatting standards.

Because of this, I think the issue is in the NewRelic Prometheus scraper. Here's what I would do:

Because it is very likely that the issue is not in Micrometer, I'm closing this. Please let us know if you disagree and want us to reopen.

shakuzen commented 2 years ago

For additional reference, here is the section on escaping in the text format of the OpenMetrics specification:

Where the ABNF notes escaping, the following escaping MUST be applied
Line feed, '\n' (0x0A) -> literally '\n' (Bytecode 0x5c 0x6e)
Double quotes -> '\"' (Bytecode 0x5c 0x22)
Backslash -> '\\' (Bytecode 0x5c 0x5c)
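To make the three rules concrete, here is a minimal Python sketch of that escaping. The helper name `escape_openmetrics` is made up for illustration; it is not the Prometheus Java Client's code, only an approximation of the rules quoted above.

```python
def escape_openmetrics(text: str) -> str:
    """Apply the three escapes quoted from the OpenMetrics text format.

    Hypothetical helper for illustration only.
    """
    # Backslash is handled first so that the backslashes introduced by
    # the other two replacements are not escaped a second time.
    return (
        text.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace('"', '\\"')
    )


help_text = 'The "recent cpu usage" for the Java Virtual Machine process'
line = "# HELP process_cpu_usage " + escape_openmetrics(help_text)
print(line)
```

Applied to the unescaped description, this produces the same HELP line as the second example earlier in the thread, with each double quote written as `\"`.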

fieldju commented 2 years ago

FWIW I opened https://github.com/newrelic/nri-prometheus/issues/313.

I also noticed that the Prometheus scrape endpoint in Spring-Boot returns Content-Type: application/openmetrics-text;version=1.0.0;charset=utf-8 when NRI-Prometheus sends an Accept header of Accept: application/openmetrics-text;version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1

It's ignoring the 0.0.1 version and sending 1.0.0

I don't know enough about how Accept headers are supposed to work to know whether that is a bug or intended, given the q=0.5,*/*;q=0.1 options
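For reference, the q values in an Accept header are preference weights, and an entry without a q parameter defaults to 1.0. The Python sketch below is a simplified reading of the header above: the function name `parse_accept` is made up, and it does not handle quoted parameter values or the other corner cases of the HTTP specification.

```python
def parse_accept(header: str):
    """Return (media_type, params, q) tuples, highest weight first.

    Simplified illustration; not a complete HTTP Accept parser.
    """
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        params = {}
        q = 1.0  # default weight when no q parameter is present
        for piece in pieces[1:]:
            name, _, value = piece.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
            else:
                params[name.strip()] = value.strip()
        entries.append((media_type, params, q))
    # sorted() is stable, so entries with equal weight keep header order
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


accept = (
    "application/openmetrics-text;version=0.0.1,"
    "text/plain;version=0.0.4;q=0.5,*/*;q=0.1"
)
for media_type, params, q in parse_accept(accept):
    print(media_type, params, q)
```

Read this way, the header asks for application/openmetrics-text first (weight 1.0), then text/plain (0.5), then anything else (0.1). The version is an ordinary media type parameter, not part of the weighting.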

jonatan-ivanov commented 2 years ago

Yeah, that version mismatch is actually the right behavior in Boot, and the Prometheus server should not send 0.0.1 (nor should NewRelic).

According to the OpenMetrics Spec:

The content type MUST be: application/openmetrics-text; version=1.0.0; charset=utf-8

And according to the Prometheus Java Client: https://github.com/prometheus/client_java/issues/702

See this issue: https://github.com/prometheus/prometheus/pull/9431

I'm just guessing, but I think the problem is not the version mismatch but NewRelic asking for openmetrics-text while only being able to process text/plain. NewRelic should either ask for text/plain or be able to parse openmetrics-text (my recommendation).
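A rough Python sketch of how that guess would play out. It assumes a parser that only knows the escapes defined for HELP text in the Prometheus text format (`\\` and `\n`), while OpenMetrics additionally allows `\"`. The table `HELP_ESCAPES` and the function `unescape_help` are hypothetical names; this is not NewRelic's actual code.

```python
# Escape sequences accepted in HELP text, keyed by media type.
# Assumption for this sketch: text/plain HELP lines define only \\ and \n,
# while OpenMetrics also defines \" (see the spec excerpt above).
HELP_ESCAPES = {
    "text/plain": {"\\": "\\", "n": "\n"},
    "application/openmetrics-text": {"\\": "\\", "n": "\n", '"': '"'},
}


def unescape_help(text: str, media_type: str) -> str:
    """Undo HELP escaping, rejecting escapes the format does not define."""
    allowed = HELP_ESCAPES[media_type]
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # character following the backslash
        if nxt not in allowed:
            raise ValueError(f"invalid escape sequence '\\{nxt}'")
        out.append(allowed[nxt])
    return "".join(out)


escaped = 'The \\"recent cpu usage\\" of the system the application is running in'

# Parsed as OpenMetrics, the escaped quotes are accepted.
parsed = unescape_help(escaped, "application/openmetrics-text")

# Parsed with the text/plain rules, the same line is rejected.
try:
    unescape_help(escaped, "text/plain")
    error = None
except ValueError as exc:
    error = str(exc)
```

Under these assumptions, the OpenMetrics response body is fine on its own. It only fails when it is fed to a parser that applies the text/plain escaping rules.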

Btw Telegraf had a similar issue: https://github.com/influxdata/telegraf/issues/10248