rabbitmq / rabbitmq-prometheus

A minimalistic Prometheus exporter of core RabbitMQ metrics

Telegraf prometheus input fails with 406 Not Acceptable from Prometheus exporter endpoint #12

Closed 00Asgaroth00 closed 4 years ago

00Asgaroth00 commented 4 years ago

Hi,

I am trying to scrape Prometheus metrics from a RabbitMQ (RMQ) 3.8.0 cluster using Telegraf's Prometheus input plugin.

When telegraf attempts to connect to the metrics endpoint on RMQ, RMQ responds with 406 Not Acceptable.

Below is a trace of the request/response from the client/server:

Request (Telegraf Prometheus Input Plugin):

T 2019/10/14 12:25:02.854993 10.6.0.227:38254 -> 10.6.0.249:15692 [AP]
GET /metrics HTTP/1.1
Host: 10.6.0.249:15692
User-Agent: Go-http-client/1.1
Accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3
Accept-Encoding: gzip
Connection: close

Response (RMQ /metrics endpoint)

T 2019/10/14 12:25:02.856160 10.6.0.249:15692 -> 10.6.0.227:38254 [AP]
HTTP/1.1 406 Not Acceptable
connection: close
content-length: 0
date: Mon, 14 Oct 2019 11:25:02 GMT
server: Cowboy

The logs from the telegraf client are as follows:

2019-10-14T11:25:02Z E! [inputs.prometheus]: Error in plugin: http://10.6.0.249:15692/metrics returned HTTP status 406 Not Acceptable
2019-10-14T11:25:10Z D! [outputs.file] buffer fullness: 0 / 10000 metrics. 
2019-10-14T11:25:14Z E! [inputs.prometheus]: Error in plugin: http://10.6.0.249:15692/metrics returned HTTP status 406 Not Acceptable
2019-10-14T11:25:20Z D! [outputs.file] buffer fullness: 0 / 10000 metrics. 
2019-10-14T11:25:20Z E! [inputs.prometheus]: Error in plugin: http://10.6.0.249:15692/metrics returned HTTP status 406 Not Acceptable
2019-10-14T11:25:30Z D! [outputs.file] buffer fullness: 0 / 10000 metrics. 

Would this have something to do with the "Accept:" header in the initial request? Would it be possible to add support for the prometheus plugin in telegraf?

Thanks for adding the metrics endpoint for monitoring!! If you require any additional information please respond with what you need from me.

Thanks!

EDIT: Link to Telegraf Prometheus Input Plugin

michaelklishin commented 4 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team).

We get at least a dozen questions through various venues every single day, often light on details. At that rate GitHub issues can very quickly turn into something impossible to navigate and make sense of, even for our team. Because GitHub is a tool our team uses heavily nearly every day, the signal/noise ratio of issues is something we care about a lot.

Please post this to rabbitmq-users.

Thank you.

michaelklishin commented 4 years ago

The plugin provides text/plain and your client claims to only accept application/vnd.google.protobuf.
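As a reference point, the Accept header from the Telegraf trace above can be split into its media ranges. This is a minimal Python sketch: `parse_accept` is a hypothetical helper (not part of Telegraf or the plugin), and it assumes a well-formed header without quoted parameter values. It shows that the header lists protobuf at q=0.7 and text/plain with a version parameter at q=0.3.

```python
def parse_accept(header):
    """Return (media_type, params, q) tuples, highest preference first."""
    ranges = []
    for item in header.split(","):
        media_type, *raw_params = [part.strip() for part in item.split(";")]
        params, q = {}, 1.0  # q defaults to 1.0 when omitted
        for raw in raw_params:
            key, _, value = raw.partition("=")
            key = key.strip().lower()
            if key == "q":
                q = float(value)
            else:
                params[key] = value.strip()
        ranges.append((media_type.lower(), params, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[2], reverse=True)


# Accept header copied from the Telegraf request trace in this issue
telegraf_accept = (
    "application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;"
    "encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3"
)

for media_type, params, q in parse_accept(telegraf_accept):
    print(media_type, params, q)
```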

michaelklishin commented 4 years ago

Relevant Prometheus doc guide issue: https://github.com/prometheus/docs/issues/927. The Telegraf input plugin must be adapted for Prometheus 2.x.

00Asgaroth00 commented 4 years ago

OK, thank you, I'll open a ticket with telegraf client then so that they can support prometheus v2.x

00Asgaroth00 commented 4 years ago

Hi,

I've been informed on my ticket in the Telegraf GitHub repository that the following query to rabbitmq-prometheus should match. However, I am still getting a 406 Not Acceptable response with the query below:

# curl -v -H "Accept:text/plain;version=0.0.4" "http://10.6.0.249:15692/metrics"
* About to connect() to 10.6.0.249 port 15692 (#0)
*   Trying 10.6.0.249... connected
* Connected to 10.6.0.249 (10.6.0.249) port 15692 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 10.6.0.249:15692
> Accept:text/plain;version=0.0.4
> 
< HTTP/1.1 406 Not Acceptable
< content-length: 0
< date: Tue, 15 Oct 2019 08:16:34 GMT
< server: Cowboy
< 
* Connection #0 to host 10.6.0.249 left intact
* Closing connection #0

They are saying that it may still be an issue with rabbitmq-prometheus.

michaelklishin commented 4 years ago

This plugin is used in numerous environments with Prometheus 2. If it's good enough for the Prometheus scraper client, we consider it to be good enough, period.

The API provides text/plain and your example accepts text/plain;version=0.0.4. Quite predictably,

curl -v -H "Accept:text/plain" "http://localhost:15692/metrics"

works like a charm.

michaelklishin commented 4 years ago

The Telegraf issue links to https://prometheus.io/docs/instrumenting/exposition_formats/#basic-info which means that a version specifier can be provided as a MIME type parameter.

michaelklishin commented 4 years ago

It's a matter of relaxing content type subtypes and parameters accepted.
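What "relaxing" could mean can be sketched in Python. This is a simplified model of the behaviour observed in this thread, not the plugin's Erlang code, and the linked commit may work differently. `is_acceptable` and its `relaxed` flag are hypothetical names. In strict mode a media range carrying any parameter other than `q` does not match the provided text/plain; in relaxed mode such parameters are tolerated.

```python
def is_acceptable(accept_header, provided="text/plain", relaxed=False):
    """Return True if any media range in the header matches `provided`.

    Assumes a well-formed header; q-values are ignored apart from not
    being counted as media type parameters.
    """
    provided_type, provided_subtype = provided.split("/")
    for item in accept_header.split(","):
        media_range, *raw_params = [part.strip() for part in item.split(";")]
        range_type, range_subtype = media_range.lower().split("/")
        # Parameters other than q, for example version=0.0.4
        params = [p for p in raw_params if not p.lower().startswith("q=")]
        type_ok = range_type in ("*", provided_type)
        subtype_ok = range_subtype in ("*", provided_subtype)
        params_ok = relaxed or not params
        if type_ok and subtype_ok and params_ok:
            return True
    return False


print(is_acceptable("text/plain;version=0.0.4"))                # False
print(is_acceptable("text/plain"))                              # True
print(is_acceptable("text/plain;version=0.0.4", relaxed=True))  # True
```

Under this model the two curl results reported above (406 with the version parameter, success without it) correspond to the strict mode.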

michaelklishin commented 4 years ago

Backported https://github.com/rabbitmq/rabbitmq-prometheus/commit/d639c807d4b7767bfe28e78d9e89b4560160b829 to v3.8.x.

00Asgaroth00 commented 4 years ago

This plugin is used in numerous environments with Prometheus 2. If it's good enough for the Prometheus scraper client, we consider it to be good enough, period.

The telegraf issue found that the prometheus scraper uses the following accept header:

const acceptHeader = `application/openmetrics-text; version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1`

I'm wondering how that even worked with this plugin if you were not accepting parameters to the text/plain content type.

According to the Prometheus accept header above, the client accepts three content types, one of them being text/plain. Does the above patch allow querying with an accept header that lists multiple content types, one of which is the type you provide?

For example, does this patch allow the following queries to succeed, given that both list text/plain as an acceptable response type?

Using Prometheus scraper accept header

curl -v -H "Accept: application/openmetrics-text; version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1" "http://localhost:15692/metrics"

Using the "current" Telegraf prometheus scrape client accept header (does need updating to the above accept header)

curl -v -H "Accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3" "http://localhost:15692/metrics"

Do you have Docker images that we can use to test changes before release? That would allow me to run these tests for you.

Again, thank you for looking into this, your time is much appreciated.

lukebakken commented 4 years ago

Do you have Docker images that we can use to test changes before release? That would allow me to run these tests for you.

We have alpha releases of RabbitMQ available on bintray. You probably want 3.8.1-alpha.21 when it becomes available: https://bintray.com/rabbitmq/all-dev/rabbitmq-server

I have built this plugin using Erlang 21.3:

rabbitmq_prometheus-3.8.0-alpha.859-2019.10.16.ez.zip

Remove the .zip extension and place the file in the same directory as the existing rabbitmq_prometheus .ez file. Move the old file to another location, and restart RabbitMQ.

michaelklishin commented 4 years ago

3.8.1-alpha.20 includes this change.

00Asgaroth00 commented 4 years ago

Thanks all!

I have tested the "rabbitmq_prometheus-3.8.0-alpha.859-2019.10.16.ez.zip" version of the plugin and all appears to be working as expected. The Telegraf client is able to scrape with its current accept header.

I tested the above curl calls too against this version of the plugin and they all appear to be responding.

The Prometheus scraper was working all along because of the */* fallback content type (accept anything from the uas).
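The fallback can be checked with a short Python sketch. It assumes the pre-fix endpoint matched only media ranges without extra parameters, which is inferred from the curl results in this thread rather than from the plugin source. `strict_matches` is a hypothetical helper that lists which ranges of an Accept header would match text/plain under that assumption.

```python
PROVIDED = "text/plain"  # what the endpoint served, according to this thread


def strict_matches(accept_header, provided=PROVIDED):
    """List media ranges that match `provided` when no parameters
    other than q are tolerated. Assumes a well-formed header."""
    provided_type, provided_subtype = provided.split("/")
    matched = []
    for item in accept_header.split(","):
        media_range, *raw_params = [part.strip() for part in item.split(";")]
        extra = [p for p in raw_params if not p.lower().startswith("q=")]
        range_type, range_subtype = media_range.lower().split("/")
        if (range_type in ("*", provided_type)
                and range_subtype in ("*", provided_subtype)
                and not extra):
            matched.append(media_range)
    return matched


# Accept headers quoted earlier in this issue
prometheus_accept = (
    "application/openmetrics-text; version=0.0.1,"
    "text/plain;version=0.0.4;q=0.5,*/*;q=0.1"
)
telegraf_accept = (
    "application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;"
    "encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3"
)

print(strict_matches(prometheus_accept))  # ['*/*']
print(strict_matches(telegraf_accept))    # []
```

Only the */* range of the Prometheus header matches, while the Telegraf header has no range without parameters, which is consistent with the 406 responses reported at the top of the issue.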

Thanks all for your time and fix, much appreciated!

lukebakken commented 4 years ago

@00Asgaroth00 thank you for the report and for so quickly testing the fix in your environment.