roadrunner-server / roadrunner-plugins

📦 Home for the roadrunner plugins
MIT License
25 stars 9 forks source link

[🐛 BUG]: The `rr_http_requests_queue` metric type is probably incorrect. #162

Closed victor-sudakov closed 2 years ago

victor-sudakov commented 2 years ago

No duplicates 🥲.

What happened?

I'm afraid the rr_http_requests_queue metrics has incorrect type. It has the type of "summary" while in fact it is a gauge.

Quoting the Prometheus docs

Note that the number of observations (showing up in Prometheus as a time series with a _count suffix) is inherently a counter (as described above, it only goes up). The sum of observations (showing up as a time series with a _sum suffix) behaves like a counter, too, as long as there are no negative observations.

So basically rr_http_requests_queue_sum should never go down, but of course it does, and it has to be used as a gauge in Prometheus/Grafana which is counter-intuitive.

Version

2.6.3

Relevant log output

No response

rustatian commented 2 years ago

Hey @victor-sudakov. Yep, there are differences between Gauge and currently used Summary. The summary was chosen because Gauge panics on negative numbers which I've seen sometimes. So, I wanted to use the safest approach from the stability POV. I'll retest Gauge one more time.

victor-sudakov commented 2 years ago

The summary was chosen because gauge panics on negative numbers which I've seen sometimes.

It is very strange because a gauge is in 64-Bit floating point format which officially supports negative numbers. If it's a bug in Prometheus or in the library, it should be reported.

rustatian commented 2 years ago

It is very strange because a gauge is in 64-Bit floating point format which officially supports negative numbers.

Yep, I know. I need to re-check this, might be that was my issue somewhere. Will be available in 2.7.

rustatian commented 2 years ago

Yeah, I have tested Gauge on high load, no errors were reported. Let's switch this type as semantically more suitable.

rustatian commented 2 years ago

@victor-sudakov Enjoy: https://github.com/spiral/roadrunner-binary/releases/tag/v2.6.5 ))

victor-sudakov commented 2 years ago

Are you sure it's fixed?

/app $ /usr/bin/rr --version
rr version 2.6.5 (build time: 2021-12-14T13:31:00+0000, go1.17.5)
/app $ curl -s http://localhost:9090/metrics | grep queue
# HELP rr_http_requests_queue Total number of queued requests.
# TYPE rr_http_requests_queue summary
rr_http_requests_queue_sum 0
rr_http_requests_queue_count 0
/app $ 
victor-sudakov commented 2 years ago

Related: https://github.com/spiral/roadrunner-binary/issues/151

rustatian commented 2 years ago

Related: spiral/roadrunner-binary#151

It's fixed in the master branch, but I guess by some reason cherry-pick into stable was failed. Ok, sorry, have to re-release.

victor-sudakov commented 2 years ago

And I'm afraid the rr_http_requests_queue_sum is also broken now, it's always zero. So 2.6.5 is kind of between 2 worlds.

rustatian commented 2 years ago

Yep, cherry-pick failed. The new version will be in a moment.

rustatian commented 2 years ago

@victor-sudakov https://github.com/spiral/roadrunner-binary/releases/tag/v2.6.6

victor-sudakov commented 2 years ago

Works as expected in v2.6.6, thank you!