openresty / lua-resty-upstream-healthcheck

Health Checker for Nginx Upstream Servers in Pure Lua
515 stars 134 forks source link

PR offer: Making Prometheus metrics more adaptable #90

Closed jonasbadstuebner closed 1 year ago

jonasbadstuebner commented 2 years ago

I offer to make a PR for this library, I only want to ask for the direction of it beforehand.

The goal

At the end, no matter what, we need metrics looking like

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",role=\"%s\"} %d

instead of

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",status=\"%s\",role=\"%s\"} 1

=> I will remove the status tag and represent the current status by the value only. That's the way, e.g. the haproxy_server_status metric is implemented and it is way easier to graph it and read the graphs, if the values differ instead of the tags. Also it's easier to alert on that. So this is the end result, to have a way to get metrics looking like above from this library.

Now the question is, what is the, from your side, preferred way (both options will be implemented non-breaking): A) I implement an opts parameter for the prometheus_status_page() --> st, err = hc.prometheus_status_page{ up_value = 1, down_value = 0, unknown_value = -1, include_status_tag = false } B) I implement a second metric method to call if the status should change by value, not by tag --> st, err = hc.prometheus_status_page_by_value()

Both would work, A) would create more flags and more parameters on internal functions, B) would create new functions and "duplicate" some code I guess. There is still C), where I just implement the way we need it as a breaking change, but I guess this is the less preferred way?

Please decide on one of these.

Best regards!

zhuizhuhaomeng commented 2 years ago

@Freggy need your advice.

freggy commented 2 years ago

To make it more consistent with HAProxy, we should do the following, which is probably not even a breaking change.

Let's say we have 2 upstreams A and B. If everything is healthy it should look like this:

nginx_upstream_status_info{name="A",endpoint="metrics",role="PRIMARY", status="UP"} 1
nginx_upstream_status_info{name="A",endpoint="metrics",role="PRIMARY", status="DOWN"} 0
nginx_upstream_status_info{name="B",endpoint="metrics",role="PRIMARY", status="UP"} 1
nginx_upstream_status_info{name="B",endpoint="metrics",role="PRIMARY", status="DOWN"} 0

If A is DOWN then (iam just going to leave out B for better readability)

nginx_upstream_status_info{name="A",endpoint="metrics",role="PRIMARY", status="UP"} 0
nginx_upstream_status_info{name="A",endpoint="metrics",role="PRIMARY", status="DOWN"} 1
...

so basically export all states, but set the value of the one which is "active" to 1. This is exactly the HAProxy behavior. See this issue here: https://github.com/haproxy/haproxy/issues/1029 (Commit https://github.com/haproxy/haproxy/commit/de3c32638925c2071a5a84cbdafe2f112d2c4261).

jonasbadstuebner commented 2 years ago

@zhuizhuhaomeng Please confirm and I will implement it like this and open a linked PR.

zhuizhuhaomeng commented 2 years ago

@DrBu7cher I rarely use Prometheus. So it's not appropriate for me to make the decision. According to @Freggy, we still need the status tag. Do you agree with @Freggy ?

jonasbadstuebner commented 2 years ago

Maybe I should have mentioned, that I work together with @Freggy. I agree with what he said and will later open a PR to implement that.

zhuizhuhaomeng commented 2 years ago

The PR is welcomed. @DrBu7cher

jonasbadstuebner commented 2 years ago

It's there, please approve it. Thank you in advance.