openresty / lua-resty-upstream-healthcheck

Health Checker for Nginx Upstream Servers in Pure Lua

Improved healthcheck values for prometheus #91

Closed jonasbadstuebner closed 2 years ago

jonasbadstuebner commented 2 years ago

Closes #90. As discussed in that issue, this PR adds health check values like so:

nginx_upstream_status_info{name="unknown.com",status="UP"} 0
nginx_upstream_status_info{name="unknown.com",status="DOWN"} 0
nginx_upstream_status_info{name="unknown.com",status="UNKNOWN"} 1
nginx_upstream_status_info{name="foo.com",status="UP"} 1
nginx_upstream_status_info{name="foo.com",status="DOWN"} 0
nginx_upstream_status_info{name="foo.com",status="UNKNOWN"} 0
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12354",status="UP",role="PRIMARY"} 0
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12354",status="DOWN",role="PRIMARY"} 1
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12355",status="UP",role="PRIMARY"} 1
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12355",status="DOWN",role="PRIMARY"} 0
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12357",status="UP",role="PRIMARY"} 1
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12357",status="DOWN",role="PRIMARY"} 0
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12356",status="UP",role="BACKUP"} 0
nginx_upstream_status_info{name="foo.com",endpoint="127.0.0.1:12356",status="DOWN",role="BACKUP"} 1

It checks the upstreams first, then the primary peers, and the backup peers last. For each checked target, it writes a 1 on the series whose status label matches the target's current status and a 0 on the series for every other status, so every series is always exposed and there are no gaps in the metrics.
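To illustrate the 1/0 scheme described above, here is a minimal sketch in Python. It is not the code from this PR (the library itself is written in Lua); the function name `one_hot_lines`, its parameters, and the two status tuples are hypothetical and chosen only to reproduce the label order of the sample output.

```python
METRIC = "nginx_upstream_status_info"

# Assumed status sets, read off the sample output above:
# upstreams can be UP, DOWN or UNKNOWN; peers are either UP or DOWN.
UPSTREAM_STATUSES = ("UP", "DOWN", "UNKNOWN")
PEER_STATUSES = ("UP", "DOWN")


def one_hot_lines(name, current, statuses, endpoint=None, role=None):
    """Return one metric line per possible status (hypothetical helper).

    The series matching `current` gets the value 1, all others get 0,
    so every series is always present in the output.
    """
    lines = []
    for status in statuses:
        # Build the labels in the same order as the sample output.
        labels = {"name": name}
        if endpoint is not None:
            labels["endpoint"] = endpoint
        labels["status"] = status
        if role is not None:
            labels["role"] = role
        rendered = ",".join(f'{k}="{v}"' for k, v in labels.items())
        value = 1 if status == current else 0
        lines.append(f"{METRIC}{{{rendered}}} {value}")
    return lines


if __name__ == "__main__":
    # Upstream-level series first, then a primary peer.
    for line in one_hot_lines("foo.com", "UP", UPSTREAM_STATUSES):
        print(line)
    for line in one_hot_lines("foo.com", "DOWN", PEER_STATUSES,
                              endpoint="127.0.0.1:12354", role="PRIMARY"):
        print(line)
```

Because each target always exposes the full set of series, a query can select the `status="DOWN"` series and test its value for 1 instead of having to detect an absent series.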

I also fixed the sanity tests, improved the make install command, and split up the Prometheus tests to make debugging easier.

jonasbadstuebner commented 2 years ago

Checks are passing now. I also improved the style of the pipeline so that the error.log is shown when a test fails.

jonasbadstuebner commented 2 years ago

It would fit the code as it was before my changes better if I changed the max line width to 76, but I find that a weird number, so I suggest we keep it at 75.

jonasbadstuebner commented 2 years ago

I'm requesting your help because the pipeline does not behave the way I expect. The error it throws is not based on the current state of the code. E.g. 127.0.0.1:12354 up is written nowhere in the sanity.t file. Where does the pipeline get this from?

From pipeline logs:

#      Primary Peers
# -        127.0.0.1:12354 up
# -        127.0.0.1:12355 up
# +        127.0.0.1:12354 UP
# +        127.0.0.1:12355 UP

jonasbadstuebner commented 2 years ago

Please review this again.