Closed acpana closed 1 month ago
Hi @FFMMM, I recently upgraded to consul 1.10.3 and found a similar problem with another set of metrics, the replication metrics:
consul_leader_replication_<item>_status
They are left at 1 if the leader changes. I think they should go back to 0 in that case, so that only the leader reports 1 when everything is replicating correctly.
If you think I should open another issue, I'll do it.
Thanks.
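To make the symptom concrete, here is a minimal sketch of a check one could run over scraped values to spot servers that still report 1 after losing leadership. The function name, the input shape (a mapping from server name to last scraped value), and the server names are all hypothetical and not part of Consul.

```python
def stale_replication_reporters(status_by_server, leader):
    """Return the non-leader servers that still report 1 for a replication status gauge.

    status_by_server maps a server name to the last scraped gauge value.
    This input shape is an assumption; build it from your own scrapes.
    """
    return sorted(
        server
        for server, value in status_by_server.items()
        if server != leader and value == 1
    )


# Illustrative values only: server-a was the leader before server-b took over.
scraped = {"server-a": 1.0, "server-b": 1.0, "server-c": 0.0}
print(stale_replication_reporters(scraped, leader="server-b"))
```

An empty result would mean only the current leader reports 1, which is the behavior asked for above.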
I believe a similar problem exists with consul.raft.state.candidate, consul.raft.state.follower, and consul.raft.state.leader. They should not be reported once a server changes state, but because we don't expire them or explicitly set them to NaN, a single server can report more than one of these states.
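The "explicitly set them to NaN" option could look roughly like the sketch below. It is not Consul's implementation: `GaugeSink` and `report_raft_state` are hypothetical names, the sink is a plain in-memory dictionary, and the three state metrics are modelled as gauges purely for illustration.

```python
import math

RAFT_STATES = ("candidate", "follower", "leader")


class GaugeSink:
    """Hypothetical in-memory stand-in for a metrics sink (not Consul's API)."""

    def __init__(self):
        self.gauges = {}

    def set_gauge(self, name, value):
        self.gauges[name] = value


def report_raft_state(sink, current_state):
    """Set the gauge for the current state to 1 and every other state to NaN."""
    if current_state not in RAFT_STATES:
        raise ValueError(f"unknown raft state: {current_state}")
    for state in RAFT_STATES:
        value = 1.0 if state == current_state else math.nan
        sink.set_gauge(f"consul.raft.state.{state}", value)


sink = GaugeSink()
report_raft_state(sink, "leader")
report_raft_state(sink, "follower")  # leadership lost
# Only the gauge for the current state carries a real value afterwards.
active = [name for name, v in sink.gauges.items() if not math.isnan(v)]
print(active)
```

Updating all three gauges on every transition means a server never carries a stale value for a state it has left. Expiring the old series instead would be the other option mentioned above.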
Overview
When moving from 1.9 to 1.10, some metrics changed their behavior. Full context and discovery are in https://github.com/hashicorp/consul/issues/10730
Repro steps:
1. Check out any consul release >= 1.10.0, where upstream is a remote set to git@github.com:hashicorp/consul;
2. Build the consul binary;
3. Run an agent in dev mode with the following configuration file to turn on prometheus style metrics. The cconfig.json file configures prometheus retention policy:
format=prometheus in the request body...
# HELP consul_autopilot_failure_tolerance Tracks the number of voting servers that the cluster can lose while continuing to function.
# TYPE consul_autopilot_failure_tolerance gauge
consul_autopilot_failure_tolerance 0 # <-- this value should be NaN ...
One can generate a similar list by diff-ing the output of the cURL command above between a 1.9.x consul release and any release >=1.10.x.
Originally posted by @FFMMM in https://github.com/hashicorp/consul/issues/10730#issuecomment-929844513
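The diff between the two releases can be approximated with a small script. This is a simplified sketch: `parse_samples` and `diff_samples` are hypothetical helpers, the parser assumes no spaces inside label values, and the two scrape strings are illustrative stand-ins for real output saved from each release.

```python
import math


def parse_samples(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: skips comment lines (# HELP / # TYPE) and anything after the
    value, and assumes label values contain no spaces.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])  # float() accepts "NaN"
    return samples


def diff_samples(old, new):
    """Return series whose value differs between two scrapes or that exist in only one."""
    changed = {}
    for series in sorted(set(old) | set(new)):
        a, b = old.get(series), new.get(series)
        if a is None or b is None:
            changed[series] = (a, b)
        elif not (a == b or (math.isnan(a) and math.isnan(b))):
            changed[series] = (a, b)
    return changed


# Illustrative scrapes only, not captured from real agents.
old_scrape = """# TYPE consul_autopilot_failure_tolerance gauge
consul_autopilot_failure_tolerance NaN
"""
new_scrape = """# TYPE consul_autopilot_failure_tolerance gauge
consul_autopilot_failure_tolerance 0
"""
changes = diff_samples(parse_samples(old_scrape), parse_samples(new_scrape))
for series, (before, after) in changes.items():
    print(series, before, "->", after)
```

NaN needs special handling in the comparison because NaN never compares equal to itself, so a plain inequality check would flag every unchanged NaN gauge as a difference.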
Possible outcome
metric name | emitted by (server-leader, server-follower, client) | possible options and what they mean