levindecaro opened this issue 2 years ago
Yeah, I see how this would be useful.
@levindecaro Your expected metric would be inaccurate, because it is not the whole `md125` array that has been removed, only one of its component devices. From the output of your `mdadm` command, the `md125` array is still functioning (and will continue to do so, since it is RAID1 and still has one working leg).

What you need instead is a metric for the state of the individual component devices, so you can see when one of them has been removed.

That said, you could also have alerted on the condition you encountered with a `node_md_disks{state="failed"} > 0` alerting rule. Alternatively, `node_md_disks_required - node_md_disks{state="active"} > 0` would probably also do the trick.
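For reference, those expressions could be wired up roughly like this in a Prometheus rules file. The alert names, `for` duration, and labels below are illustrative, not from this issue; note that `ignoring(state)` is added so the second expression matches, since `node_md_disks` carries a `state` label that `node_md_disks_required` does not:

```yaml
groups:
  - name: mdraid
    rules:
      - alert: MDRaidDiskFailed
        # Fires when any md array reports at least one failed component device.
        expr: node_md_disks{state="failed"} > 0
        for: 5m
        labels:
          severity: warning
      - alert: MDRaidDegraded
        # Alternative: fires when an array has fewer active disks than required.
        expr: node_md_disks_required - ignoring(state) node_md_disks{state="active"} > 0
        for: 5m
        labels:
          severity: warning
```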
Having said that, the existing implementation of the procfs library's parsing of /proc/mdstat masks some of the low-level details, which is why I have proposed a new direction in https://github.com/prometheus/procfs/pull/509.
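To illustrate the kind of per-device detail a lower-level /proc/mdstat parser can surface, here is a minimal, self-contained Go sketch. The sample mdstat content and the function name are invented for this example; this is not the procfs library's actual code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Sample /proc/mdstat content for a RAID1 array with one failed leg.
// Illustrative data only, not taken from this issue.
const sampleMDStat = `Personalities : [raid1]
md125 : active raid1 sdb1[1](F) sda1[0]
      1048576 blocks super 1.2 [2/1] [U_]

unused devices: <none>`

// componentRE matches a component device entry like "sdb1[1](F)".
var componentRE = regexp.MustCompile(`^([a-z0-9]+)\[\d+\](\(([A-Z])\))?$`)

// parseComponents extracts per-device state flags for each md array.
// An empty flag means the device is active; "F" means failed, "S" spare.
func parseComponents(mdstat string) map[string]map[string]string {
	result := make(map[string]map[string]string)
	for _, line := range strings.Split(mdstat, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 || fields[1] != ":" || !strings.HasPrefix(fields[0], "md") {
			continue
		}
		devices := make(map[string]string)
		for _, f := range fields[4:] { // skip "mdXXX : active raid1"
			if m := componentRE.FindStringSubmatch(f); m != nil {
				devices[m[1]] = m[3]
			}
		}
		result[fields[0]] = devices
	}
	return result
}

func main() {
	for array, devs := range parseComponents(sampleMDStat) {
		for dev, flag := range devs {
			state := "active"
			if flag == "F" {
				state = "failed"
			}
			fmt.Printf("%s %s %s\n", array, dev, state)
		}
	}
}
```

The per-device flags (and the `[U_]` status string on the following line) are exactly the details that get flattened away when only array-level states are exported.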
Host operating system: output of `uname -a`
`Linux sds-3 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux`
node_exporter version: output of `node_exporter --version`
`node_exporter, version 1.3.1 (branch: HEAD, revision: a2321e7b940ddcff26873612bccdf7cd4c42b6b6) build user: root@243aafa5525c build date: 20211205-11:09:49 go version: go1.17.3 platform: linux/amd64`
node_exporter command line flags
Are you running node_exporter in Docker?
no
What did you do that produced an error?
`mdadm -D` output
What did you expect to see?
`node_md_state{device="md125", instance="sds-3", job="sds-nodes", state="removed"}`
What did you see instead?
"removed" state metric not yet implemented in node_md_state