Open fchiorascu opened 5 years ago
While this may seem trivial at first glance, it's a lot more complicated. The combination of unit states and substates is quite a long list.
From systemctl --state=help:
Available unit load states:
stub
loaded
not-found
error
merged
masked
Available unit active states:
active
reloading
inactive
failed
activating
deactivating
Available automount unit substates:
dead
waiting
running
failed
Available device unit substates:
dead
tentative
plugged
Available mount unit substates:
dead
mounting
mounting-done
mounted
remounting
unmounting
remounting-sigterm
remounting-sigkill
unmounting-sigterm
unmounting-sigkill
failed
Available path unit substates:
dead
waiting
running
failed
Available scope unit substates:
dead
running
abandoned
stop-sigterm
stop-sigkill
failed
Available service unit substates:
dead
start-pre
start
start-post
running
exited
reload
stop
stop-sigabrt
stop-sigterm
stop-sigkill
stop-post
final-sigterm
final-sigkill
failed
auto-restart
Available slice unit substates:
dead
active
Available socket unit substates:
dead
start-pre
start-chown
start-post
listening
running
stop-pre
stop-pre-sigterm
stop-pre-sigkill
stop-post
final-sigterm
final-sigkill
failed
Available swap unit substates:
dead
activating
activating-done
active
deactivating
deactivating-sigterm
deactivating-sigkill
failed
Available target unit substates:
dead
active
Available timer unit substates:
dead
waiting
running
elapsed
failed
To do this correctly, we have to expand the current state bitmask into the full combination of sub-states. Even with this help info, the valid state + sub-state combinations aren't mapped. For example, is failed + running a valid combination?
We also need to detect which type of unit each one is and only expose the sub-states that are valid for that type.
This might make a better separate metric, node_systemd_unit_substate. This would simplify dealing with the valid combinations.
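A minimal sketch of the per-type expansion described above, written in Python for brevity rather than in the exporter's own code. It assumes the unit type can be read from the unit name's suffix and that a separate substate metric would emit one 0/1 sample per valid substate. The function names and the `unit_type` label are hypothetical, not part of node_exporter; the substate lists are copied from the `systemctl --state=help` output above.

```python
# Substates per unit type, copied from the `systemctl --state=help` output above.
VALID_SUBSTATES = {
    "automount": ["dead", "waiting", "running", "failed"],
    "device": ["dead", "tentative", "plugged"],
    "mount": ["dead", "mounting", "mounting-done", "mounted", "remounting",
              "unmounting", "remounting-sigterm", "remounting-sigkill",
              "unmounting-sigterm", "unmounting-sigkill", "failed"],
    "path": ["dead", "waiting", "running", "failed"],
    "scope": ["dead", "running", "abandoned", "stop-sigterm", "stop-sigkill",
              "failed"],
    "service": ["dead", "start-pre", "start", "start-post", "running",
                "exited", "reload", "stop", "stop-sigabrt", "stop-sigterm",
                "stop-sigkill", "stop-post", "final-sigterm", "final-sigkill",
                "failed", "auto-restart"],
    "slice": ["dead", "active"],
    "socket": ["dead", "start-pre", "start-chown", "start-post", "listening",
               "running", "stop-pre", "stop-pre-sigterm", "stop-pre-sigkill",
               "stop-post", "final-sigterm", "final-sigkill", "failed"],
    "swap": ["dead", "activating", "activating-done", "active", "deactivating",
             "deactivating-sigterm", "deactivating-sigkill", "failed"],
    "target": ["dead", "active"],
    "timer": ["dead", "waiting", "running", "elapsed", "failed"],
}


def unit_type(unit_name):
    """Return the unit type, taken from the suffix after the last dot."""
    _, sep, suffix = unit_name.rpartition(".")
    if not sep or suffix not in VALID_SUBSTATES:
        raise ValueError(f"unknown unit type for {unit_name!r}")
    return suffix


def substate_samples(unit_name, current_substate):
    """Build one (labels, value) pair per substate valid for the unit's type.

    The value is 1.0 for the current substate and 0.0 for every other one.
    The `unit_type` label name is an assumption made for this sketch.
    """
    utype = unit_type(unit_name)
    return [
        ({"name": unit_name, "unit_type": utype, "substate": s},
         1.0 if s == current_substate else 0.0)
        for s in VALID_SUBSTATES[utype]
    ]


if __name__ == "__main__":
    for labels, value in substate_samples("node_exporter.service", "running"):
        print(labels, value)
```

Keeping the substate in its own metric means each unit only ever exposes the substates listed for its type, so the state + sub-state cross product never has to be enumerated.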
Sounds great, thank you for this detailed explanation. At the beginning, I was wondering whether it would be possible to have only two scenarios, like:
node_systemd_unit_state{alias="server01",env="int",instance="192.168.11.11:9100",job="server01",name="node_exporter.service",state="active",substate="running",type="simple"}
and, with every substate other than substate="running" grouped under substate="failed":
node_systemd_unit_state{alias="server01",env="int",instance="192.168.11.11:9100",job="server01",name="node_exporter.service",state="active",substate="failed",type="simple"}
But I think what you described makes more sense.
Hi @discordianfish any news? :)
Not that I'm aware of. We're open for submissions to implement that but I don't think anyone has done something to address this.
Hi @discordianfish, @SuperQ, maybe a future release of node_exporter will have this?
I think we're open to including this, so if you want to implement it, we'll consider it.
I am interested in discussing this issue. The status of my system is as follows.
[root@localhost ~]# systemctl is-enabled node_exporter
disabled
[root@localhost ~]# systemctl list-units --type service
UNIT LOAD ACTIVE SUB DESCRIPTION
● node_exporter.service loaded failed failed Prometheus Node Exporter
[root@localhost ~]# systemctl status node_exporter
● node_exporter.service - Prometheus Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2023-03-21 19:28:28 KST; 1 day 15h ago
Main PID: 8572 (code=exited, status=1/FAILURE)
...
In this state, an alert is triggered by the following rule, which we do not want.
node_systemd_unit_state{state="failed",type!="oneshot"} == 1
It would be good if we could prevent the alert using expressions like:
node_systemd_unit_state{state!="disabled",substate="failed",type!="oneshot"} == 1
node_systemd_unit_state{state!="inactive",substate="failed",type!="oneshot"} == 1
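To make the proposed matchers concrete, here is a rough Python sketch that mimics PromQL `=` and `!=` label matching on a hand-written sample. The series below is hypothetical: it assumes a `substate` label existed and reuses the ACTIVE and SUB values from the `systemctl list-units` output above. The helper names `matches` and `firing` are invented for illustration.

```python
def matches(labels, equal=None, not_equal=None):
    """Mimic PromQL `=` and `!=` matchers on one label set.

    A label that is absent is treated as the empty string, as in PromQL.
    """
    equal = equal or {}
    not_equal = not_equal or {}
    return (all(labels.get(k, "") == v for k, v in equal.items())
            and all(labels.get(k, "") != v for k, v in not_equal.items()))


def firing(series, equal=None, not_equal=None):
    """Return the unit names whose sample equals 1 and passes the matchers."""
    return [labels["name"] for labels, value in series
            if value == 1 and matches(labels, equal, not_equal)]


# Hypothetical sample: the unit from the listing above (ACTIVE=failed, SUB=failed).
series = [
    ({"name": "node_exporter.service", "state": "failed",
      "substate": "failed", "type": "simple"}, 1),
]

# node_systemd_unit_state{state!="inactive",substate="failed",type!="oneshot"} == 1
print(firing(series,
             equal={"substate": "failed"},
             not_equal={"state": "inactive", "type": "oneshot"}))
```

Under these assumptions the unit is still selected, because its active state is failed rather than inactive. Note also that disabled is what `systemctl is-enabled` reports and does not appear among the active states listed earlier, so a `state!="disabled"` matcher would only filter anything if the enablement status were exposed through that label.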
Host operating system: output of uname -a
Linux server01 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
node_exporter, version 0.18.1 (branch: HEAD, revision: 3db77732e925c08f675d7404a8c46466b2ece83e) build user: root@b50852a1acba build date: 20190604-16:41:18 go version: go1.12.5
node_exporter command line flags
Are you running node_exporter in Docker?
N/A
What did you do that produced an error?
What did you expect to see?
It would be great to see the "high-level unit activation state" (ACTIVE) and the "low-level unit activation state" (SUB) as labels on the node_systemd_unit_state metric (at the moment there is only the state, without the substate). Below I've added the substate label.
node_systemd_unit_state{alias="server01",env="int",instance="192.168.11.11:9100",job="server01",name="node_exporter.service",state="active",substate="running",type="simple"}
What did you see?
node_systemd_unit_state{alias="server01",env="int",instance="192.168.11.11:9100",job="server01",name="node_exporter.service",state="active",type="simple"}