joonty / systemd_mon

Monitor for systemd to alert failed services
MIT License

possible false positives on oneshot type services without 'RemainAfterExit=yes' #6

Open glitsj16 opened 8 years ago

glitsj16 commented 8 years ago

First, a word of gratitude for this systemd monitoring app. In all honesty, I was using https://github.com/gkarakou/systemd-denotify for quite a while on desktops, but recently I was looking for something more geared towards servers when stumbling onto this project. Generally works great for my use cases.

Only one issue so far: when setting up a systemd unit of type 'oneshot' that doesn't have 'RemainAfterExit=yes' (like the default logrotate.service on archlinux) I see some errors and systemd_mon (erroneously) notifies via email. No clue if this is expected behaviour or a bug, my systemd knowledge is far from 'developed'.. It could be related to the recently integrated pull request supporting oneshot type services (#3), but as that happened before I started to use systemd_mon this is something I cannot judge.

What happens on the logrotate.service unit:

(1) without RemainAfterExit=yes --> Active: inactive (dead)
      ==> systemd_mon starts notifying (repeatedly)
      unexpected: systemctl --failed --all doesn't report anything for logrotate.service

(2) with RemainAfterExit=yes --> Active: active (exited)
      ==> systemd_mon doesn't notify
      expected

I have worked around this issue by adding /etc/systemd/system/logrotate.service, which differs only in the 'RemainAfterExit=yes' part. Yet it might prove useful to report this issue, hope it doesn't cause too much confusion :-)

Some debug info on the issue:

$ cat /usr/lib/systemd/system/logrotate.service
[Unit]
Description=Rotate log files

[Service]
Type=oneshot
ExecStart=/usr/bin/logrotate /etc/logrotate.conf
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7

$ sudo systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
   Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2015-12-07 21:39:04 UTC; 41min ago
 Main PID: 28010 (code=exited, status=0/SUCCESS)

Dec 07 21:39:04 do16 systemd[1]: Starting Rotate log files...
Dec 07 21:39:04 do16 systemd[1]: Started Rotate log files.

$ systemd_mon ~/.systemd_mon_testing.yml
SystemdMon::Notifiers::Email doesn't respond to 'notify_start!', not sending notification
Monitoring changes to 13 units

Using notifiers: SystemdMon::Notifiers::Email

SystemdMon::State:0x00000002e70228

SystemdMon::State:0x000000027a8300

SystemdMon::State:0x00000001dcf360

SystemdMon::State:0x00000001ca02f0

SystemdMon::State:0x00000001330918

logrotate.service failed: inactive (dead)
Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x00000001329d48>

SystemdMon::State:0x00000001d5ab00

SystemdMon::State:0x00000001e866f0

SystemdMon::State:0x000000027e11a0

SystemdMon::State:0x00000002920ed0

SystemdMon::State:0x00000002a9d808

SystemdMon::State:0x00000002b8ac20

SystemdMon::State:0x00000002c60028

SystemdMon::State:0x00000002d21f98 [*]

logrotate.service still failed: inactive (dead)
active state changed from inactive to inactive then activating then inactive

Notifying state change of logrotate.service via SystemdMon::Notifiers::Email
SystemdMon::Notifiers::Email: Sending email to glitsj16@gmail.com:
SystemdMon::Notifiers::Email: -> Subject: "Alert: logrotate.service on do16: still failed"
SystemdMon::Notifiers::Email: -> Message: "Systemd unit logrotate.service on do16 still failed: inactive (dead)

| Time | Active |
| 22:29:48.815 +0000 | inactive |
| 22:30:14.110 +0000 | inactive |
| 22:30:14.114 +0000 | activating |
| 22:31:17.794 +0000 | inactive |

Regards, SystemdMon"
SystemdMon::Notifiers::Email: sent email notification

SystemdMon::State:0x00000002d1b1e8

[*] running commands in another terminal window
$ sudo systemctl start logrotate.service
$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
   Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2015-12-07 22:25:20 UTC; 9s ago
  Process: 28278 ExecStart=/usr/bin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
 Main PID: 28278 (code=exited, status=0/SUCCESS)

Dec 07 22:25:20 do16 systemd[1]: Starting Rotate log files...
Dec 07 22:25:20 do16 systemd[1]: Started Rotate log files.
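
For illustration: the Active history in the notification above (inactive, activating, inactive) is what a clean oneshot run without RemainAfterExit=yes looks like, whereas a unit that really fails enters systemd's "failed" active state. A minimal Python sketch of a check built on that distinction follows. The function name `history_indicates_failure` is hypothetical and this is not systemd_mon's actual logic (systemd_mon is written in Ruby); it only shows the idea under those assumptions.

```python
def history_indicates_failure(active_states):
    """Return True only if the unit ever entered systemd's 'failed' active state.

    `active_states` is a list of ActiveState values in chronological order,
    like the "Active" column of the notification table.
    """
    return "failed" in active_states


# History taken from the notification table in this report.
oneshot_run = ["inactive", "inactive", "activating", "inactive"]
# Hypothetical history of a oneshot unit whose command exited non-zero.
broken_run = ["inactive", "activating", "failed"]

print(history_indicates_failure(oneshot_run))
print(history_indicates_failure(broken_run))
```

With a check like this, "inactive (dead)" alone would not trigger an alert for a oneshot unit.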

mariomarin commented 8 years ago

I am using systemd timers for backups and I want to know when a backup fails. I put all the units for each timer in a target directory, so its units start at the same time. Since the service runs as Type=oneshot, the result code=exited, status=0/SUCCESS is fine and should be recognized as valid. I can't use RemainAfterExit=yes, the workaround suggested here, because then the timer won't start, since systemd thinks the unit is still active.
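
A rough Python sketch of the behaviour asked for here: decide "failed" from the properties systemd reports rather than from "inactive (dead)" alone. It parses KEY=VALUE text of the kind `systemctl show` prints and uses the real unit properties ActiveState and Result. The helper names `parse_show_output` and `is_failed` are hypothetical, the sample text is made up for illustration rather than captured from a real machine, and none of this is systemd_mon's own code.

```python
def parse_show_output(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key.strip()] = value.strip()
    return props


def is_failed(props):
    """Treat a unit as failed only when systemd itself reports a failure."""
    if props.get("ActiveState") == "failed":
        return True
    # Result is 'success' after a clean exit; other values (for example
    # 'exit-code' or 'signal') mean the last run did not succeed.
    return props.get("Result", "success") != "success"


# Illustrative sample (not real output): a oneshot unit that ran and exited 0.
sample = """Type=oneshot
RemainAfterExit=no
ActiveState=inactive
SubState=dead
Result=success
ExecMainStatus=0"""

print(is_failed(parse_show_output(sample)))
```

On a real system the input could come from something like `systemctl show -p Type,ActiveState,SubState,Result logrotate.service`, which avoids needing RemainAfterExit=yes and therefore leaves timers free to start the unit again.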

glitsj16 commented 8 years ago

That makes sense. Thanks for clearing things up.

mariomarin commented 8 years ago

@glitsj16 I think this issue is valid and should remain open. I was getting false positives for units of type oneshot, which is why I tried the workaround you suggested, but it blocked my timer's units.

glitsj16 commented 8 years ago

@mariomarin Reopened. I'm actually having a closer look at systemd and will have plenty of time to experiment in a couple of days. Looking forward to sharing insights. systemd timers are on my to-do list; for now I still use old-skool cronjobs left and right. Enjoy the holidays.

mariomarin commented 8 years ago

Thanks