hvisage / metalog

metalog is a syslog replacement that allows regular expresion matching
GNU General Public License v2.0
36 stars 11 forks source link

Arch Linux, metalog 20220214: systemd service stops with error into failed state #30

Open angrysquirell opened 1 year ago

angrysquirell commented 1 year ago

My only concern is to catch events and send them as email notification.

It's standard installation from repository, pkgbuild and service unit file are available here: https://gitlab.archlinux.org/archlinux/packaging/packages/metalog

Running metalog with -v parameter does not produce any additional information. The only logged messages after systemctl stop metalog.service are: lip 25 22:06:58 arch systemd[1]: Stopping System Logger Daemon... lip 25 22:06:58 arch systemd[1]: metalog.service: Main process exited, code=exited, status=1/FAILURE lip 25 22:06:58 arch systemd[1]: metalog.service: Failed with result 'exit-code'. lip 25 22:06:58 arch systemd[1]: Stopped System Logger Daemon.

Configuration file, as follows, nothing special, "accepted" for testing purposes:

maxsize = 1048576 maxtime = 86400 maxfiles = 5

ssh : program = "sshd" # regex = "preauth" regex = "accepted" logdir = "NONE" command = "/usr/local/bin/metalog-email"

Logging itself works good, but how to diagnose and avoid systemd service stopping errors?

m-kress commented 1 year ago

Metalog always returns 1 when it gets killed by SIGTERM or SIGQUIT. Obviously it was presumed, that the system logger should always run after it has been started. Any stopping is considerd as an error. Eventually this could be changed.

The other thing is: What would you expect metalog to do? In your case metalog gets stopped, so who should listen to the messages metalog would produce? STDOUT? Or do you run a second syslogger?

angrysquirell commented 1 year ago

Thank you very much for the response.

Metalog always returns 1 when it gets killed by SIGTERM or SIGQUIT. Obviously it was presumed, that the system logger should always run after it has been started. Any stopping is considerd as an error. Eventually this could be changed.

I undestand it, but I've just noticed that metalog.service does not exit with exit 0 / success (in contrast to syslog-ng and systemd-journald - examples below). Of course I don't need to manually stop / turn off the service - it should monitor logs from system start to poweroff. I'm just curious if it is an metalog error, my configuration error or handling metalog.service by systemd while stopping the service.

Error appears also before restart / poweroff while systemd is stopping service.

So if I understood correctly, this metalog's behavior is absolutely normal by desing and it's not to worry about?

Examples:

  1. Metalog stopping "error" before reboot / poweroff: journalctl -q -b -1 -u metalog lip 26 19:00:46 varch systemd[1]: Started System Logger Daemon. lip 26 19:03:54 varch systemd[1]: Stopping System Logger Daemon... lip 26 19:03:54 varch systemd[1]: metalog.service: Main process exited, code=exited, status=1/FAILURE lip 26 19:03:54 varch systemd[1]: metalog.service: Failed with result 'exit-code'. lip 26 19:03:54 varch systemd[1]: Stopped System Logger Daemon.

  2. Manually stopped syslog-ng service exit with 0 / success: systemctl status syslog-ng@default.service -l --no-pager ○ syslog-ng@default.service - System Logger Daemon "default" instance Loaded: loaded (/usr/lib/systemd/system/syslog-ng@.service; enabled; preset: disabled) Drop-In: /etc/systemd/system/syslog-ng@default.service.d └─override.conf Active: inactive (dead) since Wed 2023-07-26 19:11:28 CEST; 3min 30s ago Duration: 14min 56.433s Docs: man:syslog-ng(8) Process: 887 ExecStart=/usr/bin/syslog-ng -F $OTHER_OPTIONS --cfgfile $CONFIG_FILE --control $CONTROL_FILE --persist-file $PERSIST_FILE --pidfile $PID_FILE (code=exited, status=0/SUCCESS) Main PID: 887 (code=exited, status=0/SUCCESS) Status: "Shutting down... (Wed Jul 26 19:11:28 2023" CPU: 255ms

  3. Manually stopped systemd-journald service also exited with 0 / success: systemctl status systemd-journald ○ systemd-journald.service - Journal Service Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static) Active: inactive (dead) since Wed 2023-07-26 19:18:01 CEST; 6s ago Duration: 21min 41.315s TriggeredBy: ● systemd-journald.socket ● systemd-journald-dev-log.socket Docs: man:systemd-journald.service(8) man:journald.conf(5) Process: 252 ExecStart=/usr/lib/systemd/systemd-journald (code=exited, status=0/SUCCESS) Main PID: 252 (code=exited, status=0/SUCCESS) Status: "Processing requests..." CPU: 240ms

The other thing is: What would you expect metalog to do? In your case metalog gets stopped, so who should listen to the messages metalog would produce? STDOUT? Or do you run a second syslogger?

I'm using syslog-ng only as auxiliary monitoring tool, that can send warning by email. The main logger is Saint Systemd-Jounald Almighty - Daemon of the Century, which cannot talk to local postfix relay instalnce and send something to email :-D

Because of some recent syslog-ng maintenance issues, difficulties, etc., I've just tried to setup metalog.

So generally, this kind of exit status while service stopping may be a bit misleading. I'm rather quite experienced user, so that is the reason of my first post.

Maybe there is some way to hide / suppress exit error code to have clean logs without this kind of "false-positive" error?

Best regards!

m-kress commented 1 year ago

Metalog obviously seems to exit with 1 on purpose after being killed. As mentioned: I don't know the idea behind that, but only can assume, that the original authors of metalog didn't have your use case in mind.

It would be quite easy to change that behaviour into exit with 0. Could we have a problem with compatibility or other regressions by changing that?

angrysquirell commented 1 year ago

Metalog obviously seems to exit with 1 on purpose after being killed. As mentioned: I don't know the idea behind that, but only can assume, that the original authors of metalog didn't have your use case in mind.

It's not any special use case. It's normal service "working" from system start to poweroff / reboot.

It's probably caused by security reasons, as warning when someone would stop logging daemon, nevermind.

Maybe good choice would be exit 0 / success status only triggered by system's init ( /proc/1/comm ), sysvinit, runit, systemd, etc.

The other thing is that messages catched by metalog, were not appear in systemd-journal simultaneously (ForwartToSyslog=yes, enabled in journald.conf). I suppose that may be difficult to implement it, but would be usable.

I know about KISS rules, but somehow we are doomed to using systemd-journal...

Thank you for your effort.

orbea commented 1 year ago

There are different kill(1) signals, presumably metalog could pay attention to how its being killed and give the correct exit code accordingly.