kernel logging is broken if syslogd.cache survives a system boot

fjaell commented 3 years ago

This happens if sysklogd is configured with --localstatedir=/var, so that syslogd.cache ends up in /var/run, which is not cleaned during boot.

I'll fix this for now by setting runstatedir to /run.

troglobit commented 3 years ago

On most systems that have done the transition to /run, /var/run is a symlink to /run. This is the safe and backward-compatible approach I'd recommend, considering that many applications use _PATH_VARRUN or have /var/run hardcoded for other reasons. That, or cleaning both /run and /var/run at boot. Having both as separate filesystems does not make sense.
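A quick way to check which situation a system is in is to compare what the two paths resolve to. The following is only a minimal sketch: `same_dir` is a hypothetical helper, and it assumes a `stat` that supports `-L` and `-c` (GNU coreutils or BusyBox). A symlink and a bind mount both pass the check, because the device and inode numbers match in either case.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysklogd): succeeds when both paths
# resolve to the same directory, whether via a symlink or a bind mount.
same_dir() {
    a=$(stat -L -c '%d:%i' "$1" 2>/dev/null) || return 1
    b=$(stat -L -c '%d:%i' "$2" 2>/dev/null) || return 1
    [ "$a" = "$b" ]
}

if same_dir /var/run /run; then
    echo "/var/run and /run are the same directory"
else
    echo "/var/run is separate from /run and must be cleaned at boot"
fi
```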

I'm leaning towards closing this as wont-fix.

fjaell commented 3 years ago

But /var/run is not a symlink to /run on all systems, and on those systems syslogd is broken. At the very least, the special behaviour of syslogd.cache should be noted somewhere. IMO it would be much nicer if the location of the cache file didn't matter.

troglobit commented 3 years ago

It's really hard to make users of all systems happy; I've tried to follow the intent of the FHS to the best of my abilities. As a maintainer I have to make some hard choices. However, like /tmp, /var/run (and now /run) is expected to be cleaned at boot, something I'm sure many other system-level programs also rely on. Distribution packagers of course need the information, and I'm here to reply to the best of my abilities, but I cannot put every design decision in a document.

The next version of sysklogd will have a command line option to read the path to the cache file, so there's a bit more information about this feature in the man page already.

Nevertheless, syslogd relies on the cache file not persisting across reboots, so the location does matter.

opty77 commented 3 years ago

Slackware -current uses --localstatedir=/var too and system initialization script /etc/rc.d/rc.S contains:

# If /run exists, mount a tmpfs on it (unless the
# initrd has already done so):
if [ -d /run ]; then
  if ! grep -wq "tmpfs /run tmpfs" /proc/mounts ; then
    /sbin/mount -v -n -t tmpfs tmpfs /run -o mode=0755,size=32M,nodev,nosuid,noexec
  fi
  # Make sure that mounts below /run are visible in both /run and /var/run:
  /sbin/mount --make-shared /run
fi

[...]

# Bind mount /run to /var/run:
mount -o bind /run /var/run

Some ideas:

1. Add e.g. start_after_boot() and/or stop_before_shutdown() functions to your syslogd script and manage syslogd.cache there (hacky).
2. Add syslogd.cache management to your system initialization script and/or shutdown script (hacky).
3. syslogd could reset the cache file when its message number exceeds the current one (unreliable).
4. syslogd could compare the cache file's last modification timestamp with the system boot time and reset the cache file when it is older (unreliable, ntpd recommended).
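
For illustration, the last idea (comparing the cache file's modification time with the boot time) could be tried from an init script rather than inside syslogd. This is only a rough sketch under several assumptions: `reset_stale_cache` and `boot_time` are hypothetical helpers, the cache path in the usage comment is a guess based on --localstatedir=/var, `stat -c %Y` requires GNU coreutils or BusyBox, and the result is only as trustworthy as the system clock.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysklogd): remove the cache file if it
# was last modified before the current boot.
# $1 = cache file path, $2 = boot time in seconds since the epoch.
reset_stale_cache() {
    cache=$1
    boot=$2
    [ -f "$cache" ] || return 0          # nothing to do without a cache
    mtime=$(stat -c %Y "$cache") || return 1
    if [ "$mtime" -lt "$boot" ]; then
        rm -f "$cache"
    fi
}

# On Linux, the kernel reports the boot time as "btime" in /proc/stat.
boot_time() {
    awk '/^btime/ { print $2 }' /proc/stat
}

# Possible use in an init script, before syslogd starts (path is assumed):
# reset_stale_cache /var/run/syslogd.cache "$(boot_time)"
```

Unconditionally removing the file at the same point in the boot sequence would be simpler and would not depend on the clock at all.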

troglobit commented 3 years ago

I stand by what I said above: /var/run and /run should be for temporary files for the current boot. So from my perspective it's a system issue. Bind mounting /run onto /var/run was clever, I need to tell my colleagues about that trick (we bind mount a lot of other things, but for some reason got stuck with the Debian/RedHat symlink trick instead).

In theory I like 3) and 4) in @opty77's comment, because they point to a possible wrap-around issue that I hadn't considered and that could occur at runtime as well. However, mtime and system time are nothing I'd like to build a reliable function on, since time skips, particularly at boot. Our embedded systems often start with the time at 1970, and the mtime of the cache file could have that date, or a correct date from a timezone that ntpd still hasn't synced the system to again after boot ...

This whole mess started after the migration from /proc/kmsg to /dev/kmsg, something I at times deeply regret since it caused a lot of regressions. I'd be open to a PR with #ifdefs to disable the new code, including the caching, with a configure option ...

troglobit commented 3 years ago

Commit b0d4e4c updates README and man page, as well as the online help text, with more detailed information about syslogd.cache.

Closing

troglobit / sysklogd

kernel logging is broken if syslogd.cache survives a system boot #40