systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.28k stars 3.8k forks source link

journalctl doesn't follow directories properly #359

Open sschindler89 opened 9 years ago

sschindler89 commented 9 years ago

If you follow a directory via

$journalctl --directory /some/dir/logs/ --follow

which contains journal files from different sources (I use a tool which uses systemd-journal-remote to create remote logs), one source tends to "overshadow" the others in the journalctl view. Meaning: the logs of the other sources do not appear in the journalctl unless you filter directly to those messages (for example via the machine-id of one overshadowed source).

An Example: You have two log sources: HOST1, HOST2. Both send logs (src1.journal and src2.journal) to a third system SINK. On SINK the logs are stored in a central directory /home/user/remotelogs/. To read those remote logs I wanted to use journalctl. Unfortunately new logs from HOST2 do not appear after the first message of HOST1 is received. Below you'll find two blocks you can c/p. This is a minimal sample that should reproduce the error on machines that have systemd-journal-remote usable. In the journalctl view you should only see one entry (If you reopen the journalctl on this directory you'll see all three entries).

First: Set up the directory and start watching it with journalctl:

mkdir ~/logtest
journalctl --directory ~/logtest -f

Second: Log messages into the directory in two different files:

echo "__CURSOR=s=e57e7a75e65a4e94a591160fc625a233;i=4f0db4;b=be5c25d1f7fb499685a2240b9c9a9daa;m=a148941af;t=519448bea832b;x=1847ebd39f6549e2
__REALTIME_TIMESTAMP=1435157079425835
__MONOTONIC_TIMESTAMP=43294212527
_BOOT_ID=be5c25d1f7fb499685a2240b9c9a9daa
_TRANSPORT=journal
_UID=0
_GID=0
_CAP_EFFECTIVE=3fffffffff
_MACHINE_ID=a79960883b1343f79fadd23bfd730d98
_HOSTNAME=HOST1
_SYSTEMD_SLICE=system-dropbear.slice
MESSAGE=src 1
CODE_FILE=jlog.c
CODE_LINE=24
CODE_FUNC=main
SYSLOG_IDENTIFIER=jlog
_COMM=jlog
_EXE=/usr/bin/jlog
_CMDLINE=jlog MESSAGE=src 1
_SYSTEMD_CGROUP=/system.slice/system-dropbear.slice/dropbear@3-192.168.13.252:22-192.168.13.226:54996.service
_SYSTEMD_UNIT=dropbear@3-192.168.13.252:22-192.168.13.226:54996.service
_PID=19911
_SOURCE_REALTIME_TIMESTAMP=1435157079425635

" | /lib/systemd/systemd-journal-remote -o ~/logtest/src1.journal -
echo "__CURSOR=s=3d4afc211fca422393fe1b7262c8da74;i=b3e1f;b=2d4466c33b7a4942b8ca11dcaeb4974c;m=89637e1f8;t=51944824bc9d3;x=e882c12d43b9a4ee
__REALTIME_TIMESTAMP=1435156918028755
__MONOTONIC_TIMESTAMP=36879983096
_BOOT_ID=2d4466c33b7a4942b8ca11dcaeb4974c
CODE_FILE=jlog.c
CODE_LINE=24
CODE_FUNC=main
SYSLOG_IDENTIFIER=jlog
_TRANSPORT=journal
_UID=0
_GID=0
_COMM=jlog
_EXE=/usr/bin/jlog
_CAP_EFFECTIVE=3fffffffff
_SYSTEMD_CGROUP=/system.slice/system-dropbear.slice/dropbear@1-192.168.13.253:22-192.168.13.226:55910.service
_SYSTEMD_UNIT=dropbear@1-192.168.13.253:22-192.168.13.226:55910.service
_SYSTEMD_SLICE=system-dropbear.slice
_MACHINE_ID=9c18637da35643bd83793ff2b5a5a965
_HOSTNAME=HOST2
MESSAGE=src 2
_CMDLINE=jlog MESSAGE=src 2
_PID=31281
_SOURCE_REALTIME_TIMESTAMP=1435156918028542

" | /lib/systemd/systemd-journal-remote -o ~/logtest/src2.journal -
echo "__CURSOR=s=3d4afc211fca422393fe1b7262c8da74;i=b410f;b=2d4466c33b7a4942b8ca11dcaeb4974c;m=8efd09862;t=51944dbe4803d;x=b5b9636567f8ea00
__REALTIME_TIMESTAMP=1435158421209149
__MONOTONIC_TIMESTAMP=38383163490
_BOOT_ID=2d4466c33b7a4942b8ca11dcaeb4974c
CODE_FILE=jlog.c
CODE_LINE=24
CODE_FUNC=main
SYSLOG_IDENTIFIER=jlog
_TRANSPORT=journal
_UID=0
_GID=0
_COMM=jlog
_EXE=/usr/bin/jlog
_CAP_EFFECTIVE=3fffffffff
_SYSTEMD_CGROUP=/system.slice/system-dropbear.slice/dropbear@1-192.168.13.253:22-192.168.13.226:55910.service
_SYSTEMD_UNIT=dropbear@1-192.168.13.253:22-192.168.13.226:55910.service
_SYSTEMD_SLICE=system-dropbear.slice
_MACHINE_ID=9c18637da35643bd83793ff2b5a5a965
_HOSTNAME=HOST2
MESSAGE=src 2
_CMDLINE=jlog MESSAGE=src 2
_PID=31736
_SOURCE_REALTIME_TIMESTAMP=1435158421209032

" | /lib/systemd/systemd-journal-remote -o ~/logtest/src2.journal -

Does anybody have similar issues?

hwinkel commented 9 years ago

which systemd version you use?

sschindler89 commented 9 years ago

I am running systemd 220.

sschindler89 commented 9 years ago

I also was able to reproduce this with systemd 221

sschindler89 commented 9 years ago

I expect this bug in journalctl to have its roots in sd-journal.h. My reasoning: I use a tool, which reads from several journalfiles via sd_journal_open_directory similar to journalctl and later on writes the read journal entries into a new file. My tool only writes entries from one file similar to the incomplete view in journalctl.

hwinkel commented 9 years ago

any hints for hunting the root cause so far? seems the journal directory or file handling/caching cluold be involved ?

sschindler89 commented 9 years ago

I have another hint for this issue: I've rewritten my tool, so the timestamps of the incoming remote logs are monotonic over all files . journalctl now shows all entries, so this issue seems to be rooted in the monotony of the timestamps of all journalentries.

baby-gnu commented 6 years ago

Hello.

I have the same issue with --file=GLOB.

I have configured journal-remote.conf like this:

[Remote]
 Seal=false
 SplitMode=host
 ServerKeyFile=/etc/ssl/journal/nebula-manager.eole.lan.key
 ServerCertificateFile=/etc/ssl/journal/nebula-manager.eole.lan.pem
 TrustedCertificateFile=/etc/ssl/journal/ca.pem

The journals are created

root@nebula-manager:~# ls -l /var/log/journal/remote/
total 65564
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:52 remote-192.168.231.114.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:52 remote-192.168.231.115.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:52 remote-192.168.231.116.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:36 remote-192.168.231.117.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:36 remote-192.168.231.119.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:45 remote-192.168.231.120.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 14:48 remote-192.168.231.121.journal
-rw-r----- 1 systemd-journal-remote systemd-journal-remote 8388608 mars  16 11:44 remote-192.168.231.122.journal

But following only works for the first file remote-192.168.231.114.journal in the list:

root@nebula-manager:~# journalctl -f --file '/var/log/journal/remote/remote-*.journal'
-- Logs begin at mer. 2018-03-14 12:40:11 CET. --
mars 16 14:47:17 nebula114 sshd[2059]: Disconnected from 192.168.231.67 port 60648
mars 16 14:47:17 nebula114 sshd[2059]: pam_unix(sshd:session): session closed for user root
[...]

It's working when not specifying a glob:

root@nebula-manager:~# journalctl -f --file /var/log/journal/remote/remote-192.168.231.116.journal
-- Logs begin at ven. 2018-03-16 11:00:10 CET. --
mars 16 14:56:31 nebula116 kernel: Ebtables v2.0 registered
mars 16 14:56:33 nebula116 libvirtd[1152]: libvirt version: 1.3.1, package: 1ubuntu10.19 (Marc Deslauriers <marc.deslauriers@ubuntu.com> Fri, 16 Feb 2018 07:51:15 -0500)
mars 16 14:56:33 nebula116 libvirtd[1152]: hostname: nebula116
[...]

I tried with the --merge which does not work either, it only show local logs:

root@nebula-manager:~# journalctl -f --merge
-- Logs begin at mer. 2018-03-14 11:06:45 CET. --
mars 16 14:56:25 nebula-manager nbd_server[24785]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
mars 16 14:56:25 nebula-manager nbd_server[24785]: Starting to serve
mars 16 14:56:25 nebula-manager nbd_server[24785]: Size of exported file/device is 345174016
[...]

Maybe everything is related?

Regards.

baby-gnu commented 6 years ago

I just found #3618 for the --merge bug. My Xenial systemd is version 229 which looks to be to old.

Regards.

daviessm commented 5 years ago

This still seems to happen in systemd-239. It appears journalctl can only follow one journal.

anderspitman commented 2 years ago

I'm still seeing this behavior on systemd 250.

cmomberg commented 1 year ago

The problem still persists with systemd version 252 on RHEL/AlmaLimux 9: using --file=GLOB or --dir combined with -f only shows the entries from one of the files

Combining -f and --merge even gives an (unexpected) error message:

[root@hostname ~]# journalctl -f --merge
Using --boot or --list-boots with --merge is not supported.