systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.24k stars 3.79k forks source link

journald: log meatdata is acquired in a racy fashion #28499

Open bjourne opened 1 year ago

bjourne commented 1 year ago

systemd version the issue has been seen with

253.7-1

Used distribution

Arch Linux

Linux kernel version used

6.4.1-arch2-1

CPU architectures issue was seen on

None

Component

journalctl

Expected behaviour you didn't see

If I start chromium from Konsole in KDE, log messages that should be credited to "chromium" are credited to "plasmashell":

jul 23 16:08:46 bjourne plasmashell[469688]: [469688:469688:0723/160846.653689:ERROR:gl_display.cc(790)] eglInitialize OpenGLES failed with error EGL_NOT_INITIALIZED
jul 23 16:08:46 bjourne plasmashell[469688]: [469688:469688:0723/160846.653717:ERROR:gl_display.cc(824)] Initialization of all EGL display types failed.
jul 23 16:08:46 bjourne plasmashell[469688]: [469688:469688:0723/160846.653744:ERROR:gl_ozone_egl.cc(26)] GLDisplayEGL::Initialize failed.

I guess chromium doesn't set the "sender" field when it generates log messages and systemd then picks some default? If so, systemd should pick the name of the binary and not the name of some (in this case) innocent parent process.

Unexpected behaviour you saw

No response

Steps to reproduce the problem

No response

Additional program output to the terminal or log subsystem illustrating the issue

No response

poettering commented 1 year ago

how are these messages generated? stdout/stderr?

please provide the journalctl -o verbose version of these log messages, which should tell us more.

does plasmashell fork off chromium?

note that the retrieving the comm field of a process is async, and hence generally racy. we cache it, hence it might be that the comm field might be from before or after the actual log event.

typically you want to use structure logging in the apps in question (instead of stderr-based logging), since it can carry information that is fully accurate, and we don't have to augment racily so much.

bjourne commented 1 year ago

I can't answer some of those questions, but here is an example of a syslog message generated by Firefox being credited to plasmashell:

Mon 2023-08-28 08:31:29.825739 CEST [s=c564da89886e4cbda88721980362f531;i=44a034;b=14987ae5926f45d5bd2447577d6fd8fc;m=c5af58870;t=603f5d83a2bcb;x=575dc5843c>
    PRIORITY=6
    SYSLOG_FACILITY=3
    _UID=1001
    _GID=1002
    _AUDIT_SESSION=3
    _AUDIT_LOGINUID=1001
    _SYSTEMD_OWNER_UID=1001
    _SYSTEMD_UNIT=user@1001.service
    _SYSTEMD_SLICE=user-1001.slice
    _BOOT_ID=14987ae5926f45d5bd2447577d6fd8fc
    _MACHINE_ID=b6fe121d8ee84f6092edaa8a88c3ad42
    _HOSTNAME=bjourne-laptop
    _RUNTIME_SCOPE=system
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=plasmashell
    _SYSTEMD_USER_SLICE=app.slice
    _TRANSPORT=stdout
    _STREAM_ID=cd4fcf60b61d4f4288fec8c5b88b3600
    MESSAGE=ATTENTION: default value of option mesa_glthread overridden by environment.
    _PID=35820
    _COMM=firefox
    _EXE=/usr/lib/firefox/firefox
    _CMDLINE=/usr/lib/firefox/firefox
    _SYSTEMD_CGROUP=/user.slice/user-1001.slice/user@1001.service/app.slice/app-firefox-41015fe2155c47fa93ed618439e7eff5.scope
    _SYSTEMD_USER_UNIT=app-firefox-41015fe2155c47fa93ed618439e7eff5.scope
    _SYSTEMD_INVOCATION_ID=f4601255937347049af41f9b1731fdad

The ATTENTION message is printed on stdout when Firefox is started from a console.

bjourne commented 1 year ago

Here is the same thing happening when you start mplayer from Dolphin:

Sun 2023-10-01 17:51:08.487424 CEST [s=b30a7a5be41f44df980e36de453c3f32;i=10eaa3;b=f3cd592e1d604d8a8ce514460dbd36f1;m=4f352c6a00;t=606a9a046a31f;x=3a741c68feaf7c87]
    SYSLOG_FACILITY=3
    _TRANSPORT=journal
    _UID=1000
    _GID=1000
    _AUDIT_SESSION=3
    _AUDIT_LOGINUID=1000
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/init.scope
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_USER_UNIT=init.scope
    _SYSTEMD_SLICE=user-1000.slice
    _SYSTEMD_USER_SLICE=-.slice
    _BOOT_ID=f3cd592e1d604d8a8ce514460dbd36f1
    _MACHINE_ID=50bc24d6af434132926fe589c3f0a08c
    _HOSTNAME=bjourne
    _RUNTIME_SCOPE=system
    PRIORITY=6
    TID=704
    SYSLOG_IDENTIFIER=systemd
    _PID=704
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CMDLINE=/usr/lib/systemd/systemd --user
    _CAP_EFFECTIVE=800000000
    CODE_FILE=src/core/job.c
    CODE_LINE=796
    CODE_FUNC=job_emit_done_message
    JOB_TYPE=start
    JOB_RESULT=done
    MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
    MESSAGE=Started MPlayer Media Player - Multimedia player.
    JOB_ID=976
    USER_INVOCATION_ID=b66ada0b7ad147388097a5e362f867f9
    USER_UNIT=app-mplayer-3789bf1f346448eebf132b91b88fbc44.scope
    _SOURCE_REALTIME_TIMESTAMP=1696175468487424
Sun 2023-10-01 17:51:08.571993 CEST [s=b30a7a5be41f44df980e36de453c3f32;i=10eaa4;b=f3cd592e1d604d8a8ce514460dbd36f1;m=4f352db43a;t=606a9a047ed59;x=b915efbeb62ef65a]
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _CAP_EFFECTIVE=0
    _AUDIT_SESSION=3
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_SLICE=user-1000.slice
    _BOOT_ID=f3cd592e1d604d8a8ce514460dbd36f1
    _MACHINE_ID=50bc24d6af434132926fe589c3f0a08c
    _HOSTNAME=bjourne
    _RUNTIME_SCOPE=system
    PRIORITY=6
    SYSLOG_IDENTIFIER=plasmashell
    _SYSTEMD_USER_SLICE=app.slice
    _TRANSPORT=stdout
    _STREAM_ID=a24fc78ade5f4882b8a128ddfc964e2d
    MESSAGE=MPlayer SVN-r38412 (C) 2000-2023 MPlayer Team
    _COMM=mplayer
    _EXE=/usr/bin/mplayer
    _CMDLINE=/usr/bin/mplayer "/home/bjourne/d/videos/dct/JPEG DCT, Discrete Cosine Transform (JPEG Pt2)- Computerphile-Q2aEzeMDHMA.mkv"
    _PID=119001
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-mplayer-3789bf1f346448eebf132b91b88fbc44.scope
    _SYSTEMD_USER_UNIT=app-mplayer-3789bf1f346448eebf132b91b88fbc44.scope
    _SYSTEMD_INVOCATION_ID=b66ada0b7ad147388097a5e362f867f9
Sun 2023-10-01 17:51:08.573633 CEST [s=b30a7a5be41f44df980e36de453c3f32;i=10eaa5;b=f3cd592e1d604d8a8ce514460dbd36f1;m=4f352dbaa1;t=606a9a047f3c1;x=1f0908954913818d]
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _CAP_EFFECTIVE=0
    _AUDIT_SESSION=3
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_SLICE=user-1000.slice
    _BOOT_ID=f3cd592e1d604d8a8ce514460dbd36f1
    _MACHINE_ID=50bc24d6af434132926fe589c3f0a08c
    _HOSTNAME=bjourne
    _RUNTIME_SCOPE=system
    PRIORITY=6
    SYSLOG_IDENTIFIER=plasmashell
    _SYSTEMD_USER_SLICE=app.slice
    _TRANSPORT=stdout
    _STREAM_ID=a24fc78ade5f4882b8a128ddfc964e2d
    _COMM=mplayer
    _EXE=/usr/bin/mplayer
    _CMDLINE=/usr/bin/mplayer "/home/bjourne/d/videos/dct/JPEG DCT, Discrete Cosine Transform (JPEG Pt2)- Computerphile-Q2aEzeMDHMA.mkv"
    MESSAGE=226 audio & 469 video codecs
    _PID=119001
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-mplayer-3789bf1f346448eebf132b91b88fbc44.scope
    _SYSTEMD_USER_UNIT=app-mplayer-3789bf1f346448eebf132b91b88fbc44.scope
    _SYSTEMD_INVOCATION_ID=b66ada0b7ad147388097a5e362f867f9
...

I'm not sure how the start mechanism works but it appears to go through systemd? If so systemd could override SYSLOG_IDENTIFIER and set it to the name of the binary which is mplayer in this case.

poettering commented 1 week ago

so this happens on logs written via stdout only. They have no syslog identifier field set, so we make one up. We make one up by reading the peer's "comm" field (i.e. the process title, via /proc/$PID/comm). This is necessarily racy. First because we will receive the client's PID in-line with the log message but by the time we read the comm, it might already have changed. Moreover, we apply caching, since getting this info for each log message is expensive.

What probably happens is that plasmashell logs something to journald in the child process it forks off for the target program (which fills its metadata cache), it then doesn an execve() in it, and the target program logs too, which causes the cached (now incorrect) data to be used.

This is not fixable without kernel changes.

bjourne commented 1 week ago

I should mention that something has changed in the year since I reported this bug and messages from mplayer are now correctly credited to mplayer:

[s=fd20fcec0a1a4ae587da0f6958283cfc;i=2627d6;b=833d2f72bd5b4fd19e5bede0d68611f3;m=2aeb1e451f;t=624be9dc6a5f6;x=cfada1c9d1f0e6ce]
    _UID=1000
    _GID=1000
    _CAP_EFFECTIVE=0
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_SLICE=user-1000.slice
    _MACHINE_ID=50bc24d6af434132926fe589c3f0a08c
    _HOSTNAME=bjourne
    _RUNTIME_SCOPE=system
    _SYSTEMD_USER_SLICE=app.slice
    PRIORITY=6
    SYSLOG_FACILITY=3
    _TRANSPORT=stdout
    _AUDIT_SESSION=6
    _BOOT_ID=833d2f72bd5b4fd19e5bede0d68611f3
    SYSLOG_IDENTIFIER=mplayer
    _COMM=mplayer
    _EXE=/usr/bin/mplayer
    _CMDLINE=/usr/bin/mplayer "/home/bjourne/d/videos/dct/JPEG DCT, Discrete Cosine Transform (JPEG Pt2)- Computerphile-Q2aEzeMDHMA.mkv"
    MESSAGE=VIDEO:  [VP90]  1920x1080  0bpp  25.000 fps    0.0 kbps ( 0.0 kbyte/s)
    _STREAM_ID=54eac69a2d394776a3f04fa83fb03c59
    _PID=146257
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-mplayer@00cb53c0f37b40868c1f3ab1cc874756.service
    _SYSTEMD_USER_UNIT=app-mplayer@00cb53c0f37b40868c1f3ab1cc874756.service
    _SYSTEMD_INVOCATION_ID=6bca104ad4b54a72a2b245b7804912c1

So maybe systemd's SYSLOG_IDENTIFIER heuristics have improved or Plasmashell does something different.