Open keszybz opened 1 year ago
I think the problem in your case is that systemd-journald-audit.socket
was left enabled in the initrd but disabled in the host. I'm not sure whether this was intended (maybe the initrd was not rebuilt after reinstalling the package with the journal audit socket change?), but it had the side effect of leaving the socket running during the switch-root transition and stopping it once PID 1 was re-executed in the host.
Indeed, once PID 1 was re-executed in the host, it noticed that the socket was no longer needed (since you forgot to call systemctl preset
in your package scriptlet) and attempted to stop it, which seems like the correct thing to do. However, I'm not sure why systemd spammed the logs the way it did when it realized that the socket was still receiving data while being stopped. Maybe we should log such an event only once.
Well, yes. But it should be OK to have a socket enabled in the initrd and disabled in the host. This certainly should not end with an infinite loop and the socket never being closed.
Sure, I was trying to describe what was happening in your case. But I didn't mean that the infinite loop was something expected.
I quickly tried to reproduce the problem but couldn't. The results are a bit different depending on whether debug logs are enabled or not, but I couldn't trigger the infinite loop in either case.
Unfortunately I won't be able to look at this in detail before next week.
So we have logic to disable a listening socket when it has a stop job scheduled, to prevent incoming data from triggering unnecessary events. This is done by flushing all incoming connections and draining all data accumulated in the socket buffers.
However the "flushing all incoming connections" logic doesn't apply to netlink sockets, see: https://github.com/systemd/systemd/blob/v253-rc2/src/basic/socket-util.c#L1117
And because the socket buffer is also drained, there is always room in the buffer to accept more incoming data, which is probably why you're seeing the infinite loop.
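To make the draining behavior concrete, here is a minimal Python sketch (the real logic lives in C in src/basic/socket-util.c; a Unix datagram socketpair stands in for the audit netlink socket). It shows why draining frees up buffer space: immediately after the drain, the peer can enqueue new datagrams again, so a chatty sender can keep retriggering the I/O event source.

```python
import socket

def drain_datagram_socket(sock):
    """Read and discard all pending datagrams without blocking.

    Returns the number of datagrams drained. Mirrors the idea of
    emptying the socket receive buffer during socket shutdown.
    """
    drained = 0
    sock.setblocking(False)
    while True:
        try:
            sock.recv(65536)
        except BlockingIOError:
            # Buffer is empty; new sends will now succeed again,
            # which is exactly how the loop can restart.
            return drained
        drained += 1

# Usage: three pending "audit records", then a drain.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
for _ in range(3):
    a.send(b"audit record")
print(drain_datagram_socket(b))  # -> 3
```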
At that point, I'm not really sure whether it's worth improving the logic, because it only flushes the incoming connections and data accumulated at a given point in time; it doesn't prevent new ones from triggering new events. So in theory an application that keeps connecting to the listening socket could trigger the same infinite loop, I think.
Maybe instead we should give the event source dealing with the run queue a higher priority than the socket I/O event sources have, giving PID 1 a chance to proceed with the stop job.
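In sd-event terms this would presumably be done with sd_event_source_set_priority(), where a numerically lower priority value is dispatched first. A toy Python model of that dispatch ordering (the names and priority values are illustrative, not taken from the systemd source):

```python
# Hypothetical pending event sources; lower "priority" value means
# dispatched earlier, mirroring sd-event's ordering convention.
pending = [
    {"name": "socket-io", "priority": 0},    # normal priority
    {"name": "run-queue", "priority": -10},  # bumped so stop jobs make progress
]

dispatch_order = [s["name"] for s in sorted(pending, key=lambda s: s["priority"])]
print(dispatch_order)  # -> ['run-queue', 'socket-io']
```

With the run-queue source ahead of socket I/O, the scheduled stop job would get a chance to run even while the socket keeps becoming readable.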
We discussed this during the video meeting. @poettering's idea: set ratelimit on the socket source.
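sd-event does expose a rate-limiting knob, sd_event_source_set_ratelimit(source, interval_usec, burst), which allows at most `burst` dispatches per `interval`. As a rough sketch of those windowed-burst semantics (a simplified Python model, not the actual sd-event implementation):

```python
class RateLimit:
    """Simplified windowed-burst limiter: at most `burst` dispatches per
    `interval` seconds, loosely modeling sd_event_source_set_ratelimit()."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Start a fresh window if the current one has expired.
        if self.window_start is None or now >= self.window_start + self.interval:
            self.window_start = now
            self.count = 0
        if self.count >= self.burst:
            return False  # source would be temporarily muted
        self.count += 1
        return True

rl = RateLimit(interval=1.0, burst=5)
allowed = sum(rl.allow(now=0.0) for _ in range(10))
print(allowed)  # -> 5 (the other 5 dispatches are suppressed this window)
```

Applied to the audit socket's event source, this would cap how often the readable socket can fire per interval, breaking the busy loop while the stop job is pending.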
Yes, I guess that would work too.
@fbuihuu any update on this?
Not really sorry. And I have no free time to spend on this one currently.
Ok, will move to the next milestone then
https://github.com/systemd/systemd/pull/25687 removed static enablement of systemd-journald-audit.socket. It is now managed by normal enable/disable symlinks and the presets logic.
This also missed that journald uses Sockets=audit.socket, so the audit socket is always implicitly enabled anyway.
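For context, a simplified fragment of what such a service unit declaration looks like (not the verbatim systemd-journald.service contents): listing a socket unit in Sockets= ties it to the service's activation, independently of whether the socket is enabled via symlinks.

```ini
[Service]
Sockets=systemd-journald-audit.socket
```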
systemd version the issue has been seen with
253-rc1
Used distribution
Fedora rawhide
Linux kernel version used
No response
CPU architectures issue was seen on
None
Component
systemd
Expected behaviour you didn't see
https://github.com/systemd/systemd/pull/25687 removed static enablement of
systemd-journald-audit.socket
. It is now managed by normal enable/disable symlinks and the presets logic. When building the package in Fedora, I forgot to add a scriptlet that'd call systemctl preset
and re-enable the socket. That is fixed now, and this bug is about what happens when the socket is (intentionally or not) disabled. From https://bugzilla.redhat.com/attachment.cgi?id=1940497:
Unexpected behaviour you saw
No response
Steps to reproduce the problem
No response
Additional program output to the terminal or log subsystem illustrating the issue
No response