Closed floppym closed 1 year ago
Is systemd-networkd
activated in the host system?
What is the output of systemctl status systemd-networkd
?
Is
systemd-networkd
activated in the host system?
I'm not sure what you mean by that.
What is the output of
systemctl status systemd-networkd
?
○ systemd-networkd.service - Network Configuration
Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; preset: enabled)
Active: inactive (dead) since Sun 2023-05-21 12:15:09 EDT; 32s ago
Duration: 1.661s
TriggeredBy: ● systemd-networkd.socket
Docs: man:systemd-networkd.service(8)
Process: 459 ExecStart=/usr/lib/systemd/systemd-networkd (code=exited, status=0/SUCCESS)
Main PID: 459 (code=exited, status=0/SUCCESS)
Status: "Shutting down..."
CPU: 28ms
May 21 12:15:08 gentoo systemd[1]: Starting systemd-networkd.service...
May 21 12:15:08 gentoo systemd-networkd[459]: lo: Link UP
May 21 12:15:08 gentoo systemd-networkd[459]: lo: Gained carrier
May 21 12:15:08 gentoo systemd-networkd[459]: Enumeration completed
May 21 12:15:08 gentoo systemd[1]: Started systemd-networkd.service.
It's also worth noting that this doesn't happen on every boot. There seems to be some kind of race condition.
If systemd-networkd.service happens to get stopped before systemd is re-execed, everything works as it should.
I'm not sure what you mean by that.
I mean if it's enabled outside of the initrd.
Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; preset: enabled) Active: inactive (dead) since Sun 2023-05-21 12:15:09 EDT; 32s ago
That is weird, it's clearly enabled yet it's not activated. Are there any services that failed (systemctl --failed
)?
It's also worth noting that this doesn't happen on every boot. There seems to be some kind of race condition.
If systemd-networkd.service happens to get stopped before systemd is re-execed, everything works as it should.
I wonder if this is an issue with dracut's configuration of systemd-networkd. Seeing how only the local interface is enumerated in the initrd, I assume you don't need networking in the initrd. Try disabling dracut's networking module if possible.
That is weird, it's clearly enabled yet it's not activated. Are there any services that failed (systemctl --failed)?
No failed services.
Try disabling dracut's networking module if possible.
I have no need for networking support in the initrd. It is simply enabled by default.
Disabling the network module in dracut avoids the problem, since systemd-networkd.service never gets activated in the initrd.
Well I'd say do that for now. If you ask me, this is most likely a problem on dracut's side of things, but the actual devs may be able to comment on that better.
This is a bug report intended for the systemd developers. I'm not seeking assistance in working around the issue.
Most of the systemd-related initrd logic is shipped with systemd. My best guess is this is a systemd issue, not a dracut issue.
I dug into the initrd a bit, and I can't find anywhere that systemd-networkd.service is being activated explicitly.
I think it is actually being activated by a netlink message via systemd-networkd.socket.
I think you are looking for: https://github.com/dracutdevs/dracut/blob/master/modules.d/01systemd-networkd/module-setup.sh#L66
I think you are looking for: https://github.com/dracutdevs/dracut/blob/master/modules.d/01systemd-networkd/module-setup.sh#L66
Nope. That sets up a symlink at /etc/systemd/system/multi-user.target.wants/systemd-networkd.service
. However, multi-user.target
never gets activated in an initrd. The default unit is initrd.target
, which does not actually pull in systemd-networkd.service
.
That sets up a symlink at /etc/systemd/system/multi-user.target.wants/systemd-networkd.service.
Er... you are correct, I misremembered thinking it has WantedBy=sysinit.target
. Could there be something that pulls in network.target
or network-online.target
?
Now I wonder if enabling it the way they do isn't erroneous to begin with.
systemd-networkd.service isn't pulled in by either network.target or network-online.target.
Anyway, I ran a boot with log_level=debug, and it seems to confirm that systemd-networkd.service is being socket-actived as I suspected. That's likely a mistake in the dracut module setup. However, I don't think it directly relates to the problem I originally reported in this issue.
Can you check your journal log for Stopped target Socket Units.
before switch root? That should be taking systemd-networkd
down with it.
EDIT:
systemd-networkd.service isn't pulled in by either network.target or network-online.target.
Not directly, but enabling systemd-networkd
will also enable systemd-networkd-wait-online
, which does get pulled in by network-online.target
.
Here's a full journal from a boot that exhibits the issue. systemd-networkd.socket gets deactivated before systemd is re-execed, but systemd-networkd.service does not. The re-exec occurs on line 1163.
https://gist.github.com/floppym/951950c0aff2b95146b2c964b87372c6
Here's a boot with systemd.log_level=debug.
https://gist.github.com/floppym/a6b9d95105e743340cb019f43511d656
I'm seeing the same behavior with v253.3.
Thanks. From the log,
May 21 20:05:44 gentoo systemd[1]: systemd-networkd.socket: Incoming traffic
May 21 20:05:45 gentoo systemd[1]: dbus-broker.service: Thawing unit.
May 21 20:05:45 gentoo systemd-networkd[446]: Bus bus-api-network: changing state RUNNING → CLOSING
May 21 20:05:45 gentoo dbus-broker[506]: Dispatched 248 messages @ 1(±2)μs / message.
May 21 20:05:45 gentoo systemd-networkd[446]: Bus bus-api-network: changing state CLOSING → CLOSED
May 21 20:05:46 gentoo systemd-networkd[446]: Terminate operation initiated.
(switching root)
May 21 20:05:46 naomi systemd[1]: systemd-networkd.service: Got notification message from PID 446 (STOPPING=1, STATUS=Shutting down...)
May 21 20:05:46 naomi systemd[1]: systemd-networkd.service: Changed running -> stop-sigterm
BTW, why networkd is included in initrd? I'd like to support that, but at least currently, networkd does not work well in initrd, IIRC.
It's included by default with dracut. What are the issues with networkd in the initrd? I cannot reproduce his issue on my Xen VM, networkd is restarted as normal. However in my initrd, it is wanted by sysinit.target
.
If networkd.service is wanted by sysinit.target, then initrd-cleanup.service stops networkd.service before switching root, IIRC.
What are the issues with networkd in the initrd?
networkd.service removes some (not all) IP addresses, routes, and friends on stop. Hence, restarting networkd during switching root may disconnect network. So, if the rootfs is backed by a network filesystem, system may fail to boot.
networkd.service removes some (not all) IP addresses, routes, and friends on stop. Hence, restarting networkd during switching root may disconnect network. So, if the rootfs is backed by a network filesystem, system may fail to boot.
This is solved by
[Network]
KeepConfiguration=yes
If networkd.service is wanted by sysinit.target, then initrd-cleanup.service stops networkd.service before switching root, IIRC.
I presumed so. This is why I think dracut is in the wrong for simply doing systemctl enable systemd-networkd
in the initrd. They should probably do an override setting WantedBy=sysinit.target
or manually enable the service(s).
networkd.service removes some (not all) IP addresses, routes, and friends on stop. Hence, restarting networkd during switching root may disconnect network. So, if the rootfs is backed by a network filesystem, system may fail to boot.
This is solved by
[Network] KeepConfiguration=yes
Yes. That's exactly for such setup. However, currently the setting is not perfect. e.g. DHCPv6 address is not kept, IIRC.
If networkd.service is wanted by sysinit.target, then initrd-cleanup.service stops networkd.service before switching root, IIRC.
I presumed so. This is why I think dracut is in the wrong for simply doing
systemctl enable systemd-networkd
in the initrd. They should probably do an override settingWantedBy=sysinit.target
or manually enable the service(s).
Yeah, at least that should be a good workaround.
Yes. That's exactly for such setup. However, currently the setting is not perfect. e.g. DHCPv6 address is not kept, IIRC.
Hmmm, I see. I did not run into that issue, as my network with the NFS server is mainly IPv4, but this is good to keep in mind. I did run in an issue where systemd-networkd
assigns an extra IP address, but for my specific case that would be fixed by #27706. Not sure if there's a setting that stops networkd
from generating an extra ipv4 from the same subnet with real hardware. Though I suppose most setups will use static assignment on the DHCP server.
BTW, why networkd is included in initrd?
dracut automagically includes it if certain system packages are installed.
dracut pulls in systemd-networkd if the "network" module is enabled, and neither connman nor NetworkManager are installed.
https://github.com/dracutdevs/dracut/blob/059/modules.d/40network/module-setup.sh#L19
The "network" module is enabled/required if any network filesystem utilities are installed (eg. nfs-utils, cifs-utils).
https://github.com/dracutdevs/dracut/blob/059/modules.d/95nfs/module-setup.sh#L21
This is extremely easy to reproduce in mkosi btw, it'll happen 90% of the time. Just run systemctl status systemd-networkd
after boot and it'll show you that it's enabled yet it'll have exited.
So we send SIGTERM to systemd-networkd during switch-root, but don't actually wait for it to shutdown, which means that if networkd takes a little time to stop, we'll have already deserialized and processed the unit graph in the rootfs, but won't have queued a job for networkd in the rootfs because we think its still running, but then it shuts down and by the time we get to starting systemd-networkd-wait-online, it isn't running anymore and the dependency fails. At least that's what I think is happening here.
it seem that #27791 doesn't work for me. i have the same problem on my gentoo systems (5.14.112, 253.3).
@livelace Please provide debugging logs.
it seem that #27791 doesn't work for me. i have the same problem on my gentoo systems (5.14.112, 253.3).
I think I hit this problem as well. The behavior here suggests that initrd's systemd (pid1) thinks systemd-networkd's service has been stopped already:
While it's not, because the same pid still outputs something after switch-root:
The service is then left in inactive state after switch-root:
I'll try to get some debug logs later.
systemd version the issue has been seen with
253.4
Used distribution
Gentoo
Linux kernel version used
6.1.29
CPU architectures issue was seen on
x86_64
Component
systemd-networkd
Expected behaviour you didn't see
systemd-networkd.service should be started after systemd is re-execed after switching root from the initrd (dracut-059).
Unexpected behaviour you saw
systemd-networkd.service is de-activated after initrd-switch-root.service starts and systemd is re-execed. It is not restarted automatically. This results in a system with no network connectivity after a reboot, until the service is started manually.
Steps to reproduce the problem
Also see https://github.com/systemd/systemd/issues/27683, https://github.com/systemd/systemd/pull/27617, https://github.com/systemd/systemd/pull/27649.
Additional program output to the terminal or log subsystem illustrating the issue