systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0

systemd-resolved.service ignores information from both DHCP server and nfsroot kernel parameter when booting with iPXE #24095

Closed MischaBaars closed 2 years ago

MischaBaars commented 2 years ago

Hello,

When I boot the server (Fedora Server) that contains the nfsroot, a network connection is opened by default with ipv4.method set to auto. All DHCP settings are retrieved from a TP-Link router.

When I boot the server (Fedora Workstation) with dnsmasq enabled and DHCP disabled on the router, and then boot one of the nodes with either iPXE kernel line 1) or line 2), a network connection is opened by default with ipv4.method set to manual. The node does retrieve its hostname and IP address, but resolvectl status does not show a DNS server. If I set one manually using resolvectl dns eth0 192.168.2.1, then everything works as it should.

iPXE kernel line 1):

kernel vmlinuz-5.18.10 console=tty1 root=/dev/nfs rw net.ifnames=0 nfsroot=192.168.2.7:/mnt/dev/e37a097c-cdde-4e70-a2ef-5c2e98b773c7 ip=on rootfstype=ext4 raid=noautodetect rootwait quiet

iPXE kernel line 2):

kernel vmlinuz-5.18.10 console=tty1 root=/dev/nfs rw net.ifnames=0 nfsroot=192.168.2.7:/mnt/dev/e37a097c-cdde-4e70-a2ef-5c2e98b773c7 ip={ip}:{dhcp-server}:{gateway}:{netmask}:{hostname}:eth0:off:{dns} rootfstype=ext4 raid=noautodetect rootwait quiet
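For reference, the kernel's nfsroot documentation describes the ip= value as `<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>`. The Python sketch below splits such a value into named fields, which makes it easy to see which position carries the DNS server. The function name `parse_ip_param` and the sample address 192.168.2.11 are hypothetical, and only the colon-separated IPv4 form and the short form (such as ip=on) are handled.

```python
# Field order taken from the kernel's nfsroot documentation for ip=.
IP_FIELDS = (
    "client_ip", "server_ip", "gw_ip", "netmask", "hostname",
    "device", "autoconf", "dns0_ip", "dns1_ip", "ntp0_ip",
)


def parse_ip_param(value: str) -> dict:
    """Split an ip= value into named fields; missing or empty fields map to None."""
    parts = value.split(":")
    if len(parts) == 1:
        # Short form such as ip=on or ip=dhcp: only the autoconf method is given.
        return {"autoconf": parts[0] or None}
    if len(parts) > len(IP_FIELDS):
        raise ValueError(f"too many fields in ip= value: {value!r}")
    # Pad so that every field name has an entry.
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return {name: (part or None) for name, part in zip(IP_FIELDS, parts)}


if __name__ == "__main__":
    # Hypothetical expansion of kernel line 2 (the client address is made up).
    example = "192.168.2.11:192.168.2.7:192.168.2.1:255.255.255.0:tp11:eth0:off:192.168.2.1"
    for name, val in parse_ip_param(example).items():
        print(f"{name:10} {val}")
```

With autoconf set to off, as in kernel line 2, the DNS server is known only from the eighth field, so it is worth checking that the iPXE variable used there actually expands to an address.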

journalctl output on server booting Fedora Server:

Jul 23 14:14:05 fedora systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
Jul 23 14:14:06 fedora systemd-resolved[374]: Positive Trust Anchors:
Jul 23 14:14:06 fedora systemd-resolved[374]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jul 23 14:14:06 fedora systemd-resolved[374]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa corp home internal intranet lan local private test
Jul 23 14:14:06 fedora systemd-resolved[374]: Using system hostname 'fedora'.
Jul 23 14:14:06 fedora systemd[1]: Started systemd-resolved.service - Network Name Resolution.
Jul 23 14:14:06 fedora audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-resolved comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 23 14:14:08 fedora NetworkManager[462]: [1658578448.0497] dns-mgr[0x55ed03469250]: init: dns=systemd-resolved rc-manager=symlink, plugin=systemd-resolved
Jul 23 14:14:13 fedora systemd-resolved[374]: eth0: Bus client set default route setting: yes
Jul 23 14:14:13 fedora systemd-resolved[374]: eth0: Bus client set DNS server list to: 192.168.2.1

journalctl output on nodes booting Fedora Server:

Jul 23 14:22:48 tp11 systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
Jul 23 14:22:49 tp11 systemd-resolved[362]: Positive Trust Anchors:
Jul 23 14:22:49 tp11 systemd-resolved[362]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jul 23 14:22:49 tp11 systemd-resolved[362]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa corp home internal intranet lan local private test
Jul 23 14:22:50 tp11 systemd-resolved[362]: Using system hostname 'tp11'.
Jul 23 14:22:50 tp11 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-resolved comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 23 14:22:50 tp11 systemd[1]: Started systemd-resolved.service - Network Name Resolution.
Jul 23 14:23:00 tp11 NetworkManager[451]: [1658578980.0304] dns-mgr[0x562590d1f250]: init: dns=systemd-resolved rc-manager=symlink, plugin=systemd-resolved

Help would be appreciated!

Thanks, Mischa Baars.

P.S. The networkctl executable seems to be missing on Fedora Server 36.

yuwata commented 2 years ago

systemd-resolved does not handle DHCP messages directly. Instead, systemd-networkd, NetworkManager, or other DHCP clients handle them and pass the DNS settings on to systemd-resolved (systemd-networkd certainly does; I am not sure about the others).

If a system is booted with the ip= kernel command line option or friends, then the initial DHCP message is processed in the initrd. I do not know which dracut modules are enabled, but IIRC, the DNS settings obtained in the initrd are not propagated from the initrd to the systemd-resolved instance started after switching root. IOW, DNS settings obtained in the initrd are completely ignored after switching root.

Hence, you need to (re)start a DHCP client (systemd-networkd, NetworkManager, dhclient, or so) after switching root to (re)obtain the DNS settings through the DHCP protocol.
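As a hedged illustration of such a handover: when the kernel's own IP autoconfiguration is used (ip= without an initrd), the kernel's nfsroot documentation says the name servers are exported to /proc/net/pnp with the prefix "nameserver ". The Python sketch below reads lines in that form and builds the same `resolvectl dns <interface> <server>` command used earlier in this thread. The helper names are hypothetical, the script only prints the command instead of running it, and it does not apply when an initrd tool such as dracut configures the network.

```python
from pathlib import Path


def nameservers_from_pnp(text: str) -> list[str]:
    """Return the addresses from 'nameserver <addr>' lines, in order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def resolvectl_dns_command(interface: str, servers: list[str]) -> list[str]:
    """Build the argument list for 'resolvectl dns INTERFACE SERVER...'."""
    if not servers:
        raise ValueError("no name servers found")
    return ["resolvectl", "dns", interface, *servers]


if __name__ == "__main__":
    pnp = Path("/proc/net/pnp")  # only present with in-kernel IP autoconfiguration
    if pnp.exists():
        servers = nameservers_from_pnp(pnp.read_text())
        if servers:
            print(" ".join(resolvectl_dns_command("eth0", servers)))
```

Settings applied with resolvectl are runtime-only, so a step like this would have to run on every boot; letting a DHCP client manage the interface remains the more robust route.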

From the above logs, you use NetworkManager. I guess NetworkManager on your system does not manage the interface that received the DHCP message on boot. Please try to configure NetworkManager to retrieve the DHCP message from the server. If that does not work, please contact the NetworkManager community.

This is not a bug, and all necessary mechanisms are already implemented. Closing.

MischaBaars commented 2 years ago

If a system is booted with the ip= kernel command line option or friends, then the initial DHCP message is processed in the initrd. I do not know which dracut modules are enabled, but IIRC, the DNS settings obtained in the initrd are not propagated from the initrd to the systemd-resolved instance started after switching root. IOW, DNS settings obtained in the initrd are completely ignored after switching root.

Snippet from linux-5.18.10/Documentation/admin-guide/nfs/nfsroot.rst:

Note that the kernel will not synchronise the system time with any NTP servers it discovers; this is the responsibility of a user space process (e.g. an initrd/initramfs script that passes the IP addresses listed in /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real root filesystem if it is on NFS).

I compiled the kernel with CONFIG_BLK_DEV_INITRD disabled. Do you think this might be the problem?

yuwata commented 2 years ago

I am not familiar with that kernel compile option.

Note, if you use the default Fedora kernel and initrd generated by dracut, then IIRC the initial DHCP message exchange is done by dhclient or NetworkManager in the initrd, instead of the kernel's internal DHCP client.

If you use a custom kernel or build a custom initrd, then I have no idea unless you provide more details.

MischaBaars commented 2 years ago

Well, I've just finished compiling a new kernel with CONFIG_BLK_DEV_INITRD enabled, and booted it.

Now it complains about not being able to mount the root file system:

systemd[1]: Starting initrd-switch-root.service - Switch Root...
systemctl[445]: Failed to switch to root: Specified switch root path /sysroot does not seem to be an OS tree. os-release file is missing.
systemd[1]: Failed to start initrd-switch-root.service - Switch Root...
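The error message points to a sanity check on the new root directory. A rough Python approximation of that check can help confirm whether the NFS export was actually mounted on /sysroot. It assumes the check amounts to finding an os-release file at etc/os-release or usr/lib/os-release below the target directory, which are the two standard os-release locations. The function name `looks_like_os_tree` is hypothetical, and this is not systemd's actual implementation.

```python
import sys
from pathlib import Path

# The two standard os-release locations, relative to the root of an OS tree.
OS_RELEASE_CANDIDATES = ("etc/os-release", "usr/lib/os-release")


def looks_like_os_tree(root: str) -> bool:
    """Return True if an os-release file exists below root."""
    base = Path(root)
    return any((base / rel).is_file() for rel in OS_RELEASE_CANDIDATES)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/sysroot"
    verdict = "looks like" if looks_like_os_tree(target) else "does not look like"
    print(f"{target} {verdict} an OS tree")
```

Run against an empty /sysroot, this reports a negative result, which would be consistent with the root filesystem never having been mounted rather than with a damaged installation.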

This snippet from linux-5.18.10/Documentation/admin-guide/nfs/nfsroot.rst might be related:

rdinit= To specify which file contains the program that starts system initialization, administrators can use this command line parameter. The default value of this parameter is "/init". If the specified file exists and the kernel can execute it, root filesystem related kernel command line parameters, including 'nfsroot=', are ignored.

I've not used this exact kernel parameter in the iPXE kernel line, but I did use the iPXE initrd line.

If this kernel parameter indeed causes the root-switch to fail, that would mean that (for now?) booting diskless network nodes is impossible with CONFIG_BLK_DEV_INITRD enabled. This was the reason I disabled it.

Don't you think this is kind of strange, with all these supercomputers around!? How do they boot 4000 processor cores without installing the operating system 4000 times over?

yuwata commented 2 years ago

Sorry, but I cannot follow. Please provide more details, e.g. the full kernel command line, the iPXE settings, how you build the initrd images (e.g. the list of enabled dracut modules if you use dracut), fstab (if you use one), and so on. Otherwise, there is almost nothing we can do.

MischaBaars commented 2 years ago

Hi Yu,

Is there any specific detail you need? I can try booting with nfsrootdebug and debug tomorrow. The kernel boot line has otherwise not changed from the line in the first mail.

Dnsmasq is definitely working, because the Fedora Workstation installation USB boots with DHCP enabled on the server and DHCP disabled on the router. iPXE and tftpd are definitely working too; I can see this from the iPXE environment variables (or by typing 'config net0' at the iPXE boot prompt) and from 'journalctl --boot --follow' on the server.

Fstab you won't need, because it can only be read after mounting the root filesystem.

I'm not an expert in initrds, but it is a standard initrd generated by the kernel compilation. I statically compiled in everything that showed up as being loaded as a module during normal boot (second operating system on the dual boot server), thus there shouldn't be anything of much importance in the initrd, except for things like NTP initialization, as nfsroot.rst mentions.

Best regards, Mischa.

MischaBaars commented 2 years ago

Hi Yu,

Here's the entire rdsosreport.txt, booted with the nfsrootdebug and debug kernel parameters.

Let me know if you need to know more.

Hope you guys can help! My only other option is to install Fedora Server on 16 different machines. I had hoped to reduce that number to 1.

Best regards, Mischa Baars.

ID_FS_UUID=561E-FC00
ID_FS_UUID_ENC=561E-FC00
ID_FS_BLOCK_SIZE=512
ID_FS_TYPE=vfat
ID_FS_PARTLABEL=EFI System Partition
ID_FS_PARTUUID=930b9893-a89c-fa40-b239-9b0784ea150c

ID_FS_UUID=136bf76d-e564-4e0d-84ed-e67e14f6d0a7
ID_FS_UUID_ENC=136bf76d-e564-4e0d-84ed-e67e14f6d0a7
ID_FS_TYPE=crypto_LUKS
ID_FS_PARTUUID=82bfcb8a-a12c-b844-9516-4e0cb336ade0

ID_FS_BLOCK_SIZE=512
ID_FS_UUID=1CEC318CEC316166
ID_FS_UUID_ENC=1CEC318CEC316166
ID_FS_TYPE=ntfs
ID_FS_PARTUUID=4859068e-b252-f148-b2b3-c5f7cb7b9565

/dev/disk/by-partlabel:
total 0
lrwxrwxrwx 1 root root 15 Jul 26 08:30 EFI\x20System\x20Partition -> ../../mmcblk1p1

/dev/disk/by-partuuid:
total 0
lrwxrwxrwx 1 root root 15 Jul 26 08:30 4859068e-b252-f148-b2b3-c5f7cb7b9565 -> ../../mmcblk1p2
lrwxrwxrwx 1 root root 15 Jul 26 08:30 82bfcb8a-a12c-b844-9516-4e0cb336ade0 -> ../../mmcblk1p4
lrwxrwxrwx 1 root root 15 Jul 26 08:30 930b9893-a89c-fa40-b239-9b0784ea150c -> ../../mmcblk1p1
lrwxrwxrwx 1 root root 15 Jul 26 08:30 de5a0dd4-e286-fb4d-b092-e1d8047388fc -> ../../mmcblk1p3

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 15 Jul 26 08:30 136bf76d-e564-4e0d-84ed-e67e14f6d0a7 -> ../../mmcblk1p4
lrwxrwxrwx 1 root root 15 Jul 26 08:30 1CEC318CEC316166 -> ../../mmcblk1p2
lrwxrwxrwx 1 root root 15 Jul 26 08:30 561E-FC00 -> ../../mmcblk1p1
lrwxrwxrwx 1 root root 15 Jul 26 08:30 f6200601-d21e-4442-9240-1430d515b43a -> ../../mmcblk1p3

yuwata commented 2 years ago

Sorry, but the provided info does not contain anything needed for debugging the situation. In particular, the log does not contain anything about DHCP message handling.

First of all, does the "issue" occur in the initrd, or after switching root?

MischaBaars commented 2 years ago

Hi Yu,

The issue is indeed in the initrd. Without an initrd everything boots just fine. With an initrd, the system refuses to mount the NFS root partition, exactly the same as here:

https://discussion.fedoraproject.org/t/cannot-boot-from-initramfs-unable-to-switch-root-sysroot-directory-is-empty/33043

I have now solved the problem as follows: I boot the system without an initrd AND I disable NetworkManager. With NetworkManager the system does not wake up from suspend; without NetworkManager it does. You just have to set a few things by hand, namely /etc/resolv.conf and wake-on-lan. Like this it's running very smoothly!

Thank you for your help! Hope these problems are fixed in upcoming releases of Fedora, but at least it CAN be made to work with a little effort.

Best regards, Mischa Baars.