kairos-io / kairos

The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

systemd-networkd-wait-online fails with multiple ethernet where one or more is disconnected #2898

Closed bencorrado closed 1 month ago

bencorrado commented 1 month ago

Kairos version:
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION="v3.1.3-1-g2daaf78-dirty"
KAIROS_FLAVOR="ubuntu"
KAIROS_TARGETARCH="amd64"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.1.3-1-g2daaf78-dirty"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_VERSION_ID="v3.1.3-1-g2daaf78-dirty"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"
KAIROS_VARIANT="standard"
KAIROS_RELEASE="v3.1.3-1-g2daaf78-dirty"
KAIROS_FAMILY="ubuntu"
KAIROS_MODEL="generic"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"

CPU architecture, OS, and Version: Linux localhost 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug: On a system with more than one Ethernet port, if not all of the ports have a connection, Kairos waits for all of them even when one interface is already connected.

If the system fails to get a connection on every Ethernet interface that is automatically assigned DHCP, it does not complete boot successfully. This is because systemd-networkd-wait-online spins, waiting for all interfaces to come up.

The interactive installer (and any other service waiting on systemd-networkd-wait-online) fails to launch, as the state never reaches online.

This happens because of the * wildcard in /etc/systemd/network/20-dhcp.network:

[Match]
Name=en*
[Network]
DHCP=yes
[DHCP]
ClientIdentifier=mac
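
To make the effect of the wildcard concrete: Name= takes shell-style globs, so the pattern can be approximated with a shell `case` statement. This is only an illustrative sketch; the helper name and the interface names below are assumptions, not taken from the affected machine.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an interface name matches the
# Name=en* glob used in 20-dhcp.network.
matches_dhcp_network() {
  case "$1" in
    en*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Illustrative interface names (assumptions, not from the report).
for ifname in enp1s0 eno1 wlan0 lo; do
  if matches_dhcp_network "$ifname"; then
    echo "$ifname: matched, managed with DHCP"
  else
    echo "$ifname: not matched"
  fi
done
```

Every wired port whose name starts with "en" ends up managed by networkd, whether or not a cable is plugged in.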

By default, systemd-networkd-wait-online waits for all of these interfaces to come online. I think we should only require one interface to be online, not all of them, so that the system can proceed to boot normally.

To Reproduce: Attempt to boot a Kairos installer image that uses systemd-networkd on a machine with more than one network interface, where at least one of those interfaces has no DHCP server and is not otherwise configured through the network config in the cloud-init file.

Expected behavior: If the Kairos system is online with at least one network interface, it should proceed to boot normally. It should only wait on systemd-networkd-wait-online if there are no online interfaces.

Resolution

I was able to add the following to the end of my Dockerfile to patch systemd-networkd-wait-online. This override tells systemd-networkd-wait-online it can use any online interface and does not need to wait for all of them.

# Create an override so systemd-networkd-wait-online accepts any online interface instead of waiting for all of them.
# printf is used rather than `echo -e` because /bin/sh on Ubuntu (dash) does not support the -e option.
RUN mkdir -p /etc/systemd/system/systemd-networkd-wait-online.service.d/ \
  && printf '[Service]\nExecStart=\nExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any --ipv4\n' \
  > /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf
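
The empty ExecStart= line matters: it clears the command inherited from the packaged unit, so only the new command line runs. For anyone who wants to generate and inspect the same drop-in outside of an image build, here is a minimal sketch. The function name and the ROOT argument are hypothetical and only there so the file can be written into a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the wait-online drop-in below ROOT
# (empty ROOT means the live filesystem).
write_wait_online_override() {
  root="${1:-}"
  dir="$root/etc/systemd/system/systemd-networkd-wait-online.service.d"
  mkdir -p "$dir" || return 1
  # One printf argument per line keeps backslash escapes out of the data.
  printf '%s\n' \
    '[Service]' \
    'ExecStart=' \
    'ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any --ipv4' \
    > "$dir/override.conf"
}

# Example: write into a scratch directory and look at the result.
scratch="$(mktemp -d)"
write_wait_online_override "$scratch"
cat "$scratch/etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf"
```

On a booted system, `systemctl cat systemd-networkd-wait-online.service` should show the drop-in appended after the packaged unit.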

Ultimately, this should probably be added as an overlay in packages.

Itxaka commented 1 month ago

This is a good one indeed. It feels a bit wrong on the systemd side, no? systemd-networkd-wait-online should succeed once at least one NIC is online, not wait for all of them. Either it's a problem on the systemd side, or we are not understanding it correctly and it needs a different config so that it continues as soon as one interface is up.

Itxaka commented 1 month ago

Ah yes, now I see your PR, and it does indeed do that :D

Itxaka commented 1 month ago

ahh interesting, it will wait for all ifaces to either fail or succeed.

The service systemd-networkd-wait-online.service invokes systemd-networkd-wait-online without any options. Thus, it waits for all managed interfaces to be configured or failed, and for at least one to be online.

Itxaka commented 1 month ago

Testing this on a VM with 2 NICs, one connected and one not, resulted in the service timing out after 2 minutes.
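
The documented behaviour and the timeout can be modelled roughly as below. This is a simplified illustration, not systemd's implementation: the three states (online, failed, pending) and both function names are assumptions made for the sketch. An unplugged port that is waiting for DHCP would typically sit in the pending state, so the default mode keeps waiting until the timeout (120 seconds by default) expires.

```shell
#!/bin/sh
# Each argument is the assumed state of one managed interface:
# online | failed | pending

# Default mode: every interface must be settled (online or failed)
# and at least one must be online.
ready_default() {
  online=0
  for s in "$@"; do
    case "$s" in
      pending) return 1 ;;
      online)  online=1 ;;
    esac
  done
  [ "$online" -eq 1 ]
}

# --any mode: a single online interface is enough.
ready_any() {
  for s in "$@"; do
    if [ "$s" = online ]; then
      return 0
    fi
  done
  return 1
}

# Two NICs, one connected and one unplugged:
ready_default online pending || echo "default: still waiting"
ready_any online pending && echo "--any: ready"
```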

Itxaka commented 1 month ago

open until framework lands on kairos

Itxaka commented 1 month ago

on master

clyra commented 1 month ago

Hi,

I also stumbled on this, but got it working by adding this to the user-data:

- path: /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf
  permissions: 0644
  content: |
    [Service]
    ExecStart=
    ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any

I didn't bother to report it because it seemed to be an Ubuntu issue, not a Kairos one! Is that the case, or is 20-dhcp.network indeed added by Kairos?