home-assistant / operating-system


Ethernet network device name changed after updating to OS 9.1 #2174

Closed rccoleman closed 1 year ago

rccoleman commented 2 years ago

Describe the issue you are experiencing

I just updated to HA OS 9.1, which I run in a KVM running on Debian 11, and the network failed to come up. From the console, it looks like the ethernet device changed, for some reason, from enp0s3 to ens3, and there was no longer an active network interface. I added another interface using nmcli con edit, assigned the same static IP and gateway that I had before, and it's back up and running. Now my connections look like this:

image

I rebooted and it came up fine, but I wonder what might have happened there. I've gone through other HA OS updates and rebooted the host machine many times without incident.

The only changes since 9.0 that look interesting are these: https://github.com/home-assistant/operating-system/pull/2138 https://github.com/home-assistant/operating-system/pull/2171

I don't know if they're related, but I suspect that it would be hard for most folks to recover from.

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

9.1

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

I was running HA OS 9.0 in a KVM on Debian 11 and it was working fine. As soon as I updated to HA OS 9.1 based on an update alert, the KVM started without a functional network interface.

Anything in the Supervisor logs that might be useful for us?

There were lots of errors in the Supervisor log when it didn't have a functional network interface, but I doubt they'll be of interest.

Anything in the Host logs that might be useful for us?

Nope, ask if needed.

System Health information

System Information

version core-2022.10.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.5
os_name Linux
os_version 5.15.72
arch x86_64
timezone America/Los_Angeles
config_dir /config
Home Assistant Community Store

GitHub API | ok
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4723
Installed Version | 1.27.2
Stage | running
Available Repositories | 1127
Downloaded Repositories | 33

Home Assistant Cloud

logged_in | true
subscription_expiration | February 15, 2023 at 4:00 PM
relayer_connected | true
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | false
remote_server | us-west-2-1.ui.nabu.casa
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 9.1
update_channel | beta
supervisor_version | supervisor-2022.10.0
agent_version | 1.4.1
docker_version | 20.10.17
disk_total | 30.8 GB
disk_used | 5.4 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Samba share (10.0.0), Glances (0.16.0), File editor (5.4.1), Remote Backup (v0.4.9), Log Viewer (dev), SSH & Web Terminal (12.0.2), Portainer (2.0.0), ESPHome (2022.9.4)

Universal Devices ISY994

host_reachable | ok
device_connected | true
last_heartbeat | October 6, 2022 at 10:50 PM
websocket_status | connected

keymaster

zwave_integration | zwave_js
network_status | on

Dashboards

dashboards | 2
resources | 14
views | 5
mode | yaml

Recorder

oldest_recorder_run | October 2, 2022 at 3:16 AM
current_recorder_run | October 6, 2022 at 10:23 PM
estimated_db_size | 598.20 MiB
database_engine | mysql
database_version | 10.3.32

Additional information

This is my network interface in the KVM definition:

    <interface type='bridge'>
      <mac address='52:54:00:ce:e6:c8'/>
      <source bridge='br0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
Calimerorulez commented 2 years ago

Same here, on my Proxmox host. I haven't found a way to solve it though.

Calimerorulez commented 2 years ago

I had to re-enable the interface and switch to 'auto' on the HA cli.

Calimerorulez commented 2 years ago
image

No network adapter found...

agners commented 2 years ago

@rccoleman I'd recommend using a virtio type of virtual network. That said, the issue still should not happen.

I tried to reproduce the issue, but was not successful. I started with HAOS 9.0, set the network interface to e1000e type, and got the interface named enp1s0. After upgrading, the interface was still named enp1s0 and correctly set up. That said, Virtual Manager also added the following to the settings in my case:

  <target dev="vnet2"/>
  <alias name="net0"/>

However, even after removing them, the name stayed enp1s0 for me.

@Calimerorulez what type of network adapter is configured in your Proxmox installation?

agners commented 2 years ago

Actually, the target and alias are somehow only there at runtime. The static VM configuration does not have them.

I wonder why the new network interface is named ens3.

Reading the Predictable Network Interface Names documentation, it seems that this is a network interface detected through a hot-pluggable PCIe slot:

...

  1. Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)
  2. Names incorporating physical/geographical location of the connector of the hardware (example: enp2s0) ...

So it seems that you correctly identified https://github.com/home-assistant/operating-system/pull/2138/files as the culprit, since it enables CONFIG_HOTPLUG_PCI_PCIE=y.
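To make the difference between the two schemes quoted above concrete, here is a minimal Python sketch that sorts interface names by the pattern they follow. The function name, the labels, and the simplified patterns are assumptions made for illustration; they only cover the name shapes that appear in this thread (ens3, enp0s3, eth0) and are not systemd's actual naming logic.

```python
import re

# Simplified patterns, based only on the examples in this thread.
# The labels are illustrative and are not terms used by systemd itself.
_PATTERNS = [
    (re.compile(r"^ens\d+$"), "hotplug-slot"),      # e.g. ens3, ens18
    (re.compile(r"^enp\d+s\d+$"), "pci-location"),  # e.g. enp0s3, enp0s18
    (re.compile(r"^eth\d+$"), "kernel"),            # e.g. eth0
]


def classify_ifname(name: str) -> str:
    """Return a rough label for the naming scheme an interface name follows."""
    for pattern, label in _PATTERNS:
        if pattern.match(name):
            return label
    return "other"


if __name__ == "__main__":
    for ifname in ("enp0s3", "ens3", "enp0s18", "ens18", "eth0"):
        print(ifname, classify_ifname(ifname))
```

Under these assumptions, the rename reported in this issue (enp0s3 to ens3) shows up as a move from the location-based scheme to the hotplug-slot scheme.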

I wonder how we should go from here. It is odd that the system's network does not come up in this situation. At the very least it should get a DHCP address. If we want a smooth transition, we probably need a custom update mechanism here :cry:

Maybe reverting the commit and creating a 9.2 release would be the best way to go about it for now.

jens-maus commented 2 years ago

Interesting spot. Personally, I never really liked the switch to "predictable network interface names" that all major Linux distros use these days because systemd decided to go down that road. While the name says "predictable", I never really felt that these new names are any more predictable than the old-style ethX names, or that they make much sense. On some servers it is "ensX", on others "enpXXX", or even something completely different.

That confusion, and this example here, is exactly why I decided to use net.ifnames=0 on the kernel command line for https://github.com/jens-maus/RaspberryMatic, so that interface names are IMHO more human-readable and somewhat more predictable to humans than with the initial approach of the systemd developers. Not suggesting it, but perhaps doing the same for HomeAssistantOS would IMHO be a good approach.
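For anyone who wants to check whether a running system was booted with that option, the following is a minimal Python sketch. The function name is hypothetical, and the check is deliberately naive: it only looks for the literal value 0 and lets the last occurrence on the command line win, which is an assumption and not a statement about how systemd parses the option.

```python
def predictable_names_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains net.ifnames=0.

    Assumption: if the option appears more than once, the last one wins.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("net.ifnames="):
            value = token.split("=", 1)[1]
    return value == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    with open("/proc/cmdline", encoding="utf-8") as handle:
        print(predictable_names_disabled(handle.read()))
```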

elcajon commented 2 years ago

@Calimerorulez what type of network adapter is configured in your Proxmox installation?

I use virtio on my Proxmox (7.2-11) and also experienced the same issue after upgrading.

image

Calimerorulez commented 2 years ago

@agners @elcajon

Same here:

image

In HAOS my interface is named 'ens18'. I can't check if it was renamed by upgrading to 9.1. I used a fixed IP and gateway before upgrading, and after upgrading the interface in HAOS was disabled. I had to enable it and set to auto to get the network up again if that is of any relevance to the issue.

elcajon commented 2 years ago

In HAOS my interface is named 'ens18'. I can't check if it was renamed by upgrading to 9.1. I used a fixed IP and gateway before upgrading, and after upgrading the interface in HAOS was disabled. I had to enable it and set to auto to get the network up again if that is of any relevance to the issue.

I can confirm that it was renamed from enp0s18 to ens18. I just had to change the line containing interface-name in /etc/NetworkManager/system-connections/Supervisor enp0s18.nmconnection to match the renamed interface.
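The manual fix described above can be sketched in Python as a small text transformation. This is only an illustration: the function name is hypothetical, the sample profile contents below are invented (the real file will contain more keys), and the sketch works on a string and does not touch any file on disk.

```python
import re


def rename_interface(keyfile_text: str, new_ifname: str) -> str:
    """Replace the value of the interface-name line in a connection profile."""
    pattern = re.compile(r"(?m)^(interface-name[ \t]*=[ \t]*).*$")
    # A function is used as the replacement so new_ifname is taken literally.
    updated, count = pattern.subn(lambda m: m.group(1) + new_ifname, keyfile_text)
    if count == 0:
        raise ValueError("no interface-name line found")
    return updated


if __name__ == "__main__":
    # Hypothetical, shortened profile contents for demonstration only.
    sample = (
        "[connection]\n"
        "id=Supervisor enp0s18\n"
        "type=ethernet\n"
        "interface-name=enp0s18\n"
        "\n"
        "[ipv4]\n"
        "method=manual\n"
    )
    print(rename_interface(sample, "ens18"))
```

Only the interface-name line changes; the connection id (and therefore the file name) keeps the old interface name, which matches what is described above.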

agners commented 2 years ago

@rccoleman (or someone else with QEMU/KVM) can you share the full domain xml? In my case it did not seem to get detected as a hot-plug network interface.

@ioctl2 thoughts?

tteck commented 2 years ago

@agners, does Proxmox need to be set to use "Hotplug" for CPU and/or Memory? https://pve.proxmox.com/wiki/Hotplug_(qemu_disk,nic,cpu,memory)

Edit: FWIW, I had no issues with 9.1

rccoleman commented 2 years ago

@agners Here's my complete domain XML. hassio.xml.gz

I'll switch from e1000 to virtio in the meantime.

Edit: The network device was still ens3 after switching to virtio, but was otherwise working. I just installed 9.2 and it's back to enp0s3 and is still working.

nagyrobi commented 2 years ago

I had no issues either with 9.1, also on Proxmox (7.2-11).

tteck commented 2 years ago

I'm wondering if it's a case of old (2m) vs new (4m) Disk for storing EFI vars (EFI disk) for Proxmox. For instance, BTRFS file system requires the newer EFI Disk. That would explain why some work and others didn't.

nagyrobi commented 2 years ago

I have in my Proxmox server

Device       Start       End   Sectors   Size Type
/dev/sda1       34      2047      2014  1007K BIOS boot
/dev/sda2     2048   1050623   1048576   512M EFI System
/dev/sda3  1050624 468862094 467811471 223.1G Linux LVM

However, the HA machine was originally created under ESXi (I think somewhere around HAOS 7.x), and migrated to Proxmox with ovftool and qm importovf while it was around HAOS 8.x. After migration all I had to do was to add a new E1000 network interface (copied over the MAC address too), change the BIOS type to OVMF (UEFI) and change the disk controller to VirtIO SCSI, and it worked. The VM itself doesn't have an EFI disk:

image

HAOS 9.1 didn't break it.

tteck commented 2 years ago

Same, but with EFI Disk.

Screenshot 2022-10-07 11 02 44 AM

The EFI disk contains the EFIVARS, which also contain the boot order. If no efidisk is specified, a temporary one is given to the VM on each start.

nagyrobi commented 2 years ago

Can't tell what's the size of that one.

agners commented 2 years ago

Edit: The network device was still ens3 after switching to virtio, but was otherwise working. I just installed 9.2 and it's back to enp0s3 and is still working.

Ok thanks for confirming. That makes https://github.com/home-assistant/operating-system/pull/2138 definitely the culprit.

I am leaving the PR https://github.com/home-assistant/operating-system/pull/2138 in on the dev channel, as we might want to work on a solution which migrates to the new device. But meanwhile, for OS 9, we don't want to put more users' installations at risk, so I've reverted it in 9.2.

agners commented 2 years ago

@jens-maus it leads to predictable names if the system does not change. However, when the system environment changes, it forces a new name. This makes it feel less predictable than kernel naming if only a single network card is in play: with kernel naming it will be eth0 no matter how the device is connected or which slot it is plugged into. I think this fact, and the fact that the names can get rather complicated, makes it feel unpredictable.

The predictable names really get useful in systems with multiple network cards. I use them on my Arch system successfully, and I have never had a problem with a device name changing. I have always been able to rely on my two built-in Ethernet interfaces keeping the same names since I first set up that machine about 3 years ago.

We also have some users with multiple network cards (e.g. for a separated IoT network). The kernel does enumeration of devices increasingly in parallel, so things are enumerated rather randomly. I am quite sure that we would see issues reported that eth0/eth1 suddenly switched after a kernel upgrade or similar effects if nothing is tying down interface names.

I remember in the old days on Ubuntu as well as Debian, scripts tied down interface names using udev rules. With VMs it was a bit problematic: when you accidentally changed the MAC, Ubuntu/Debian suddenly "lost" the network configuration for that interface. It was messy as well.

I think this is an exceptional case, where we suddenly enable a driver which leads the system to detect the network card differently.

ioctl2 commented 2 years ago

@rccoleman (or someone else with QEMU/KVM) can you share the full domain xml? In my case it did not seem to get detected as a hot-plug network interface.

@ioctl2 thoughts?

This is not a scenario I tested for that PR, as I was doing fresh deployments each time. I agree that the patch should be reverted for the release and kept in dev as we look for an elegant solution.

ioctl2 commented 2 years ago

That confusion, and this example here, is exactly why I decided to use net.ifnames=0 on the kernel command line for https://github.com/jens-maus/RaspberryMatic, so that interface names are IMHO more human-readable and somewhat more predictable to humans than with the initial approach of the systemd developers. Not suggesting it, but perhaps doing the same for HomeAssistantOS would IMHO be a good approach.

@jens-maus could you confirm that this issue did not occur in your build after commit a0c18fca6d254e7dcf660d2ace211e2487b01861 with net.ifnames=0 added to the kernel command line?

nagyrobi commented 2 years ago

+1 for getting back to the "good old" ethX naming scheme. I always hated the so-called "predictable" thing, which made it vendor-specific...

Calimerorulez commented 2 years ago

+1 for getting back to the "good old" ethX naming scheme. I always hated the so-called "predictable" thing, which made it vendor-specific...

Good old ethX isn't the case for Proxmox installs. It's back to enp0s18 for me and that works fine :)

agners commented 2 years ago

Switching to kernel-based Ethernet naming (pure ethX) would cause quite a bit of havoc today. It seems that NetworkManager does not automatically use a new Ethernet interface unless no profile exists at all. That is why HAOS starts off with DHCP enabled on the Ethernet interface on first boot. But from that point on, NetworkManager does not touch a new interface, even if the original interface disappears. It also does not migrate configurations.

So essentially, switching from predictable names to ethX would leave almost all installations stranded without network :scream:

It would need a migration of some sort. But then the question is how exactly that would work. If we assume a single network interface, we could migrate en* to eth0. But as soon as two Ethernet interfaces are in play, it gets complicated :cold_sweat:
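The single-interface case described above could look roughly like the following Python sketch. It is purely hypothetical: the function name and its inputs are assumptions, the caller is assumed to pass only Ethernet interface names, and nothing here reflects how HAOS or NetworkManager actually behaves. The sketch only proposes a rename when exactly one configured name has disappeared and exactly one unconfigured interface is present, and gives up otherwise.

```python
def plan_migration(profile_ifnames, present_ifnames):
    """Return an {old_name: new_name} mapping, or {} if the case is ambiguous.

    profile_ifnames: interface names referenced by existing connection profiles.
    present_ifnames: Ethernet interface names currently present on the system.
    """
    missing = [name for name in profile_ifnames if name not in present_ifnames]
    unclaimed = [name for name in present_ifnames if name not in profile_ifnames]
    if len(missing) == 1 and len(unclaimed) == 1:
        return {missing[0]: unclaimed[0]}
    # Nothing to migrate, or more than one candidate: do not guess.
    return {}


if __name__ == "__main__":
    print(plan_migration(["enp0s3"], ["ens3"]))                     # single interface
    print(plan_migration(["enp0s3", "enp0s8"], ["eth0", "eth1"]))  # ambiguous
```

With two interfaces the sketch returns an empty mapping, which is exactly the complicated case: without extra information such as a MAC address there is no safe way to tell which old profile belongs to which new name.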

github-actions[bot] commented 1 year ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

ioctl2 commented 1 year ago

I think this issue should remain open.