home-assistant / operating-system

HA loses network when router changed #3567

Open hawkeye100 opened 2 months ago

hawkeye100 commented 2 months ago

Describe the issue you are experiencing

Changed router on LAN (so DHCP changed). HA (on x86 PC) acquired an IP address. Appeared to reboot OK - BUT could not be seen on network by other devices, nor reconnect to external data sources e.g. for backup or media. Showed CIFS error but no network error.

Swapped cables, switch etc - no change. Found this previous report from 2022 and solution which eventually worked, but looks like it has not been addressed so bringing it to attention again (as that thread is now closed)

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

6.6.20

Did the problem occur after upgrading the Operating System?

No

Hardware details

No USB devices. 8GB RAM. 300GB hard disc. CPU: Celeron.

Steps to reproduce the issue

  1. remove router (and DHCP)
  2. replace router and DHCP
  3. ...NOTE have not repeated as there is too much work involved in sorting the IP addresses for all other devices - but it is confirmed in the link to previous issue in the description

Anything in the Supervisor logs that might be useful for us?

-

Anything in the Host logs that might be useful for us?

-

System information

version core-2024.8.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.4
os_name Linux
os_version 6.6.20-haos
arch x86_64
timezone Europe/London
config_dir /config

Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4861
Installed Version | 1.34.0
Stage | running
Available Repositories | 1383
Downloaded Repositories | 26

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 12.1
-- | --
update_channel | stable
supervisor_version | supervisor-2024.08.0
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 292.7 GB
disk_used | 10.3 GB
healthy | true
supported | true
host_connectivity | true
supervisor_connectivity | true
ntp_synchronized | true
virtualization |
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | File editor (5.8.0), Samba share (12.3.2), Mosquitto broker (6.4.1), Advanced SSH & Web Terminal (18.0.0), SQLite Web (4.2.0), Log Viewer (0.17.0), Syncthing (1.19.0)

Dashboards

dashboards | 6
-- | --
resources | 19
views | 19
mode | storage

Recorder

oldest_recorder_run | 22 August 2024 at 16:01
-- | --
current_recorder_run | 28 August 2024 at 10:30
estimated_db_size | 652.61 MiB
database_engine | sqlite
database_version | 3.45.3

Additional information

Workaround/solution: run ha network update eno1 --ipv4-method auto from the HA server CLI.
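
For anyone wanting to script the workaround above, here is a minimal sketch. It assumes the interface is called eno1 as on my machine (yours may differ), and the helper names dhcp_reset_command and reset_to_dhcp are made up for illustration. By default it only prints the command; it runs it only when asked to and when the ha CLI is actually present.

```python
import shutil
import subprocess


def dhcp_reset_command(interface: str) -> list[str]:
    """Build the workaround command for one interface, e.g. "eno1"."""
    if not interface or interface.startswith("-"):
        raise ValueError(f"not an interface name: {interface!r}")
    return ["ha", "network", "update", interface, "--ipv4-method", "auto"]


def reset_to_dhcp(interface: str, dry_run: bool = True) -> list[str]:
    """Return the command; execute it only if dry_run is False and `ha` exists."""
    cmd = dhcp_reset_command(interface)
    if not dry_run and shutil.which("ha"):
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: shows what would be executed without touching the network config.
print(" ".join(reset_to_dhcp("eno1")))
```

Passing dry_run=False on the HA host itself would apply the change; anywhere else the call is a no-op apart from returning the command.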

Desired solution is to avoid this happening if network/DHCP becomes unavailable, though a minimum might be a warning on reboot (similar to the CIFS warnings) to at least point in the right direction. Thanks

Anto79-ops commented 2 months ago

I wanted to mention that something similar happened to me with my NUC 12 running HAOS 13.1. There was a BIOS firmware update for my NUC (the original dated from sometime in 2022, the new firmware from June 2024). I shut down the system and moved my NUC to a computer screen via HDMI so that I could update the BIOS via USB. It updated successfully, and I moved it back to its permanent (headless) spot. When I rebooted it, HA did not come back, nor could I connect to it. When I connected a monitor to it again and logged in via CLI/HDMI, everything was working and core had started up, but it was complaining that there was no network access.

Long story short, I did a backup and restore, and it seems the network adapter changed from enspXX something to enspXX something else. I'm not sure if this was because of the firmware update or some issue in HAOS.

agners commented 2 weeks ago

> Swapped cables, switch etc - no change.

Just to be clear, you did also try to restart your Home Assistant instance, correct?

The issue you linked has been resolved with https://github.com/home-assistant/supervisor/pull/3676, so this really should not happen any longer. Maybe your change somehow still made it happen again :thinking:

agners commented 2 weeks ago

> I wanted to mention that something similar happened to me with my NUC 12 running HAOS 13.1. There was a BIOS firmware update for my NUC (the original dated from sometime in 2022, the new firmware from June 2024). I shut down the system and moved my NUC to a computer screen via HDMI so that I could update the BIOS via USB. It updated successfully, and I moved it back to its permanent (headless) spot. When I rebooted it, HA did not come back, nor could I connect to it. When I connected a monitor to it again and logged in via CLI/HDMI, everything was working and core had started up, but it was complaining that there was no network access.

The network settings are strictly bound to the network interface. On the very first start, we set up the first network interface (as determined by NetworkManager) as the primary interface, with DHCP. From then on, we assume that this interface is always "there" (meaning at the same udev path, so for a PCIe device essentially in the same place). This means that if anything changes in the way the NIC is detected by the OS (e.g. I can imagine that a BIOS update changes the enumeration order of PCI devices/buses), the system will consider it a new interface and initially not touch it.

What Supervisor probably should do, in the specific case where the previous primary interface disappears and a new primary interface appears, is simply migrate the configuration. But this isn't implemented yet.
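
To make the migration idea above concrete, here is a rough sketch. It is not Supervisor code: InterfaceConfig, migrate_primary, and the udev paths are all hypothetical, and it assumes the only safe case is the one described, where the stored primary is gone and exactly one previously unknown NIC has appeared.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class InterfaceConfig:
    udev_path: str    # where the NIC was detected when it was configured
    name: str         # interface name, e.g. "enp1s0"
    primary: bool
    ipv4_method: str  # "auto" (DHCP) or "static"


def migrate_primary(
    stored: list[InterfaceConfig], detected: dict[str, str]
) -> Optional[InterfaceConfig]:
    """Carry the primary config over to a newly appeared NIC.

    `detected` maps udev path -> interface name for NICs present right now.
    Returns the migrated config, or None when no migration is warranted.
    """
    primary = next((c for c in stored if c.primary), None)
    if primary is None or primary.udev_path in detected:
        return None  # nothing configured, or the primary is still there
    known_paths = {c.udev_path for c in stored}
    new_paths = [p for p in detected if p not in known_paths]
    if len(new_paths) != 1:
        return None  # no new NIC, or several: too ambiguous to guess
    path = new_paths[0]
    return replace(primary, udev_path=path, name=detected[path])


# Hypothetical example: the NIC moved to a different PCI address.
stored = [InterfaceConfig("/devices/pci0000:00/0000:01:00.0", "enp1s0", True, "auto")]
detected = {"/devices/pci0000:00/0000:02:00.0": "enp2s0"}
print(migrate_primary(stored, detected))
```

Refusing to migrate when more than one new interface shows up is a deliberate choice in this sketch: guessing wrong could bind the configuration to the wrong NIC.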

That said, I think OP did not change anything on the HA side, so I don't think that this is what happened here.

hawkeye100 commented 2 weeks ago

> > Swapped cables, switch etc - no change.
>
> Just to be clear, you did also try to restart your Home Assistant instance, correct?
>
> The issue you linked has been resolved with home-assistant/supervisor#3676, so this really should not happen any longer. Maybe your change somehow still made it happen again 🤔

Thanks for the comment. It was a while ago now, but as far as I recall:

Initially no HA reboot, then a reboot to see if it fixed it - but it did not. Hence frustration at an obscure issue given everything else 'appeared' to be ok. The only clue was the CIFS error, but the network share resource was confirmed as available suggesting a network issue on the HA box. Superficial investigation suggested the network interface was OK (IP address etc).

No change on HA itself - correct.

supervisor#3676 looks like it should have fixed it (and seems to describe a similar issue), but perhaps it did not, given that my experience was more recent?

Your suggestion of migrating the settings might work, though I do not profess to understand the more intricate workings of HA and Linux interfaces.

My request is to try to identify the issue at a higher level (at the HA prompt) or even fix/bypass it automatically, so we do not have to SSH to the HA machine and read/change underlying Linux configs, especially if we have no idea where to look. Progress elsewhere in HA has been significant in bringing more technical things up to the HA level of interface or even the GUI, though the latter presumably would not work for this.