home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0
4.84k stars 967 forks source link

HAOS does not recover/reconnect after network failure, error "rcu_preempt self-detected stall on CPU" #3392

Closed KHV8 closed 4 months ago

KHV8 commented 4 months ago

Describe the issue you are experiencing

If the ethernet goes offline for a short while, like a switch upgrade or other reasons, then HAOS does not recover the network. The console on the machine (a Laptop, Lenovo Thinkpad) shows the following 20240525_204245965_iOS

The PC is connected thorugh wired Gigabit Ethernet with a static IP address 192.168.0.9.

Im quite sure this have work earlier, however not something checked very often. So not entirely sure.

I noticed due to a power failure in the house.

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

6.6-29

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

  1. Have HA running
  2. Turn of the network switch
  3. Wait 20 sec
  4. Turn on the switch
  5. Check if HA is online, it is not
  6. Go to console, which will show above errors.
  7. Check "network info", it will not show an interface with 192.168.0.9
  8. Reload network on console by command "network reload"
  9. No change

Anything in the Supervisor logs that might be useful for us?

Nope

Anything in the Host logs that might be useful for us?

nothing I can see is related

System information

System Information

version core-2024.5.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.29-haos
arch x86_64
timezone Europe/Copenhagen
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 5000 Installed Version | 1.34.0 Stage | running Available Repositories | 1400 Downloaded Repositories | 47 HACS Data | ok
Home Assistant Cloud logged_in | true -- | -- subscription_expiration | October 19, 2024 at 02:00 relayer_connected | true relayer_region | eu-central-1 remote_enabled | true remote_connected | true alexa_enabled | false google_enabled | true remote_server | eu-central-1-6.ui.nabu.casa certificate_status | ready instance_id | 556248443b9a46bf9dc117a076c565e8 can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 12.3 -- | -- update_channel | stable supervisor_version | supervisor-2024.05.1 agent_version | 1.6.0 docker_version | 25.0.5 disk_total | 234.0 GB disk_used | 183.1 GB healthy | true supported | true board | generic-x86-64 supervisor_api | ok version_api | ok installed_addons | Studio Code Server (5.15.0), Samba share (12.3.1), Mosquitto broker (6.4.0), Terminal & SSH (9.14.0), AirSonos (4.2.1), SQLite Web (4.1.2), Samba Backup (5.2.0), Glances (0.21.1), Zigbee2MQTT (1.37.1-1), Frigate (Full Access) (0.13.2), Firefox (1.2.0), MQTT Explorer (browser-1.0.1), Advanced SSH & Web Terminal (18.0.0), Zigbee2MQTT (1.37.1-1)
Dashboards dashboards | 16 -- | -- resources | 35 views | 84 mode | storage
Recorder oldest_recorder_run | April 24, 2024 at 01:01 -- | -- current_recorder_run | May 25, 2024 at 22:52 estimated_db_size | 2019.35 MiB database_engine | sqlite database_version | 3.44.2
Sonoff version | 3.7.3 (e240aaf) -- | -- cloud_online | 1 / 2 local_online | 0 / 0

Additional information

No response

KHV8 commented 4 months ago

After reading issue https://github.com/home-assistant/operating-system/issues/3368 I think it might be the same, however not sure as they do not mention the console errors.

sairon commented 4 months ago

Yeah, it is very likely the same issue, similar error is in the logs attached to one of the posts there. And I guess your Lenovo laptop is also using the buggy driver (Intel Gigabit e1000e). I'm closing this issue then, re-open it if you find any clues it's something different.