home-assistant / operating-system

Upgrade to 12.x causes kernel panics (generic-x86 Intel NUC) #3254

Closed bboehmke closed 6 months ago

bboehmke commented 7 months ago

Describe the issue you are experiencing

After upgrading to 12.1 (same with 12.0), the system crashes with a kernel panic. Sometimes the panic only occurs once Home Assistant Core is updated afterwards, but it can also occur directly after the OS upgrade.

I also tried upgrading with all add-ons disabled, with the same result.

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

12.1

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Upgrade the OS to 12.1 (or 12.0)
  2. Wait for the reboot
  3. (Optional) Update Home Assistant Core if the crash did not happen automatically

Anything in the Supervisor logs that might be useful for us?

not able to access after the crash

Anything in the Host logs that might be useful for us?

not able to access after the crash (see below for screenshots of kernel panic)

System information

System Information

version core-2024.2.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.1
os_name Linux
os_version 6.1.74-haos
arch x86_64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.34.0
Stage | running
Available Repositories | 1401
Downloaded Repositories | 10

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.5
-- | --
update_channel | stable
supervisor_version | supervisor-2024.03.0
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 457.7 GB
disk_used | 28.6 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | Studio Code Server (5.15.0), Terminal & SSH (9.9.0), Mosquitto broker (6.4.0), SMA MQTT Bridge (0.2.2), ESPHome (2024.2.2), Node-RED (17.0.7), AdGuard Home (5.0.4), Simple WebDAV server (TEST) (0.0.2), evcc (0.124.9), Whisper (1.0.2), Piper (1.5.0), openWakeWord (1.10.0)

Dashboards

dashboards | 2
-- | --
resources | 5
views | 10
mode | storage

Recorder

oldest_recorder_run | 14 March 2024 at 17:39
-- | --
current_recorder_run | 14 March 2024 at 22:01
estimated_db_size | 199.32 MiB
database_engine | sqlite
database_version | 3.44.2

Additional information

This installation is running on an Intel NUC (NUC8i5BEK). The only additional hardware connected is a SONOFF Zigbee USB Dongle.

Here are some screenshots of the kernel panic: panic_01667, panic_01744, panic_01768, panic_01884

owlen commented 7 months ago

I encountered a similar problem. The only differences I notice from the above are that I use a different generic x86 machine (a Minisforum mini PC) and the SkyConnect USB adapter for Zigbee. I reinstalled 12.0 and successfully restored a backup.

System Information

version core-2024.3.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.16-haos
arch x86_64
timezone Asia/Nicosia
config_dir /config
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4779
Installed Version | 1.34.0
Stage | running
Available Repositories | 1399
Downloaded Repositories | 12

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 12.0
-- | --
update_channel | stable
supervisor_version | supervisor-2024.03.0
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 56.6 GB
disk_used | 12.2 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | Samba share (12.3.1), Advanced SSH & Web Terminal (17.2.0), Cloudflared (5.1.6), File editor (5.8.0), Frigate (0.13.2), Glances (0.21.0), Home Assistant Google Drive Backup (0.112.1), Let's Encrypt (5.0.15), Mosquitto broker (6.4.0), Music Assistant BETA (2.0.0b108), OpenSpeedTest (v2.0.3), Piper (1.5.0), Plex Media Server (3.5.0), Studio Code Server (5.15.0), TasmoAdmin (0.29.1), Whisper (2.0.0), Zigbee2MQTT (1.36.0-1)

Dashboards

dashboards | 3
-- | --
resources | 3
views | 9
mode | storage

Recorder

oldest_recorder_run | 8 March 2024 at 02:12
-- | --
current_recorder_run | 17 March 2024 at 22:53
estimated_db_size | 58.77 MiB
database_engine | sqlite
database_version | 3.44.2

bboehmke commented 6 months ago

Similar issue with OS 12.2. This time the OS starts without a panic, but after an update of HA Core a kernel panic occurs again.

Screenshot: vlcsnap-2024-04-11-19h01m16s445

agners commented 6 months ago

Hm, the stack traces look quite different this time around :thinking: These kinds of random crashes point more towards a hardware issue, TBH. Maybe a defective memory module or something? :thinking:

bboehmke commented 6 months ago

I will try to run a memory test, but I cannot imagine that it is a hardware issue.

This setup has been running completely stable (currently with OS version 11.5) for over a year. I have not had any issues with it, and so far every update until 12.x also worked fine.

I thought it might be related to a driver issue and tried disabling everything I don't use in the BIOS (Wi-Fi, Bluetooth, ...), but sadly without success. Is there anything I can do to narrow down the issue?
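
As a rough illustration of what the memory test suggested above does, the sketch below writes fixed bit patterns to a buffer and reads them back. It is only an assumption-laden toy: a userspace script touches just the pages the OS happens to hand it and cannot cover all physical RAM, so it is no substitute for a bootable tester such as Memtest86+. The function name `pattern_test`, the default buffer size, and the pattern set are hypothetical choices and do not come from this thread.

```python
def pattern_test(size_bytes: int = 16 * 1024 * 1024,
                 patterns: tuple = (0x00, 0xFF, 0x55, 0xAA)) -> list[tuple[int, int]]:
    """Write each byte pattern to a buffer and read it back.

    Returns a list of (pattern, offset) tuples, one per failing pattern,
    where offset is the first byte that did not read back as written.
    An empty list means no mismatch was observed in this buffer.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected  # fill the whole buffer with the pattern
        if buf != expected:  # read back and compare
            # Locate the first byte that differs from the pattern.
            offset = next(i for i in range(size_bytes) if buf[i] != pattern)
            failures.append((pattern, offset))
    return failures


if __name__ == "__main__":
    bad = pattern_test()
    print("no mismatches found" if not bad else f"mismatches: {bad}")
```

A clean result from a script like this says very little; a dedicated tester run from boot media over several passes is the more reliable check.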

bboehmke commented 6 months ago

@agners, you are right. There really does seem to be an issue with my system's memory. Strange that I only have issues with the new OS version 12.x.

I will try to replace the memory modules and test if the issue is resolved.

bboehmke commented 6 months ago

After replacing the memory everything works as expected.

Sorry for raising the issue; this clearly seems to be a hardware issue.