home-assistant / operating-system

Can't upgrade to HAOS 12.3 on NUC Intel 5 #3363

Closed by OZ1SEJ 5 days ago

OZ1SEJ commented 3 months ago

Describe the issue you are experiencing

When I try to install the upgrade from v. 12.2 to 12.3, it attempts to boot from slot A three times, after which it reverts to slot B.
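
The fallback described above can be modeled as a small simulation. This is only a sketch of the behavior as reported here (three attempts on slot A, then a boot from slot B); the `Slot` class, the `tries_left` counter, and `pick_and_boot` are hypothetical names for illustration and are not taken from the actual HAOS bootloader or update code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    name: str
    tries_left: int  # hypothetical per-slot boot attempt counter
    healthy: bool    # simulated: does a boot from this slot complete?


def pick_and_boot(slots: List[Slot]) -> Tuple[Optional[str], List[str]]:
    """Try slots in order; return the slot that booted and the attempt history."""
    history: List[str] = []
    for slot in slots:
        while slot.tries_left > 0:
            slot.tries_left -= 1       # each attempt consumes one try
            history.append(slot.name)
            if slot.healthy:
                return slot.name, history
    return None, history               # no slot could be booted


# Slot A holds the new 12.3 image and fails; slot B holds the working 12.2 image.
slots = [
    Slot("A", tries_left=3, healthy=False),
    Slot("B", tries_left=3, healthy=True),
]
booted, history = pick_and_boot(slots)
print(booted, history)
```

In this model the failing slot is only abandoned once its attempt counter is exhausted, which matches the three reboots seen before the system came back up on 12.2.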

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

12.2

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Settings
  2. Home Assistant Operating System 12.3
  3. Install

Anything in the Supervisor logs that might be useful for us?

No.

Anything in the Host logs that might be useful for us?

No.

System information

version core-2024.5.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.25-haos
arch x86_64
timezone Europe/Copenhagen
config_dir /config

Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.34.0
Stage | running
Available Repositories | 1395
Downloaded Repositories | 9

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 12.2
-- | --
update_channel | stable
supervisor_version | supervisor-2024.05.1
agent_version | 1.6.0
docker_version | 25.0.5
disk_total | 116.7 GB
disk_used | 26.4 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | Terminal & SSH (9.14.0), CEC Scanner (3.0), Mosquitto broker (6.4.0), Zigbee2MQTT (1.37.1-1), Node-RED (17.0.12), Duck DNS (1.17.0), InfluxDB (5.0.0), Grafana (9.2.2), ZeroTier One (0.18.0), File editor (5.8.0), rtl_433 MQTT Auto Discovery (0.8.1)

Dashboards

dashboards | 3
-- | --
resources | 5
views | 21
mode | storage

Recorder

oldest_recorder_run | May 2, 2024 at 8:36 AM
-- | --
current_recorder_run | May 11, 2024 at 5:13 PM
estimated_db_size | 3354.56 MiB
database_engine | sqlite
database_version | 3.44.2

Additional information

When it's trying to boot on slot A, this text is displayed on screen just before it reboots:

[  OK  ] Started containerd container runtime.
[    4.731515] BUG: scheduling while atomic: kworker/1:2/77/0x00000002
[ ***  ] A start job is running for Network Manager Wait Online (24s / no limit)
[   25.733573] rcu: INFO: rcu_preempt self-detected stall on CPU
[   25.734009] rcu: o1-....: (20999 ticks this GP) idle=86d4/1/0x4000000000000000 softirq=6601/6601 fqs=5250
[***   ] A start job is running for Network Manager Wait Online (52s / no limit)
[   54.313600] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-.... } 21265 jiffies s: 277 root: 0x2/.
[FAILED] Failed to start Network Manager Wait Online.
See 'systemctl status NetworkManager-wait-online.service' for details.
[  OK  ] Reached target Network is Online.

sairon commented 3 months ago

That looks very much like an upstream kernel regression. Which Intel NUC model is it exactly? Can you get the boot log from the failed boot, ideally with some kernel stack traces? After booting back to the previous version, run `ha host logs -b-1 -n1000` (or replace -1 with a lower value).
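
Once the output of that command has been saved as text, the kernel error lines can be picked out with a short script. The following is a minimal sketch, assuming the log is available as a plain string; the function name `find_kernel_errors` and the pattern list are illustrative, and the patterns cover only the two message types quoted in this issue, not every possible kernel error.

```python
import re
from typing import List

# Markers taken from the console output quoted in this issue.
# The list is illustrative, not exhaustive.
KERNEL_ERROR_PATTERNS = [
    r"BUG: scheduling while atomic",
    r"rcu: INFO: rcu_preempt .*stall",
]
_ERROR_RE = re.compile("|".join(KERNEL_ERROR_PATTERNS))


def find_kernel_errors(log_text: str) -> List[str]:
    """Return the log lines that match any of the kernel error markers."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if _ERROR_RE.search(line)
    ]


# Sample lines copied from the boot output in the original report.
sample = """\
[  OK  ] Started containerd container runtime.
[    4.731515] BUG: scheduling while atomic: kworker/1:2/77/0x00000002
[   25.733573] rcu: INFO: rcu_preempt self-detected stall on CPU
[   54.313600] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-.... } 21265 jiffies s: 277 root: 0x2/.
[FAILED] Failed to start Network Manager Wait Online.
"""

hits = find_kernel_errors(sample)
for line in hits:
    print(line)
```

The systemd status lines are skipped and only the kernel messages remain, which makes it easier to see whether any stack trace survived in the persisted log.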

OZ1SEJ commented 3 months ago

It's a NUC5i5RYK with a Core i5-5250U @ 2.7 GHz, 4 GB RAM and a 120 GB SSD. I ran this exact command, and you can find the output on https://pastebin.com/n6eujWm1.

OZ1SEJ commented 3 months ago

I've also uploaded the log file here, if that makes more sense: host.log. Please let me know if there's anything more I can provide!

sairon commented 3 months ago

Unfortunately, there was nothing helpful in the logs you provided (not your fault; it just wasn't persisted because of the kernel error). Now that we have more information about similar issues, I believe this is another manifestation of #3368, as your NUC also uses the Intel e1000e driver. The same goes for this issue: it should be fixed in the next release or in the latest dev build. In the meantime, you can only revert to 12.2 or switch to the dev channel. I recommend the dev channel only to test whether the issue is fixed there, and then switching back to beta/stable: while dev is currently "stable", it may break from day to day in the future.

OZ1SEJ commented 3 months ago

Thank you for your kind reply. It certainly looks like these problems share the same underlying cause. I'll stick with 12.2 for the time being and see if it's fixed in the next release. Thanks again!

github-actions[bot] commented 1 week ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

OZ1SEJ commented 5 days ago

I ended up just skipping 12.3 and going directly to 12.4.