home-assistant / operating-system

Reboot fails after OS update: boot screen stuck at Slot A (OK=1 Try=3) and flickers. #3153

Open Cardy165 opened 8 months ago

Cardy165 commented 8 months ago

Describe the issue you are experiencing

This issue has occurred for a number of updates. When I update HASS OS, the update applies and the machine reboots, but it doesn't come back up.

Upon connecting a console I will see the line below with either Slot A or Slot B

*Slot A (OK=1 TRY=3)

The line is highlighted and flickers as if it's trying again and again.

If I plug in a keyboard the line stops flickering.

If I then press Enter, the system boots, briefly showing the line "unsuitable video mode found booting in blind mode", and then starts normally on the new version.

If I select Slot B (which isn't shown unless I move down with the keyboard) then the previous version of the OS boots.

This happens on every update cycle, and I sometimes have to apply an update multiple times before I get a stable system that I am confident will reboot without intervention.

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

11.4 going to 11.5

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. On 11.4 click on the install button for HASS OS 11.5
  2. Wait for the machine to reboot, then plug in a screen and keyboard; the Slot issue shown above will be happening.
  3. Choosing either Slot option normally allows the system to boot. ... 20240206_140932

Anything in the Supervisor logs that might be useful for us?

Nothing to do with OS boot

Anything in the Host logs that might be useful for us?

Nothing to do with OS boot

System information

version core-2024.1.6
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.6
os_name Linux
os_version 6.1.74-haos
arch x86_64
timezone Europe/London
config_dir /config
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4801
Installed Version | 1.34.0
Stage | running
Available Repositories | 1396
Downloaded Repositories | 36

Home Assistant Cloud

logged_in | true
-- | --
subscription_expiration | 12 February 2024 at 00:00
relayer_connected | true
relayer_region | eu-central-1
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | true
remote_server | eu-central-1-11.ui.nabu.casa
certificate_status | ready
instance_id | 0b159739d197407aa8df95369be0fc78
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.5
-- | --
update_channel | stable
supervisor_version | supervisor-2024.01.1
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 457.7 GB
disk_used | 39.6 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | File editor (5.7.0), Home Assistant Google Drive Backup (0.112.1), Log Viewer (0.17.0), MariaDB (2.6.1), Mosquitto broker (6.4.0), Samba Backup (5.2.0), Terminal & SSH (9.8.1), Studio Code Server (5.15.0), Advanced SSH & Web Terminal (17.1.0), TasmoAdmin (0.29.1), UniFi Network Application (3.0.3), Glances (0.21.0), Frigate (0.13.1), OneDrive Backup (2.2.4), HA Smart HD Monitor (0.1), Bookstack (2.0.1)

Dashboards

dashboards | 2
-- | --
resources | 21
views | 20
mode | storage

Recorder

oldest_recorder_run | 1 February 2024 at 09:34
-- | --
current_recorder_run | 6 February 2024 at 14:19
estimated_db_size | 2179.11 MiB
database_engine | mysql
database_version | 10.6.12

Sonoff

version | 3.5.4 (a4a8c5f)
-- | --
cloud_online | 0 / 1
local_online | 0 / 0

Additional information

The system runs on an M70q Gen 2 Desktop (ThinkCentre) with the OS installed directly onto an NVMe drive. The machine is dedicated to Home Assistant and does not use Proxmox.

sairon commented 8 months ago

This looks very much like #3112, which was caused by a missing CONFIG_FRAMEBUFFER_CONSOLE. However, that issue has been fixed, so I wonder what else it could be. In the end, even though the HDMI output is stuck, the system should boot and you should be able to reach HA. Can you please try booting into slot A again, waiting a while, and checking whether you can reach or ping the address of your Home Assistant?
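One way to script the reachability check suggested above is a small TCP probe run from another machine on the same network. This is a minimal sketch, not an official tool: the helper names `is_reachable` and `wait_until_reachable`, the retry count, and the delay are assumptions for illustration. Port 8123 is Home Assistant's default web port; change it if your instance listens elsewhere.

```python
import socket
import time


def is_reachable(host: str, port: int = 8123, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def wait_until_reachable(host: str, port: int = 8123,
                         attempts: int = 20, delay: float = 15.0) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_reachable(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_reachable("homeassistant.local")` right after triggering the reboot (the host name here is only a placeholder for your own address) would show whether the system comes up on the network even while the display appears stuck.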

Cardy165 commented 8 months ago

I have gone to

Settings -> Hardware -> Restart Home assistant -> Advanced options -> Reboot System

The system has rebooted and has come back up and is running again. I am currently on

Core: 2024.1.6 Supervisor: 2024.01.1 Operating System: 11.5 Frontend: 20240104.0

I have done this twice and the machine reboots OK. The issue seems to be the restart after an update. When that happens, the machine gets this weird boot issue and I have to manually select Slot A before it will boot; otherwise I get the error above, with the boot option for Slot A just flickering.

The machine's display output is not normally connected; the above 2 reboots were performed without a screen connected. I connected a standard monitor (15-pin D-type) when I had the Slot A issue. The machine has that and a DisplayPort connector, if that makes a difference.

Cardy165 commented 7 months ago

I had the same issue again going to Core 2024.2.1

Just tried a normal reboot and the machine again is stuck at Slot A OK=1 Try=1

It seems to happen not only after updates. Is there a way to diagnose this issue, or any more information on how Slots A and B are set up?
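To compare the menu lines reported in this thread (TRY=3 in the original report, Try=1 here), the line can be parsed into its fields. The sketch below only extracts what is printed on screen. What the bootloader does with the OK and TRY counters is not explained in this thread, so no boot logic is modelled. The names `SlotStatus` and `parse_slot_line` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class SlotStatus(NamedTuple):
    slot: str       # "A" or "B"
    ok: int         # number shown after OK=
    tries: int      # number shown after TRY=
    selected: bool  # True when the entry is prefixed with "*"


# Accepts both "*Slot A (OK=1 TRY=3)" and the looser "Slot A OK=1 Try=1".
_SLOT_RE = re.compile(
    r"^\s*(\*)?\s*Slot\s+([AB])\s*\(?\s*OK=(\d+)\s+TRY=(\d+)",
    re.IGNORECASE,
)


def parse_slot_line(line: str) -> Optional[SlotStatus]:
    """Parse a boot-menu slot line; return None if it does not match."""
    match = _SLOT_RE.match(line)
    if match is None:
        return None
    star, slot, ok, tries = match.groups()
    return SlotStatus(slot.upper(), int(ok), int(tries), star is not None)
```

Noting these values on each failed boot would at least show whether the TRY counter changes between attempts.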

github-actions[bot] commented 4 months ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

Cardy165 commented 4 months ago

This issue still exists. It usually happens on a HASS OS update and sometimes on a reboot. I have to go and press Enter to make it continue.

I have no way to diagnose the cause. I have checked everything I can and have applied all the latest updates.

Highly frustrating as I can't ever risk a reboot unless I am close to the machine as I may have to plug a keyboard in to make it boot.

driversti commented 2 months ago

I have the same issue on the HP T620 Thin Client. It is dedicated to HAOS. Because of this issue, I cannot update the system when I am not at home.

Cardy165 commented 2 months ago

I still have the issue. Pressing Enter seems to boot normally; I just need to know how to diagnose it further.

maylay commented 2 months ago

Issue is still present on HAOS 12.4 core 2024.7.3. HP t620 firmware L40 v02.19

maylay commented 2 months ago

The OS booted back to 12.3. After updating again, the issue seems to have gone away. I'm not familiar with how A/B systems work, so sorry if this was trivial.

Cardy165 commented 1 month ago

Still having this issue. Had it on a completely new HA machine that I built for someone else.

When it was at the boot screen it showed just 1 line for Slot A

When I pressed Enter it flashed a message saying something like "No suitable video mode found/available", then said something like booting in buffered video mode.

The machine does not normally have a screen attached, so I wonder whether the lack of a monitor causes some problem. It would need someone who understands the Slot A and B mechanism to shed more light on it, though.

sairon commented 1 month ago

@Cardy165 Can you share the hardware details of the machine? For @maylay the issues with HP T620 should be resolved with the latest OS 13.0.

Cardy165 commented 1 month ago

Hi @sairon, sure. The machines I am having this issue on are Lenovo M70q Gen 2 (Type 11MY) Tiny PCs. The machine is running the latest firmware.

I have set up a second machine for someone else and they are also having the issue. I took some extra screenshots and video of the issue yesterday.

The screen shows the line flickering slightly, as if it's trying to do something. I have uploaded 2 videos. In the first you can see that only the option for Slot A is shown; pressing Enter causes the machine to boot fine after showing an error about video modes.

The machines normally run without a screen or keyboard. I have tried plugging in a dummy monitor plug, but the issue persists.

https://github.com/user-attachments/assets/d199da62-5b14-4e2b-9282-69fb63378f88

https://github.com/user-attachments/assets/7d128dc4-04fe-4334-96cb-9ef822b25371

The second video is the same until I press e and then Escape, which shows all the options. Pressing Enter again boots it normally.

Cardy165 commented 1 month ago

@sairon If there is any more information you need or anything you need me to test please let me know

a4mkd9 commented 1 month ago

Yesterday I had the same issue while upgrading from 12.4 to 13.0 on an HP t520 Thin Client with 4GB RAM and a SATA SSD. This system had 12.0 installed several months ago and I did at least 3 or 4 updates without any issue until yesterday...

Tried Slot A and B, and also the recovery console: nothing boots. I have backups on Google Drive but haven't tried the restore procedure yet. Just in case, I booted a live Linux and am backing up the drive contents to my NAS.

Is there anything I could get from the logs that would help figure out the reason for this issue?

UPDATE

Spent some time today troubleshooting this issue:

  1. Tried a clean install of HAOS 13.1 on the same hardware - doesn't boot past the GRUB menu
  2. Tried a clean install of HAOS 12.4 on the same hardware - works as expected (made a backup of /dev/sda1 at this point)
  3. Restored from the backup --> Reboot - works as expected
  4. OS update from 12.4 to 13.1 --> Reboot - doesn't boot past the GRUB menu
  5. Booted Live Linux, replaced /dev/sda1/.../boot*.efi files with the ones from HAOS 12.4 --> Reboot - everything works as expected.

So it seems my issue is related to https://github.com/home-assistant/operating-system/issues/3305 and it turns out it also affects AMD CPUs...

Attaching dmidecode and "smbios --type 4 --get-qword 8" output: 32386119686159359

dmidecode.txt
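The decimal value quoted above is easier to compare against dmidecode output in hexadecimal. In the SMBIOS specification, structure type 4 is Processor Information and the 8-byte field at offset 8 is the Processor ID. The sketch below only does the arithmetic of splitting the number into two 32-bit halves. How those halves map onto CPUID registers depends on byte order, which this thread does not state, so that interpretation is left open. `split_qword` is a hypothetical helper name.

```python
from typing import Tuple


def split_qword(value: int) -> Tuple[int, int]:
    """Split a 64-bit integer into (high 32 bits, low 32 bits)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return value >> 32, value & 0xFFFFFFFF


reported = 32386119686159359  # decimal value quoted in the comment above
high, low = split_qword(reported)
print(f"qword = 0x{reported:016X}")
print(f"high  = 0x{high:08X}")
print(f"low   = 0x{low:08X}")
```

For the reported number this prints `0x00730F01178BFBFF`, with halves `0x00730F01` and `0x178BFBFF`.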

Cardy165 commented 1 month ago

Let me know if you want the dmidecode from the Lenovo machine if that will help diagnose the issue.

sairon commented 1 month ago

@a4mkd9 Your conclusions look correct (for completeness you could also try 12.2, which contains GRUB 2.12 without any patches) - I expect it will fail too. But it is different from @Cardy165's issue, which was reported before the GRUB update. I have created a new issue for yours.

@Cardy165 When reading through everything again, one thing I don't understand - is this something that worked correctly and started to misbehave at some point or was it like this with every update in the past? Originally I thought it's a regression introduced in 11.5 but now I'm not sure.

Cardy165 commented 1 month ago

Hi @sairon , thanks for taking the time to look at this.

I used to run this on an old Lenovo M92 or M93, never had a problem from what I can remember. Once I switched to this machine this has been an issue for almost every update.

Initially I thought it was only related to updates, but it sometimes happens when I tell the host to reboot. I set up another machine for another family member, but that machine has the exact same issue as mine. That machine was built cleanly with the latest available version of the Home Assistant generic x86-64 image on Monday 12 August 2024. To me this would indicate it's something specific about this machine/BIOS. As the second machine is remote, I only reboot it when I am there, as I can't be sure it will come back up.

The 2 machines I am running on are the same physical format (Lenovo Tiny) and have the same specifications (Intel i5 11400T, 16GB RAM), except for the boot drives, where one is a 256GB and the other a 500GB NVMe. Both are fully updated according to Lenovo's update tools.

When I first had the machine it did freeze a lot, but that was due to the Intel Dynamic Platform and Thermal Framework (DPTF) driver being enabled in the BIOS. Because the Home Assistant image doesn't include the thermald daemon to manage it, the machine wouldn't react to temperature changes properly.

Once I figured that out and disabled DPTF in the BIOS I've had no other issue with the machine other than the rebooting problem.

Sometimes the machine will reboot normally, but it always seems to exhibit the problem when doing an OS update.

If you need anything to diagnose then let me know what you need, I would love to get the issue resolved.

Dajestar commented 1 month ago

I have the same issue with an HP T610 with 8 gigs of RAM running a clean install of 13.1. Installing an older version (12.4) works fine but any upgrade leads to the issue mentioned here.

blckmn commented 1 month ago

Can confirm this issue is happening for me also. Hard reboot and the machine boots fine. Never used to be a problem; for the last 3 or so OS updates it has failed to boot, requiring manual intervention.

Cardy165 commented 1 month ago

Just applied the latest update and still have the same problem: plug in a keyboard and press Enter, and the machine boots. If I can do anything to help diagnose, let me know what you need.

jwoodard80 commented 4 days ago

I have had this issue for quite some time and dread OS updates now due to the extra time needed. Initially I thought it was due to it being a VM for some reason, so I moved the installation to a stand-alone box (Zotac CI325 Nano with 128GB SSD).

That said,

I appreciate any work toward this and I'm happy to help do anything needed to troubleshoot this issue.

Cardy165 commented 2 days ago

I continue to have this issue. It always happens on upgrade and sometimes on reboot.