Closed tybo611 closed 1 month ago
Can you please try reproducing it again and sharing logs from Supervisor and the host? Once the system boots to the CLI, type login and run the following commands:
ha host logs -n1000 -b0 >> /mnt/data/supervisor/homeassistant/haos-3517.txt
ha su logs -n1000 -b0 >> /mnt/data/supervisor/homeassistant/haos-3517.txt
Once you boot back to the working system, your config folder (i.e. the folder that contains configuration.yaml) should contain the file haos-3517.txt - please check it for errors or share it here as a whole.
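For a quick first pass over the collected file, a small filter like the one below can help. It is only a sketch: the keyword list is an assumption chosen for illustration and not an official set of patterns, so real problems may use different wording.

```shell
# Print numbered lines from a collected log that look like failures.
# The keyword list is an illustrative assumption; extend it as needed.
scan_log() {
    grep -inE 'error|fail|timed out|cifs' "$1"
}

# Example (path taken from the commands above):
#   scan_log /mnt/data/supervisor/homeassistant/haos-3517.txt
```

The function returns a non-zero status when nothing matches, which only means none of the keywords were found, not that the log is clean.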
Alternatively, you can switch between the two OS versions without running the update by using ha os boot-slot other (just check with ha os info that the boot slots contain the version you want).
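The slot check can be scripted roughly as follows. This is a sketch under one loud assumption: that ha os info prints a YAML-like boot_slots: section with one indented block per slot containing state: and version: keys. That layout is not shown in this thread, so verify it against your own output before relying on it. The function name slot_versions is made up for this example.

```shell
# Print "slot state version" for each boot slot found on stdin.
# ASSUMPTION: input looks like
#   boot_slots:
#     A:
#       state: booted
#       version: "12.4"
# which is not confirmed by this thread.
slot_versions() {
    awk '
        function emit() {
            if (slot != "") print slot, state, version
            slot = ""; state = ""; version = ""
        }
        /^boot_slots:/                 { in_slots = 1; next }
        in_slots && /^[^ ]/            { emit(); in_slots = 0 }   # next top-level key ends the section
        in_slots && /^  [^ ]+:[ ]*$/   { emit(); slot = $1; sub(/:$/, "", slot); next }
        in_slots && $1 == "state:"     { state = $2 }
        in_slots && $1 == "version:"   { version = $2; gsub(/"/, "", version) }
        END                            { emit() }
    '
}

# Example on the HAOS console:
#   ha os info | slot_versions
```

If the output matches the assumed layout, each line names a slot, whether it is the booted one, and the version it holds, which is what you want to confirm before running ha os boot-slot other.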
I don't think there's anything useful, do I need to change log level? It's now set up so I can boot back and forth between the two releases. It's also non-production version so happy to test anything I can to help.
haos-3517.txt ![Screenshot 2024-08-12 191325](https://github.com/user-attachments/assets/5b991c8b-b9aa-4c70-85e1-8fafe1c70835)
This is the message that stays up when starting 13.0 and eventually times out. It won't allow any HA commands, but I can log in for shell commands.
Sorry, I hadn't realized that without Supervisor fully started you wouldn't be able to run ha logs ... Please run the following instead (after the login command):
rm -f /mnt/data/supervisor/homeassistant/haos-3517.txt
dmesg >> /mnt/data/supervisor/homeassistant/haos-3517.txt
docker logs -n1000 hassio_supervisor >> /mnt/data/supervisor/homeassistant/haos-3517.txt
Also, can you ping 192.168.20.125?
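One caveat about the redirections above: >> only captures stdout, and docker logs replays a container's stderr stream on its own stderr, so some Supervisor lines may never reach the file. A minimal sketch of a wrapper that captures both streams and labels each section is shown below; the helper name collect is hypothetical, and the path and commands are the ones from this thread.

```shell
# Append a command's stdout and stderr to a log file under a header
# naming the command. A failing command is noted but does not stop the run.
collect() {
    out="$1"; shift
    {
        printf '===== %s =====\n' "$*"
        "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
    } >> "$out"
}

# Example on the HAOS console:
#   LOG=/mnt/data/supervisor/homeassistant/haos-3517.txt
#   rm -f "$LOG"
#   collect "$LOG" dmesg
#   collect "$LOG" docker logs -n1000 hassio_supervisor
```

The headers make it easier to tell in the shared file where the kernel log ends and the Supervisor log begins.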
ping returns that the client is alive. haos-3517.txt
Attached are the logs from the commands above.
Experiencing the same issue. Unfortunately, the network share connected to my Synology NAS broke:
× mnt-data-supervisor-mounts-smb_backups.mount - Supervisor cifs mount: smb_backups
Loaded: loaded (/run/systemd/transient/mnt-data-supervisor-mounts-smb_backups.mount; transient)
Transient: yes
Active: failed (Result: exit-code) since Wed 2024-08-14 14:21:24 UTC; 1min 45s ago
Where: /mnt/data/supervisor/mounts/smb_backups
What: //ad6.zbraslav.lan/homeassistant
CPU: 14ms
Aug 14 14:21:23 homeassistant systemd[1]: Mounting Supervisor cifs mount: smb_backups...
Aug 14 14:21:24 homeassistant mount[23620]: mount error(128): Key has been revoked
Aug 14 14:21:24 homeassistant mount[23620]: Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
Aug 14 14:21:24 homeassistant systemd[1]: mnt-data-supervisor-mounts-smb_backups.mount: Mount process exited, code=exited, status=32/n/a
Aug 14 14:21:24 homeassistant systemd[1]: mnt-data-supervisor-mounts-smb_backups.mount: Failed with result 'exit-code'.
Aug 14 14:21:24 homeassistant systemd[1]: Failed to mount Supervisor cifs mount: smb_backups.
@mazzy89 That doesn't seem related, it rather looks like an ACL issue: mount error(128): Key has been revoked
The dmesg output or checking the NAS logs might give you more details. It doesn't look like a problem with the network at this point.
@tybo611 I wonder if it could be this issue: https://bugzilla.kernel.org/show_bug.cgi?id=219129 On what OS/machine is the SMB server running? If so, today's dev build will contain a fix for that (it was a stable kernel regression fixed in 6.6.46, released literally a few minutes ago).
@sairon it started exactly after the upgrade. Never had such issues before and the NAS is perfectly up and running.
@mazzy89 Please check whether it is indeed a regression and 12.4 works correctly. You can use ha os boot-slot other to swap back and forth between versions.
Anyway, I'm not saying the issue is not there, it just manifests in a different way (OP's SMB server is simply unreachable), so please don't mix it up here and open another issue.
I have the same issue - I am running HAOS in a VM on Proxmox on Intel NUC with some NFS mounts (from QNAP NAS). Updating to this OS broke HA starting and symptoms are same as reported in first post here. I managed to fix it by reboot and typing "ha banner" (weird - but it works https://community.home-assistant.io/t/error-returned-from-supervisor-system-is-not-ready-with-state-setup/413084/124). However, HA is showing HA OS 13.0 available for update. I am going to wait and watch this issue here.
I'm guessing you actually booted into the other partition, as @sairon mentioned and as I've been doing to check. After the CLI has loaded and you're on the main screen, type os info: does yours look similar to this, where you have a bad boot partition and have booted from the 12.4 partition?
Similar to @jdesai61, I'm also running a VM in Proxmox on an 8th-gen Intel machine; additional details below. I'm using a Windows 11 laptop to access HAOS.
I can try to upgrade the bad partition to the new dev release and see what happens. I'll get to it tonight and provide error logs or positive outcome.
@tybo611 anxious to see if this is successful. i had the same issue as you preventing the Supervisor from starting:
[  194.057161] CIFS: VFS: \\192.168.20.125 has not responded in 180 seconds. Reconnecting...
with a different IP though. Unfortunately, I did not know about ha os boot-slot other and restored a day-old VM backup, so I basically lost a day's worth of data.
Unless I'm missing an easier method (aside from waiting for the release), I'll have to change some settings and self-sign the dev build. Sound right?
@tybo611 Thanks for the effort - there should have been a dev release available for a couple of hours already. However, something's wrong at Cloudflare and it simply refuses to serve the raucb image which is needed for OTA. With that, it would be a matter of a single HA CLI command to update to that version (ha os update --version 13.1.dev20240814). Most likely some caching issue I cannot resolve myself :cold_sweat:
If you want to go down the rabbit hole, there's a way to build your own OS build, but as it's a VM, it might be easier to create a new one from the latest dev image, set up a share and see if it fails there as well, and if not, run ha os update --version 13.0 to downgrade to 13.0 to confirm it was indeed a kernel regression fixed by today's Linux release.
Alternatively, you can try downgrading to older dev builds - use the valid versions listed in the dropdown at the artifacts page. I'm particularly interested in whether 13.0.dev20240802 contains the issue or not (for this you can simply run ha os update --version 13.0.dev20240802). It should have kernel 6.6.43, which doesn't contain the backported commit net: missing check virtio yet.
Now that git has recovered, I'm able to download the newest dev environment.
@sairon, I appreciate you walking through these. I haven't used dev builds before but I'm enjoying learning with them. I updated the OS to the Aug 14 build and it booted successfully, and the network share is still attached and accessible.
Is it worth going to the Aug 02 build to check that kernel, or are we good with the new 6.6.46? Admittedly, I can't remember exactly which build I started having issues with, as I upgraded to one of the pre-releases quickly, realized the issue, rolled back, and started trying to see if it was something in my setup specifically.
@tybo611 Thank you for checking! That's good news, that means the kernel bump helped and it was probably the GSO issue above that caused the issues. Checking Aug 02 build would just help to confirm that the regression was introduced in 6.6.44, it could give us some assurance but it's not really needed.
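To check whether a given system is running a kernel inside the window discussed here, a sketch like the following can be used. It assumes, based only on the statements in this thread, that the regression is present from 6.6.44 and fixed in 6.6.46; other kernel series are not covered, and the function name is made up for this example.

```shell
# Succeed if the given kernel release falls in the 6.6.x range that this
# thread associates with the regression (6.6.44 up to, not including, 6.6.46).
# ASSUMPTION: the range comes from the discussion above, not from an
# authoritative list of affected kernels.
is_affected_kernel() {
    ver="${1%%-*}"    # drop any suffix after the first "-"
    IFS=. read -r major minor patch <<EOF
$ver
EOF
    [ "$major" = 6 ] && [ "$minor" = 6 ] \
        && [ "${patch:-0}" -ge 44 ] && [ "${patch:-0}" -lt 46 ]
}

# Example:
#   is_affected_kernel "$(uname -r)" && echo "kernel is in the affected range"
```

A match only says the kernel version is in the suspect range; whether a network share actually fails also depends on the setup, as the mixed reports above show.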
Mine boots - it just couldn't start HA core for some reason - with the same error message as the OP mentioned. However, I can log in to it using the console in the Proxmox web GUI. So perhaps I have a slightly different problem.
@sairon, can confirm the 02Aug build also works. I have the VM now with 02Aug and 14Aug builds both booting back and forth, starts immediately and has no issues with network share.
By accident, I issued "ha os update" command, which ended up re-installing 13.0 and now I can't get HA to startup. Even "ha os info" just hangs. How do I try rebooting from Slot A vs Slot B?
You're running Proxmox, right? Shut down the VM; it'll fail a few things but eventually shut down. Then, from the Proxmox console webpage for the VM, start it up and hit a key while it's in the loading phase. That will stop the process at the boot menu and you'll be able to select the other slot. It helps if you know which one you were booting from, but you should be able to base it off the "tries" number.
Ok I managed to reboot into right slot with "ha os boot-slot B" command and now it gets further. But after boot, HA core won't start (I get "Error: System is not ready with state: setup"). However, if I type "ha banner" - then all is well and core starts. How can I get it to start core automagically?
My thoughts, though I'm not an expert: if you're booted now, why not run the update command with a specific version to downgrade to the 12.4 release that (I assume) you had no issues with?
I think it'd be something like the ha os update --version 12.4 command. That should overwrite the bad boot slot as well and you will boot into that; once the new 13.1 build comes out you can install it via normal OTA. You could just hit skip on the 13.0 build if you don't want to see the message all the time.
Thanks - will do
I have the same issue since restarting HA this morning, but was able to boot to the other slot to get working again. How do I clean up the broken slot so I don't end up with it remaining after eventually upgrading to a fixed version? I wasn't even able to run 'os info' or anything else when booted into 13.0. just got the 'state: setup' message
HA won't boot to the other slot again until it's told to during an update or by command. If you want two working slots, you can run the update command to version 12.4 (command above) or one of the dev versions confirmed to be working (command also above); the update will write to the inactive slot. Otherwise, wait until the new release is made and it will overwrite the bad slot then.
Yes I got that but how? It won't let me:
Use this instead... ha os update --version 13.0.dev20240802
It's a dev build, but I believe it was mentioned that it wasn't much different from the stable build. Once it's booted you can issue the command to boot to the other slot again and you'll be back on 12.4. Then, when the update happens, it will overwrite the dev build.
There are comments in this other issue regarding installing a dev build, you can't do it while set to the stable channel - https://github.com/home-assistant/operating-system/issues/3528
I just noticed in that pic of trying to get 12.4 in both slots, apparently failed but it now says "version_latest: 12.4" instead of 13.0.......?? And "update_available: false" instead of true...
i have the same problem and opened a bug report for it. Maybe someone can merge it into this bug (https://github.com/home-assistant/operating-system/issues/3524)?
I solved my Supervisor-not-starting issue with "supervisor reload" after 1 or 2 minutes. Then I removed the network folder that I used for backups, and after another reboot the system is running and starting normally.
After adding the network folder (NFS share on a QNAP NAS) again, the Supervisor did not start and had to be reloaded manually.
With 13.1 released are you still seeing the issue?
I tested it and the problem seems to be gone. did two restarts and the supervisor starts in less than a second. Thanks 👍
I can confirm 13.1 is working well for me too, and that it also replaced the broken slot, kept the working 12.4 and switched:
The root cause of the issue has been resolved by the kernel update in 13.1, released yesterday. If anyone's having issues that look similar to this one, it's likely something else; please open a new issue with a complete description in that case.
Describe the issue you are experiencing
Upgraded from 12.4 to beta; once installed, it fails to boot and only gives the error "system not ready state: setup", with multiple errors for the network share that I had set up for backups. I managed to recover from an old backup and remove the network share, after which the upgrade succeeds, but after adding the network share back and restarting, all the errors return: the Supervisor cannot start and I cannot see any of the major logs on the CLI; the only error is "system not ready state: setup". I'm aware it could be something wrong with my setup specifically, but I would like to figure out what logs I can access to see what's going on.
What operating system image do you use?
generic-x86-64 (Generic UEFI capable x86-64 systems)
What version of Home Assistant Operating System is installed?
13.0RC2
Did the problem occur after upgrading the Operating System?
Yes
Hardware details
Proxmox, VM
Steps to reproduce the issue
1. Upgrade from 12.4 to 13 with network share attached
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
No response
Additional information
No response