Sholofly closed 2 years ago
@Sholofly I experienced the exact same problem. Do you happen to have AdGuard or Pihole running on the same machine? I noticed, when trying to ping anything within Home Assistant OS, it failed so I changed the DNS setting from within the terminal using nmcli. After saving the changes and rebooting, the OS reverted to 2022.8.5.
@Sholofly which version have you been using before?
@RobertD502 we have two boot slots, so we can revert to the previously installed OS. If no successful boot is recorded within three boot attempts, the bootloader reverts to the previous version. The Supervisor is what marks a boot as good, so if starting the Supervisor fails for some reason, the system reverts to the previous OS version automatically.
FWIW, you can also manually select the previous version by choosing the other slot in the GRUB boot menu at startup.
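To make the fallback rule concrete, here is a toy model of it in shell. This is only a sketch of the behaviour described above, not the actual bootloader code: the slot names, the function names and the MAX_TRIES variable are all made up for illustration.

```shell
#!/bin/sh
# Toy model of A/B slot fallback; NOT the real bootloader logic.
# Assumptions: MAX_TRIES=3 mirrors the "three boots" rule described in
# the thread; slot names and function names are hypothetical.
MAX_TRIES=3

# choose_slot ACTIVE OTHER FAILED_BOOTS
# Prints the slot to boot next: stay on ACTIVE until MAX_TRIES attempts
# have gone by without being marked good, then fall back to OTHER.
choose_slot() {
    active=$1
    other=$2
    failed=$3
    if [ "$failed" -ge "$MAX_TRIES" ]; then
        echo "$other"
    else
        echo "$active"
    fi
}

# mark_good prints the reset failure counter. In HAOS this role is
# played by the Supervisor starting successfully.
mark_good() {
    echo 0
}
```

For example, `choose_slot B A 2` keeps booting slot B, while `choose_slot B A 3` falls back to slot A.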
It seems Docker cannot access Internet. If there is already a Supervisor present, that should not be a problem, so I am a bit puzzled why the OS decided it needs to redownload the Supervisor.
Can you check the Supervisor service logs? You can redirect the output to a directory you have access to, e.g. your config directory:
journalctl -o cat -u hassos-supervisor.service > /mnt/data/supervisor/homeassistant/hassos-supervisor.log
You can also use the -b -1 argument to get the logs from the previous boot only. However, it might be that the issue was in a previous boot already.
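If you need to grab the log for more than one boot, a small wrapper might save some typing. This is just a sketch built from the command above; the function name is made up, and the output path is whatever directory you can reach.

```shell
#!/bin/sh
# save_supervisor_log OUTFILE [BOOT]
# Writes the hassos-supervisor.service journal to OUTFILE.
# If BOOT is given it is passed to journalctl's -b option
# (0 = current boot, -1 = previous boot); without it, the log is not
# restricted to a single boot. The function name is hypothetical.
save_supervisor_log() {
    outfile=$1
    boot=$2
    if [ -n "$boot" ]; then
        journalctl -o cat -b "$boot" -u hassos-supervisor.service > "$outfile"
    else
        journalctl -o cat -u hassos-supervisor.service > "$outfile"
    fi
}
```

Calling `save_supervisor_log /mnt/data/supervisor/homeassistant/hassos-supervisor.log -1` would then write the previous boot's log to the config directory.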
As to why the system has no Internet access, can you check if DNS is working on the OS console?
resolvectl query registry-1.docker.io
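Since both registry-1.docker.io and ghcr.io come up in this thread, it may be handy to test several names in one go. A minimal sketch, assuming resolvectl is available on the OS console as above; the function name is made up.

```shell
#!/bin/sh
# check_dns HOST...
# Runs `resolvectl query` for each host and reports which ones resolve.
# Returns non-zero if any lookup fails. The function name is hypothetical.
check_dns() {
    failed=0
    for host in "$@"; do
        if resolvectl query "$host" >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as `check_dns registry-1.docker.io ghcr.io`. If every name fails, the host's DNS server is the likely suspect rather than a single registry.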
@agners I was on latest stable (8.5). Because I did hit the update button while I didn't intend to on my production environment I created a new VM and removed the old one (which is stupid, because I could have used it to deliver some support here).
I know that there was no supervisor present. The only running container was the observer which stated that there was no container running.
@agners I was on latest stable (8.5). Because I did hit the update button while I didn't intend to on my production environment I created a new VM and removed the old one (which is stupid, because I could have used it to deliver some support here).
Ah I see :smile: So is your new VM on 8.5 again? Can you maybe create a snapshot/clone and try to update to 9.0.rc1 again?
So I looked at that from my side:
It seems Docker cannot access Internet. If there is already a Supervisor present, that should not be a problem, so I am a bit puzzled why the OS decided it needs to redownload the Supervisor.
It turns out that with HAOS 9.0.rc1 we redownload the Supervisor on first boot, because we use the GitHub Container registry now (see #2009). I need to see if we can do something about that.
So that explains this part. However, the other question remains: why does your system have trouble accessing the container registry on 9.0.rc1? It seems that DNS resolution had already failed. Could it be that this was a temporary Internet outage?
I can confirm that my system also only had the observer container.
I made the mistake of updating while I was away from home. Although I use NabuCasa for remote access, I can confirm that there was no internet outage as I was using my personal VPN at the time to access my home network.
I wasn't as smart and didn't have a backup of the HA VM in Proxmox, so I was determined to get it back up and running remotely. Over the course of an 8-hour period I rebooted the VM way more than 3 times (it never reverted to 8.5 after 3 failed startups). It wasn't until I tried to manually recreate the missing containers that I noticed HA OS couldn't reach anything externally. Once I changed the DNS from within HA OS to Google's DNS instead of AdGuard, it reverted back to 8.5. Could it be that since I run Adguard as an add-on, and with that container not having started, HA OS was failing to attempt to fix itself?
I'll get you the desired logs once I'm back at my desk.
One thing I did notice is that, since this debacle, restarting Home Assistant takes a very long time (over 2 minutes) with every restart hanging. Previous restarts were usually sub 20s on i5 processor.
Edit: just timed a core restart and it took 3 minutes and 35s.
Edit 2: Supervisor service log below
The majority of this log is just noise from me downgrading to 8.4 and then updating to 8.5 in hopes of getting rid of the long restart time, but that didn't help. Looks like I'll have to do a fresh install. However, what does seem to be common between @Sholofly and me is the error Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io: no such host.
Core logs related to restart hanging:
Could it be that since I run Adguard as an add-on, and with that container not having started, HA OS was failing to attempt to fix itself?
Yeah, that is exactly the problem. HAOS needs Internet access at startup for several reasons: to set the time (NTP servers use DNS names as well, and the time is set before any add-on starts), and to recover the Supervisor in case the container image is corrupted or otherwise lost. Also, when updating the AdGuard add-on itself, the OS won't be able to resolve names, which can lead to issues.
In the end, this is a chicken-and-egg problem. I highly recommend not using AdGuard as the DNS server for HAOS.
One thing I did notice is that, since this debacle, restarting Home Assistant takes a very long time (over 2 minutes) with every restart hanging. Previous restarts were usually sub 20s on i5 processor.
That seems to be a HA Core issue, I am not sure how/if that is related to the HAOS issue. Is that with the new DNS set? Did you upgrade HA Core as well maybe?
until I tried to manually recreate the missing containers that I noticed HA OS couldn't reach anything externally.
Hm, maybe that is the problem: the container needs quite specific settings. It shouldn't be done manually :sweat_smile: Now that the container is present again and DNS is working on the host system, try removing it manually and restarting the OS service, which creates the container with the correct parameters.
docker stop hassio_supervisor
docker rm hassio_supervisor
systemctl restart hassos-supervisor.service
Ah you misunderstood me. I never actually created the cli container. Attempting to was what revealed there was no connectivity. That is when I changed the DNS settings with nmcli. I also created a new VM just now and restored from my Home Assistant backup. I made sure to include my router as a secondary DNS prior to updating to the RC...that time it went without a hitch. However, still trying to determine what went wrong with Core as my restarts are timing out even on a fresh install (but restored a full HA backup during the onboarding).
That seems to be a HA Core issue, I am not sure how/if that is related to the HAOS issue. Is that with the new DNS set? Did you upgrade HA Core as well maybe?
The issue occurs no matter what the DNS is set to (AdGuard, my router, or Google), so DNS doesn't seem to be the cause.
It started immediately after I was able to get HA OS back up and running yesterday. Prior to attempting the OS update, I was making some changes to my custom component (nothing that would result in breaking HA as it was just some changes to reflect deprecations in 2022.9) at which point I was restarting HA multiple times without a problem. Looks like I'll need to raise an issue in the Core repo as I'm having a hard time tracking down what is causing the lockup.
I made sure to include my router as a secondary DNS prior to updating to the RC...that time it went without a hitch.
Cool, thanks for the update. I'll also make a change to the OS so it won't need Internet to bring up Supervisor after that upgrade (see #2113). This saves a bit of bandwidth and would have helped in that particular case. But having a non-HAOS dependent DNS server is the right solution in this case.
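For anyone who wants to set this up from the OS console, a sketch with nmcli could look like the following. The function name is made up, and the connection name and addresses have to be replaced with your own (`nmcli connection show` lists the connections). Setting ipv4.ignore-auto-dns is an assumption on my part: it is meant to keep a DHCP-provided DNS server from being used alongside the static ones.

```shell
#!/bin/sh
# set_dns CONNECTION "DNS1 DNS2 ..."
# Sets static IPv4 DNS servers on a NetworkManager connection and
# re-activates it. The function name is hypothetical; the connection
# name and server addresses are placeholders supplied by the caller.
set_dns() {
    conn=$1
    servers=$2
    nmcli connection modify "$conn" ipv4.dns "$servers" &&
    nmcli connection modify "$conn" ipv4.ignore-auto-dns yes &&
    nmcli connection up "$conn"
}
```

Something like `set_dns "<your connection>" "192.168.1.1 8.8.8.8"` would use the router first and a public resolver second, where 192.168.1.1 is only a placeholder for the router address.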
@agners Appreciate the help!
HAOS won't attempt to redownload Supervisor with 9.0.rc2.
@Sholofly I'd still recommend checking your DNS setup. In particular, make sure that HAOS uses a DNS server which does not depend on HAOS itself being up.
@agners Not sure what it is that fixed it in 9.0.rc2, but my HA core restart timeouts disappeared after I just updated to rc2.
Describe the issue you are experiencing
I've tried to update my HA OS to Home Assistant OS 9.0.rc1. After installing, the HassOS Supervisor won't start and the system drops into the emergency console. Error message:
Error reponse from deamon: Supervisor download failed trying: latest registry-1.docker.io: no such host
Looks like something is wrong with the connection in Docker, but I lack the knowledge to debug that part. It worked until the last stable release.
What operating system image do you use?
ova (for Virtual Machines)
What version of Home Assistant Operating System is installed?
9.0.rc1
Did you upgrade the Operating System?
Yes
Steps to reproduce the issue
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System Health information
Can't access
Additional information
No response