home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0
5.07k stars 992 forks source link

After Home Assistant OS startup/reboots, Alexa devices can't reach Home Assistant #2689

Closed mkanet closed 1 year ago

mkanet commented 1 year ago

Describe the issue you are experiencing

Approximately 1 year ago, something changed in Home Assistant OS which causes Alexa devices like Echo Dot to not communicate with Home Assistant entities, scripts, etc. (after Home Assistant Host reboots). NOTE: This issue does NOT occur after restarting Home Assistant Core. NOTE: Prior to a year ago, this issue did NOT exist!

If I say "Alexa, turn on Bathroom Lights", it will say "Device isn’t responding. Please check its network connection and power supply". It will do that for all Home Assistant entities.

Temporary Fix:

Restart Home Assistant VM via its Hypervisor: VMWare Workstation Player (with Bridged Networking).

Important:

This issue originally started on my old Win10 PC with Home Assistant VM on VirtualBox (with Bridged Networking) a little over a year ago. The same EXACT issue happens on a completely different Win11 PC hardware and different Hypervisor: VMWare Workstation Player (with Bridged Networking)

Important 2:

No other communication issues whatsoever:

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

Home Assistant OS 10.4

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

Reboot Home Assitant Host via Home Assistant Webpage or upgrade Host OS to a new version with a subsequent reboot.

Anything in the Supervisor logs that might be useful for us?

No error at all in Supervisor Log.  Only INFO logs.

Anything in the Host logs that might be useful for us?

No relevant errors whatsoever with the default log settings.  In fact, Home Assistant can run for many days without any notable log messages. No relevant errors.

Please let me know what debug log settings I should use that would help you identify the issue to provide a solution/fix.

System information

version core-2023.8.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.4
os_name Linux
os_version 6.1.39
arch x86_64
timezone America/Los_Angeles
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 5000 Installed Version | 1.32.1 Stage | running Available Repositories | 1272 Downloaded Repositories | 19
Home Assistant Cloud logged_in | true -- | -- subscription_expiration | July 28, 2024 at 5:00 PM relayer_connected | true relayer_region | us-east-1 remote_enabled | false remote_connected | false alexa_enabled | true google_enabled | true remote_server | us-east-1-6.ui.nabu.casa certificate_status | ready can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 10.4 -- | -- update_channel | stable supervisor_version | supervisor-2023.08.1 agent_version | 1.5.1 docker_version | 23.0.6 disk_total | 30.8 GB disk_used | 7.8 GB healthy | true supported | true board | ova supervisor_api | ok version_api | ok installed_addons | Terminal & SSH (9.7.1), Samba share (10.0.2), Mosquitto broker (6.2.1), File editor (5.6.0), Z-Wave JS (0.1.87)
Dashboards dashboards | 1 -- | -- resources | 19 views | 7 mode | storage
Recorder oldest_recorder_run | August 9, 2023 at 12:00 AM -- | -- current_recorder_run | August 14, 2023 at 7:42 PM estimated_db_size | 799.40 MiB database_engine | sqlite database_version | 3.41.2

Additional information

I originally opened this issue with Alexa Media Player Github. However, they said that the issue isn't related to Alexa Media Player. No response

agners commented 1 year ago

I am not very familiar with Alexa, do you know how it communicates with Home Assistant exactly? Are the two devices in the very same (L2) network? Do you have IPv6?

I've had someone with connectivity issue recently and it was related to promiscuous mode on the host side. Unfortunately I can't find the issue/discussion anymore. It wasn't clear why that helped, but it did in that particular case.

Update: Found it, it was this issue https://github.com/home-assistant/operating-system/issues/2549#issuecomment-1555179645.

mkanet commented 1 year ago

@agners thanks for the reply back. I can confirm that my Home Assistant VM automatically gets an IPTV4 address (which never changes) and an IPTV6 address. Also, I'm pretty sure I have promiscuous mode enabled on this VM.

Just to be clear, Alexa CAN communicate with my Home Assistant setup perfectly/consistently/indefinitely. However, it requires me to reboot Home Assistant via Hypervisor only for it to communicate. If I reboot Home Assistant OS via Home Assistant's web interface, the connectivity will be lost until I reboot from the Hypervisor again.

Also, this issue isn't specific to just VMWware Workstation Player Hypervisor. It also happens with VirtualBox 6.x Hypervisor. I switched to VMWare Workstation Player 17 in hopes the problem would go away.. but, it never did. So, either there is an issue with 2 different hypervisors or there is an issue with Home Assistant OS itself.

Also, it is important to note that there are absolutely no other network-sensitive components in my Home Assistant setup that have network connectivity issues, MQTT, Samba/SMB, Nabu Casa synching, Z-Wave, Zigbee, etc all work consistently (in both directions) regardless if I reboot via Home Assistant or via Hypervisor. Also, my Home Assistant setup includes several integrations that talk to API endpoints outside my network without any issues no matter what I do.

PS: This issue was introduced a little over a year ago. I always configured my VirtualBox and VMWare hypervisors with promiscuous mode; even over a year ago when everything worked correctly. My network settings never changed; and, relatively simple. I only have a single physical network adapter on my Windows Host PC.

agners commented 1 year ago

Just to be clear, Alexa CAN communicate with my Home Assistant setup perfectly/consistently/indefinitely. However, it requires me to reboot Home Assistant via Hypervisor only for it to communicate. If I reboot Home Assistant OS via Home Assistant's web interface, the connectivity will be lost until I reboot from the Hypervisor again.

This is very curious. We initiate a regular reboot though the systemd init manager. In the end, I expect that the Linux kernel initiates a a proper system reset/reboot through typical x86-64 (I assume a UEFI ResetSystem, or cold boot).

Is this a 100% reproducible? E.g. if you do 10 reboots via HA it won't work 10 times, and if you reboot 10 times via VM reboot mechanism, it will work every time? I'd be glad if you could actually run a couple of reboots just to be sure that this is indeed the case.

From a Home Assistant perspective, we completely reboot either way, and initialize all services from scratch.

So the difference must come from the Hypervisor. Is this the only virtual machine running on your system? I know that some Hypervisor only initialize promiscuous mode "on-demand". So a reboot/reset through the Hypervisor might trigger a network configuraiton reinitialization on the host side, or something along these lines. But this is all quite some speculation.

PS: This issue was introduced a little over a year ago. I always configured my VirtualBox and VMWare hypervisors with promiscuous mode; even over a year ago when everything worked correctly. My network settings never changed; and, relatively simple. I only have a single physical network adapter on my Windows Host PC.

The question is what changed back then... Maybe it was on Alexa side, host side (Windows/Windows driver update?) or indeed HAOS. If it is the HAOS, you can relatively easily downgrade: Use ha os update --version x.y. This operation is typically fairly safe, but just in case I'd suggest to take a snapshot/make a copy of your VM :sweat_smile:

mkanet commented 1 year ago

The question is what changed back then... Maybe it was on Alexa side, host side (Windows/Windows driver update?) or indeed HAOS. If it is the HAOS, you can relatively easily downgrade: Use ha os update --version x.y. This operation is typically fairly safe, but just in case I'd suggest to take a snapshot/make a copy of your VM 😅

Hi @agners thank you so much for replying back. I can confirm that this issue is 100% reproducable. I can restart Home Assistant Core >20 times in a row (while Alexa CAN reach Home Assistant).. no issues whatsoever. As soon as I restart Home Assistant Host, Alexa will not be able to reach Home Assistant. However, everything else works fine on Home Assistant. The ONLY way to get Alexa to communicate with Home Assistant again is to restart Home Assistant from the Hypervisor. I even made an automation to restart the Home Assistant VM via VMware.

Please Note: Just to be clear, this issue happened even on my old Win10 PC too from over a year ago on Oracle VirtualBox Hypervisor. In a desperate attempt to fix the issue, I built a completely new Win11 PC; used a different Hypervisor, VMWare Workstation Player, and installed Home Assistant VM from scratch. After using completely different PC hardware, a different OS, a different Hypervisor, and a clean install of Home Assistant VM, same issue is there. There isn't any more I could have done.

PS: The only thing both Hypervisors have in common, other than Home Assistant, is both Hypervisors are configured for bridged networking and promiscuous mode is enabled; and they were on the same LAN.

PPS: I've only had 1 VM, Home Assistant, (on both my old PC and brand new PC).

I don't know enough about how Home Assistant OS interacts with its Hypervisor. Hopefully, this information can help you narrow down where the issue is. Please let me know what debug logs I can provide or anything else I can do to help.

agners commented 1 year ago

I can restart Home Assistant Core >20 times in a row (while Alexa CAN reach Home Assistant)..

Yeah this kinda make sense, as this would not touch network configuration.

Is soon as I restart Home Assistant Host, Alexa will not be able to reach Home Assistant.

and

The ONLY way to get Alexa to communicate with Home Assistant again is to restart Home Assistant from the Hypervisor.

This seems somewhat strange, as I said before, technically, from a HAOS point of view, there is absolutely no difference between this two restart modes.

The only difference could be within the Hypervisor. That is why I feel this is Hypervisor related.

Did you also compare this restart modes 10 times, to be sure it is always reproducible (e.g. restart HA host through "Reboot System" in the frontend, confirm Alexa stops working, restart through VMware, confirms Alexa works, and repeat this a couple of times).

Can you maybe also compare the IPv4 and IPv6 information between reboots, and see if there is a pattern (e.g. do the IPv4 and IPv6 addresses stay the same or do they change)?

mkanet commented 1 year ago

The only difference could be within the Hypervisor. That is why I feel this is Hypervisor related.

If it is the Hypervisor, it is happening on 2 completely different Hypervisor applications. Oracle VirtualBox and VMWare Workstation Player

Did you also compare this restart modes 10 times, to be sure it is always reproducible (e.g. restart HA host through "Reboot System" in the frontend, confirm Alexa stops working, restart through VMware, confirms Alexa works, and repeat this a couple of times).

After I selected "Reboot System", I was able to make the issue happen. Hence, I subsequently rebooted HA via Hypervisor... then, when I tried to do it again... the issue didn't happen for the first time in over a year! I still can't believe it.

I don't know what I did to fix it. At least, I can't reproduce the issue currently even when I choose REBOOT HOST, in the screen below: image

What is the difference between "Reboot System" and "REBOOT HOST" in the above 2 methods? Or are they the same? In any case, the issue appears to be gone for now.

The only other thing I did was take screenshots below of my Internet Router's IPTV6 settings and HA's IPTV6 settings. Maybe that triggered it into working correctly?

Can you maybe also compare the IPv4 and IPv6 information between reboots, and see if there is a pattern (e.g. do the IPv4 and IPv6 addresses stay the same or do they change)?

Whether the issue is happening or not, the IPV6 IP address, Gateway Address, DNS Servers, stay the same. However, in both cases, Gateway Address is null and DNS Servers no value:

Settings/System/Network/IPv6:
IPV6: fe80::432a:8b1f:1813:c201/64 Gateway Address = null DNS Servers=

NOTE: IPTV6 on my Windows 11 Host PC, has IPTV6 fully enabled.

QUESTION: Is promiscuous mode on both Hypervisors causing this? I don't even know what that does. I just know that in some online tutorials, they suggest turning it on for HA.

QUESTION 2: (IMPORTANT?) I completely turned off my primary Internet Router's IPTV6 setting a while ago because my Google Nest Mini would lose its connectivity after a while with it enabled (known Google Next Mini bug). What are the chances this is causing the issue? If it is causing the issue, which IPTV6 settings should I choose below? I'm guessing either Native or Passthrough?

image

image

image

mkanet commented 1 year ago

@agners I had restarted Home Assistant so many times while testing this that I think I corrupted one of my containers. I wasn't able to upgrade Home Assistant even after restoring from backup.

I installed a clean copy VMWare VM and restored from my lastest Home Assistant backup. Guess what? The problem is back.

I think I found an important clue. I'm getting the below message sometimes after I restart home assistant. It seems to usually show while the issue is happening. I see this below message even after I installed a brand new VMware VM with clean containers:

23-08-24 19:32:04 WARNING (MainThread) [supervisor.homeassistant.core] Watchdog found Home Assistant failed, restarting...

Do you have any suggestions on what I can try next?

mkanet commented 1 year ago

@agners I'm going to close this issue. I have a feeling this isn't something you'll be able to help with.

mkanet commented 1 year ago

Hi just wanted to say I finally figured out this issue!!

This was one of the hardest issues for me to troubleshoot. I finally realized that the issue was caused by my NGINX reverse proxy code for my Home Assistant VM instance. It was never configured correctly so the HomeAssistant Alexa Skill was having difficulty reaching my Home Assistant end-point. After I setup the correct config (shown below) and disabling/re-enabling the HomeAssistant Skill on the Alexa app, it all started working correctly for the first time in over a year!

# This redirects HTTP to HTTPS
server {
    listen 80;
    server_name server.address.com;
    return 301 https://server.address.com$request_uri;
}
# The HTTPS configuration with certificates from letsencrypt
# Without HTTPS you will be unable to connect for example Google Home
server {
    listen       443 ssl;
    server_name  server.address.com;
    ssl on;
    ssl_certificate /etc/letsencrypt/live/server.address.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/server.address.com/privkey.pem;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_pass http://localhost:8123;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";     
        proxy_set_header   X-Real-IP          $remote_addr;
        proxy_set_header   X-Forwarded-Proto  $scheme;  
        proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
    }
    location /api/websocket {
        proxy_pass http://localhost:8123/api/websocket;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header   X-Real-IP          $remote_addr;
    proxy_set_header   X-Forwarded-Proto  $scheme;
    proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
    }
    location /auth/external {
        proxy_pass http://localhost:8123/auth/external;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header   X-Real-IP          $remote_addr;
        proxy_set_header   X-Forwarded-Proto  $scheme;
        proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
    }
}