home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Google Cast Integration not detecting Cast Devices #43652

Closed N5A closed 3 years ago

N5A commented 3 years ago

The problem

Google Cast devices are not detected by HA when adding the Cast integration.

Environment

Host OS: Windows 10
HA Virtual Machine: VirtualBox
Operating System: HassOS 4.17
HA Version: 0.118.3

https://www.home-assistant.io/integrations/cast/

Problem-relevant configuration.yaml

None - adding the integration through the UI.

Traceback/Error logs

No errors shown in the Supervisor or Core logs for Cast. I do see one for supervisor.api.ingress; it is unknown whether it plays a part in the issue:

20-11-25 20:08:34 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None

Additional information

The Home Assistant virtual machine and the Google devices are all on the same IoT VLAN. The Google Cast integration can't find the Google devices, a half dozen of them: some Minis, a Hub, and a few Chromecast Audios.

The Win 10 PC hosting the VM is dual-networked to both the IoT VLAN and the trusted VLAN. The HA VM is bridged through the IoT VLAN network link, and it has its own IP on that VLAN.

My phone, laptop, and PCs on the trusted network can see and interact with the casts, which are in the IoT VLAN.

So mDNS on my Ubiquiti gear is working. IGMP snooping on the IoT VLAN is now disabled; it was enabled before, and there was no change either way.

I have not yet attempted adding them manually.

probot-home-assistant[bot] commented 3 years ago

cast documentation cast source (message by IssueLinks)

emontnemery commented 3 years ago

@N5A The reason it doesn't work is, with very high likelihood, that mDNS packets are not being forwarded between the network your cast devices are on and the virtual machine. HA's cast integration relies on mDNS packets correctly flowing both from HA to the cast devices (mDNS queries) and from the cast devices to HA (mDNS replies).

Please note that mDNS relies on multicast UDP, hence HA and the cast devices must be on the same subnet. If this is not possible, you can set up mDNS forwarding between subnets.

You can find some troubleshooting tips here:
https://www.home-assistant.io/integrations/cast#docker-and-cast-devices-and-home-assistant-on-different-subnets
https://www.home-assistant.io/integrations/discovery/#mdns-and-upnp
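
If HA and the casts ever do end up on different subnets, a common approach is an mDNS reflector on a machine that can see both networks. As a minimal sketch (assuming avahi-daemon running on a router or other host that bridges the subnets, not inside the HA container):

# /etc/avahi/avahi-daemon.conf
[reflector]
enable-reflector=yes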

N5A commented 3 years ago

As noted above, they are on the same VLAN. The casts and HA are within the same IP range; there are no VLAN or subnet traversal issues.

N5A commented 3 years ago

Also, as noted above, the casts can be seen from other VLANs, so mDNS is up and working.

emontnemery commented 3 years ago

Ok, can you confirm the host can discover the cast devices as well (you mention they can be discovered by other PCs, not sure if that includes the host)?

Also, using Wireshark or a similar tool, check that you can see mDNS packets originating from the container, for example on a laptop connected to the same WiFi as one of the casts.
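
For the capture, a standard filter like the following (plain tcpdump/Wireshark capture-filter syntax, nothing HA-specific) keeps only mDNS traffic; in Wireshark's display filter box, mdns does the same:

udp port 5353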

N5A commented 3 years ago

Yes, the host PC can see them; it continues to see them when I pull the secure LAN network connection, leaving it connected only to the IoT LAN where the casts reside.

N5A commented 3 years ago

I've never used wireshark or similar tools.

Joikast commented 3 years ago

Hi,

Seems like I am having the same or similar issue. Google cast devices not detected.

Host OS
vmware ESXi 6.5 
Home Assistant deployed through .ova
IP HA: 192.168.137.103
IP Google home mini: 192.168.137.60
System Health
Home Assistant Core Integration
version: 0.118.3
installation_type: Home Assistant OS
dev: false
hassio: true
docker: true
virtualenv: false
python_version: 3.8.6
os_name: Linux
os_version: 5.4.77
arch: x86_64
timezone: Europe/Stockholm
Hass.io
host_os: HassOS 4.17
update_channel: stable
supervisor_version: 2020.11.0
docker_version: 19.03.12
disk_total: 43.6 GB
disk_used: 7.5 GB
healthy: true
supported: true
board: ova
supervisor_api: ok
version_api: ok

Google cast device: google home mini (1st gen)

Both devices on same subnet, and they are able to ping each other.

I have done a Wireshark capture on the trunk port towards the ESXi host, but I cannot make much out of that information; hopefully someone else can 👍 (I have filtered out all internal traffic except the Google Home Mini and the HA host).

I also enabled the following debug logging:

logger:
  default: info
  logs:
    homeassistant.components.cast: debug
    homeassistant.components.cast.media_player: debug
    pychromecast: debug
    pychromecast.discovery: debug
    pychromecast.socket_client: debug
    zeroconf: debug

The normal HA log does not show anything.

google cast capture.zip
debug google cast.txt

Any help would be greatly appreciated :)

emontnemery commented 3 years ago

@Joikast, @n5a Did it work with a previous version of HA?

emontnemery commented 3 years ago

@N5A It's clear from the log debug google cast.txt that pyzeroconf sends queries for cast devices, but it doesn't see any incoming mDNS data (except for its own packets). Example of a query:

[zeroconf] Sending (40 bytes #1) <DNSOutgoing:{multicast=True, flags=0, questions=[question[ptr,in,_googlecast._tcp.local.]], answers=[], authorities=[], additionals=[]}>

For replies, look for [zeroconf] Received from in the log; in each case it's just its own questions.
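
A quick way to check for incoming replies, assuming you're grepping a downloaded copy of the log (the file name depends on how you export it):

grep "\[zeroconf\] Received from" home-assistant.log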

The wireshark log however includes both queries originating from HA and replies from the cast device, for example:

59  0.863011    192.168.137.103 5353    224.0.0.251 5353    MDNS    82  Standard query 0x0000 PTR _googlecast._tcp.local, "QM" question
60  0.863686    192.168.137.103 5353    224.0.0.251 5353    MDNS    82  Standard query 0x0000 PTR _googlecast._tcp.local, "QM" question
61  0.864928    192.168.137.103 5353    224.0.0.251 5353    MDNS    82  Standard query 0x0000 PTR _googlecast._tcp.local, "QM" question
62  0.867760    192.168.137.60  5353    224.0.0.251 5353    MDNS    400 Standard query response 0x0000 PTR Google-Home-Mini-35c4bb9c93798e52d435d091cf11b9b7._googlecast._tcp.local TXT, cache flush SRV, cache flush 0 0 8009 35c4bb9c-9379-8e52-d435-d091cf11b9b7.local A, cache flush 192.168.137.60
63  0.867951    192.168.137.60  5353    224.0.0.251 5353    MDNS    400 Standard query response 0x0000 PTR Google-Home-Mini-35c4bb9c93798e52d435d091cf11b9b7._googlecast._tcp.local TXT, cache flush SRV, cache flush 0 0 8009 35c4bb9c-9379-8e52-d435-d091cf11b9b7.local A, cache flush 192.168.137.60
65  0.868054    192.168.137.60  5353    224.0.0.251 5353    MDNS    400 Standard query response 0x0000 PTR Google-Home-Mini-35c4bb9c93798e52d435d091cf11b9b7._googlecast._tcp.local TXT, cache flush SRV, cache flush 0 0 8009 35c4bb9c-9379-8e52-d435-d091cf11b9b7.local A, cache flush 192.168.137.60

I believe this means pyzeroconf successfully sends mDNS queries which are received by the cast. However, the answers from the cast are not received by pyzeroconf.

Would it be possible to dump packets from within the container as well?

Joikast commented 3 years ago

First of all, thanks a lot for the fast replies!

@Joikast, @N5A Did it work with a previous version of HA?

I have never gotten it to work; the first time I tried was only a couple of weeks ago.

Would it be possible to dump packets from within the container as well?

Let me see if I can manage to do that..

Is there any direct communication (unicast) between the Google Cast device and HA that you expect to see?

emontnemery commented 3 years ago

The Cast integration relies on working mDNS to discover the casts, direct communication will be initiated only after that is successful.

You could try searching for hints or tips on setting up mDNS / zeroconf / avahi / bonjour (all pretty much different names for the same thing) in vmware containers for your setup. It's a common problem to have mDNS packets dropped, see for instance: https://www.reddit.com/r/homeassistant/comments/e9som8/installed_hassio_in_vmware_no_zeroconfbonjourmdns/

Note that the tip in the Reddit thread about manually specifying the IP of the casts is no longer supported, as it doesn't work for audio groups, which cannot be tied to an IP.

Joikast commented 3 years ago

This might be a silly question, but I am not very familiar with containers.

When I run tcpdump inside the container(?), I see a lot of stuff happening, but it makes no sense to me. I only see local addresses in the 172.30.32.0/23 subnet; I guess this is normal.

There must be some NAT going on? Can I track the traffic from the local LAN? Is it possible to access the VM and run a tcpdump there maybe?

emontnemery commented 3 years ago

I think you can run tcpdump and simply filter on destination port 5353, which is the mDNS port. If you run it in parallel on the trunk port as you did before and in the container, it should be possible to confirm whether packets are dropped.
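
For example, something along these lines should work both on the host and in the container (the interface name eth0 is just a guess; adjust to whatever your interface is actually called):

tcpdump -i eth0 -n udp port 5353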

N5A commented 3 years ago

@Joikast, @N5A Did it work with a previous version of HA?

It's always been unstable, working and not working at complete random with no config changes on my side at all. I've not been able to pin it to any change, whether adding Z-Wave devices or updating HA versions. So I've been wondering if it's Google mucking about with things, but it happens too often for even that, so I'm back to looking at local causes.

I have no access to the container as it's locked away inside the VM. If you know of a way into it so I can check things for you, let me know. I'll poke around with Wireshark from an untagged port to see what data flow I can see in a bit; I'm stuck doing some stuff for work today.

emontnemery commented 3 years ago

@n5a I don't think this is on Google; it's most likely a problem with mDNS not being routed. Have you followed the instructions here for how to configure the virtual machine: https://www.home-assistant.io/hassio/installation/

Since you're on Windows, could you try creating a Hyper-V VM and see if that works?

For reference, my Hyper-V networking setup (screenshot attached).

N5A commented 3 years ago

Yes, it's bridged, promiscuous/allow all.

I had to reboot the VM as HASS failed to respond to any commands. I noticed during its boot there was a line about mDNS starting, and also about device eth0 entering promiscuous mode.

No, I cannot convert the VirtualBox system over to Hyper-V; no converters exist. I looked into it when I was going to put Pi-hole in a Docker container on my Windows mini PC, but that would end up enabling Hyper-V, preventing all other hypervisors from working, and I'd have to go through hell to get it yanked back out.

N5A commented 3 years ago

I got Wireshark up on an untagged port.

N5A commented 3 years ago

How can I set a filter, besides mDNS, to limit it to the IoT network IP range? I imagine you don't care about the "hey, I heard you" responses from the rest of the IP subnets that aren't involved.

N5A commented 3 years ago

Alright, I got the filter; simple enough with a little digging. Now to save the filtered capture.
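
A display filter along these lines does it (the subnet below is a placeholder for the IoT range, not my actual addressing):

mdns && ip.addr == 192.168.1.0/24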

N5A commented 3 years ago

n5apackets.txt

emontnemery commented 3 years ago

@n5a I meant that you could try simply creating a new Hyper-V virtual machine with a fresh image, to see if mDNS works in that case and narrow down the problem a bit.

N5A commented 3 years ago

n5apackets.zip

Let's try that... the text file looks useless.

N5A commented 3 years ago

36 is the final octet of my Hass IP.

emontnemery commented 3 years ago

Alright, both the txt and zip contain just a single MDNS response packet from a Google Cast device.

For how long did you capture packets?

N5A commented 3 years ago

N5APackets2.zip

Apparently, even though it showed 1300 packets in the save option, it only saved the one that was highlighted. This should do it: all packets now show on the PC I'm posting from, which is separate from the PC running Hass and has an untagged wired port.

emontnemery commented 3 years ago

@N5A The dump looks fine; there are queries from HA (source address ending in 36) and what appear to be answers to those queries. Would you mind giving a fresh VM a shot?

MarkHofmann11 commented 3 years ago

Having the same issue here - running HA on Windows in a VM. Worked fine when using the static IP of the Google Mini, but since support for that seems to have been removed, it no longer works reliably. Is there a reason why we can't define the cast devices statically?

N5A commented 3 years ago

I should be able to do a fresh VM sometime this week, Was busy with stuff for work all weekend.

emontnemery commented 3 years ago

@MarkHofmann11 The static IP support was removed because it doesn't work for audio groups, which can't be tied to a single IP (see the note above).

Are you also using VirtualBox? Would you mind trying to set up a fresh Hyper-V VM?

MarkHofmann11 commented 3 years ago

@emontnemery Good news, as I think I figured out what is causing the issue (at least in my case). If any of the Google Cast devices are "offline" or not active, it hangs communication to the rest of the devices. I noticed that a few of my nVidia Shields were showing as "offline" (via my nVidia Shield app) and woke them up. Right after I did that, all the cast devices showed as available. So I think the issue is that if any device is offline, the integration doesn't skip it and go on to the next one; it gets hung up, causing all the rest to show as unavailable. (I have a combination of nVidia Shields and Google Minis that should show up via cast.)

MarkHofmann11 commented 3 years ago

The other thing that might be unique to me is that I use ADB for the nVidia Shields and have multiple media_player entities for the same device (one via cast and one via ADB). The first is detected via ADB and the second via cast.

media_player.shield_bedroom_tv off adb_response: null hdmi_input: null friendly_name: Shield Bedroom TV supported_features: 23997 icon: mdi:shield-plus
media_player.shield_bedroom_tv_2 off friendly_name: Shield-Bedroom-TV supported_features: 21389

emontnemery commented 3 years ago

I noticed that a few of my nVidia Shields were showing as "offline" (via my nVidia shield app), and woke them up. Right after I did that, all the cast devices showed available.

That's quite interesting. Can you try to reproduce this scenario with these logs enabled? Please note the times when you woke one of the nVidia devices up.

logger:
  default: info
  logs:
    homeassistant.components.cast: debug
    homeassistant.components.cast.media_player: debug
    pychromecast: debug
    pychromecast.discovery: debug
    pychromecast.socket_client: debug
    zeroconf: debug

MarkHofmann11 commented 3 years ago

Around 19:03 is when the Google Minis (via cast) appeared as available again. All I did was ensure all my nVidia Shields were online via the app and restart my ADB server. Then right away, the Google Minis appeared available via cast. The log with all the debug enabled is attached: cast.txt

MarkHofmann11 commented 3 years ago

Just a thought, but I'm wondering if the mDNS cast discovery is trying to attach the Shields to the ADB entity vs. the "_2" cast entity. Once I restart the ADB server and ensure the Shields are awake, it immediately works. I use the ADB functions to get extra features on the Shields in HA (for controlling them). Since both the ADB and cast entities are media players on the same Shield devices, that seems to be the common item that hangs things up and makes my Google Minis show as unavailable until I do those steps.

Joikast commented 3 years ago

Actually, it just started working. I rebooted my HA VM after upgrading the core. I also made some changes in VMware previously, where I enabled promiscuous mode on the NIC. Maybe it was a combination of these things that got it working.

Thanks a lot for the support.

emontnemery commented 3 years ago

@Joikast can you confirm the exact settings you use for the VM's networking now, and whether changing back (disabling promiscuous mode?) breaks the cast integration?

Joikast commented 3 years ago

@emontnemery sure, here are my current settings:

Promiscuous mode on the VLAN enabled (screenshot attached).

Promiscuous mode on the vSwitch enabled (screenshot attached).

I removed the Google Cast integration, reloaded my HA, and added the integration again. It works.

I then disabled promiscuous mode again on vSwitch0 and the VLAN (screenshots attached).

Again, I removed the integration and rebooted the HA vm. Google cast integration still works.

So, currently promiscuous mode is disabled (not allowed) in VMware, and the Google Cast integration still works fine after a reboot.

MarkHofmann11 commented 3 years ago

@emontnemery In order to prevent the androidtv and Google Cast media_player.* discoveries from clobbering each other, do you think I should try hard-coding the UUIDs under the Google Cast setup for my Google Minis, so the cast component stops discovering my nVidia Shields (since they are added via the AndroidTV component)?
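
For reference, what I have in mind is something along these lines in configuration.yaml (the UUID is a placeholder; I'm assuming the uuid option documented for the cast integration's media_player list):

cast:
  media_player:
    - uuid: "00000000-0000-0000-0000-000000000000"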

MarkHofmann11 commented 3 years ago

@emontnemery Just tried it, and it had no effect. Just now I couldn't get the cast entities to appear; I tried all my normal ways to get them to display, nothing. I just waited, and after around 10 minutes they showed up again. Very strange...

MarkHofmann11 commented 3 years ago

Did some more testing this evening. I tried deleting the component and re-adding it. It wouldn't let me re-add the component after I deleted it, because it was reporting "No Devices Found" and would exit. I decided to do the manual setup via UUID to get the component loaded and active. Even with the UUIDs statically defined, it took at least 5-10 minutes for the media_player cast entities to show up. Not sure why there is this long delay. Maybe something unique to those of us using Windows? I'm also using Win10 as a VM here.

emontnemery commented 3 years ago

Even with the UUIDs statically defined, it took at least 5-10 minutes for the media_player cast entities to show up. Not sure why there is this long delay

Manually adding the casts still depends on working mDNS; you can think of it as whitelisting wanted devices. The long delay could maybe indicate that the container can receive mDNS answers but not send them. In any case, if the container can't send and receive mDNS packets, it won't work. Would you mind trying with a fresh VM in Hyper-V, just to try to isolate the problem?

I've not had a chance to look at your log yet, will try to do it tomorrow.

MarkHofmann11 commented 3 years ago

I'm using VMware ESXi 5.5 and a Win10 VM for HA. I made a few changes to the vDS in ESXi: enabled "Promiscuous mode", "MAC address changes", and "Forged transmits". That should allow all packets to be forwarded to the port group (they were previously rejected/blocked).

Will do some more testing tomorrow and see if these changes make the discovery of the cast devices not take so long. So far, I have seen the discovery take anywhere from 2 minutes to 30 minutes for the devices. Even then some don't display and I have to wake them up or do something with them before they appear.

Update: Just tried with those settings enabled and saw the same behavior (no change). It still takes a long time to discover the cast devices after restarting.

N5A commented 3 years ago

Like others, mystically it now works. I made no changes since my last post about the trouble; however, I did not have time earlier to check it when I said I'd do a new VM during the week. I checked here, saw others saying theirs started working, checked mine, and sure enough Discovery said it found new devices. The only thing I did do is upgrade Node-RED. It's unlikely that has anything to do with it, but it's the only action I took before HA found them.

MarkHofmann11 commented 3 years ago

The only way it seems I can get the cast devices to appear is by hard-coding the UUIDs (since I can't add the component; it fails to discover anything and exits). If I exit HA under Windows and restart, it almost never re-discovers. I have to reboot the VM and re-run HA, and then it appears after some delay with no rhyme or reason. It is still a mystery to me at this point. I just removed the apostrophes from the Google Mini names in case those were causing some type of issue.

MarkHofmann11 commented 3 years ago

I might have an idea on what could be causing the issue. I have loads of these in my log before things just start working on their own.

The first IP is a TAP driver for VPN client. The others are the "real IP" and "loopback" for the VM. It appears zeroconf is ignoring the packets on the "real NICs" because it is seeing it on the TAP driver. Since the TAP isn't part of HA, it goes to the bit bucket.

Going to experiment with removing the TAP driver, but either way, I still need a good solution for this, as I'm sure there are others that have NICs not bound to HA in their VM or hardware. If there was an option to "NOT ignore duplicate packets", that would probably fix it:

2020-11-30 19:01:45 DEBUG (zeroconf-Engine-2896) [zeroconf] Received from '169.254.148.203':5353 (socket 1460): <DNSIncoming:{id=0, flags=0, n_q=1, n_ans=0, n_auth=0, n_add=0, questions=[question[ptr,in,_arduino._tcp.local.]], answers=[]}> (37 bytes) as [b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x08_arduino\x04_tcp\x05local\x00\x00\x0c\x00\x01']
2020-11-30 19:01:45 DEBUG (zeroconf-Engine-2896) [zeroconf] Ignoring duplicate message received from '192.168.0.14':5353 (socket 1460) (37 bytes) as [b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x08_arduino\x04_tcp\x05local\x00\x00\x0c\x00\x01']
2020-11-30 19:01:45 DEBUG (zeroconf-Engine-2896) [zeroconf] Ignoring duplicate message received from '127.0.0.1':5353 (socket 1460) (37 bytes) as [b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x08_arduino\x04_tcp\x05local\x00\x00\x0c\x00\x01']

emontnemery commented 3 years ago

@MarkHofmann11 The message you quote is a message sent by HA, trying to locate any devices of type _arduino._tcp.local.. Note that the packet has questions but no answers (answers=[]).

There is a feature in zeroconf to discard duplicates, but it should only discard true duplicate messages, and this should not have any negative side effects. The feature is however relatively new and might well not be bulletproof. If you can find a message with answers which is not the same as an accepted message, that would be interesting.

In the log you attached a couple of days ago, there are no incoming packets with actual answers until around 19:02, when you presumably poked the nVidia devices. Hence, there is no problem due to multiple media players etc.; there simply was no incoming mDNS data for HA to act on. Did you perhaps do something else to poke the nVidia devices? The answers don't seem to come from the devices themselves but from 192.168.0.66; do you know which device that is?

MarkHofmann11 commented 3 years ago

192.168.0.66 is my Android cell phone, which is what I used to run the nVidia Shield app + Google Home app to make sure I could see all the devices via my phone. All showed up right away on my phone. I'm wondering if zeroconf in HA sent the original discovery out on 169.254.148.203 (the bogus TAP NIC driver IP), which would go nowhere. It was listening on all the IPs on the VM, but I can't tell where the actual discovery was sent from.

emontnemery commented 3 years ago

@MarkHofmann11 OK, so what happened at 19:02-19:03 is that your cellphone sent the mDNS answers, which were picked up by Home Assistant. I think this could indicate that in your case HA can receive, but not successfully send, mDNS packets.

Zeroconf should by default attempt to send packets on all interfaces. I think this is working correctly in your case since every sent packet is again received from 192.168.0.14 which I guess is the LAN address of your VM?

One potential issue could be that your WiFi router or AP does not forward mDNS (multicast UDP) from the wired interface to the wireless interface. You could check in the router settings whether it has any settings related to multicast UDP, mDNS, Bonjour, Avahi, or UPnP. Try rebooting the router / AP as well.

MarkHofmann11 commented 3 years ago

@emontnemery Thank you!!!! That was exactly what was going on (WiFi APs not forwarding from the wired interface to the wireless interface). I have (5) Cisco autonomous APs, and here are the command-line options to fix this behavior (on the APs):

no ip igmp snooping
no dot11 igmp snooping-helper

After updating my APs with those statements, I restarted HA and it immediately saw all my Cast devices (instantly). Thanks again!!!