postlund / pyatv

A client library for Apple TV and AirPlay devices
https://pyatv.dev
MIT License
836 stars 89 forks source link

Failed to find device with identifier #292

Closed Veldkornet closed 4 years ago

Veldkornet commented 4 years ago

Describe the bug After initially setting up the AppleTV in HomeAssistant 0.102.2 in Hass.io, you restart HomeAssistant, and the AppleTV no longer works. 2019-11-27 22:57:08 ERROR (MainThread) [custom_components.apple_tv] Failed to find device with identifier xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx The only way to make it work again is to remove the AppleTv and re-add it again.

To Reproduce

Expected behavior

System Setup (please complete the following information):

Additional context

postlund commented 4 years ago

Ok, so what is happening is that pyatv scans for the device in order to connect to it but does not find. This happens sometimes and it should be sufficient to restart Home Assistant to make it work again (assuming the device is found then, otherwise restart again...). Could you try that next time it happens, just to see if it works?

It is not good that the device isn't found, but it can happen. As long as it isn't happening all the time. Anyways, this will be "solved" when #273 (via #287) is in place and integrated into the component. This is the same case where the connection is lost, e.g. due to a reboot.

Veldkornet commented 4 years ago

I restarted a couple of times, didn’t come back. Only removing the integration and adding it back sorts it out.

There’s nothing special about the setup, the AppleTV is connected via cable and on the same subnet as HomeAssistant.

During the initial setup, I need to enter the IP address of the AppleTV (so if there is supposed to be some type of device scanner for that part, that doesn’t find it either).

postlund commented 4 years ago

I restarted a couple of times, didn’t come back. Only removing the integration and adding it back sorts it out.

Hmm, that seems strange to me. It would be good if you enable debug logs and provide them to me:

logger:
  logs:
    pyatv: debug

There’s nothing special about the setup, the AppleTV is connected via cable and on the same subnet as HomeAssistant.

During the initial setup, I need to enter the IP address of the AppleTV (so if there is supposed to be some type of device scanner for that part, that doesn’t find it either).

Filtering is performed on top of the scan result. So it doesn't matter if you specify an IP-address, name or an identifier. A scan for all available devices is done first and picking the correct one is done from the result. A device identifier is stored in the configuration even if you used an IP-address or device name, that is why you see an identifier in the log. I can't really see why the device is found when adding it but not afterwards. It's like if the identifier is not stored correctly, so filtering isn't working as it's supposed to.

Also, discovery isn't supported for custom components at the moment. That is why the device isn't automatically found, otherwise it would.

Veldkornet commented 4 years ago

There's not really much to it :)

2019-11-28 12:49:03 DEBUG (MainThread) [pyatv] Discovering devices for 5 seconds
2019-11-28 12:49:10 ERROR (MainThread) [custom_components.apple_tv] Failed to find device with identifier xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
postlund commented 4 years ago

Interesting, it doesn't find any device at all. But it works every time you add a new device?

Veldkornet commented 4 years ago

Yup, so now after the reboot, I basically just remove the integration and add it again. Then it's happy again (until the next reboot).

postlund commented 4 years ago

That is so weird. Could you possibly try running atvremote --debug scan manually and see what happens?

Veldkornet commented 4 years ago

Seems okay?

bash-5.0# atvremote --debug scan
DEBUG: Discovering devices for 3 seconds
DEBUG: Auto-discovered LivingRoom at 192.168.1.14:49153 (protocol: 2)
DEBUG: Auto-discovered LivingRoom at 192.168.1.14:7000 (protocol: 3)
DEBUG: Auto-discovered Sonos at 192.168.1.40:7000 (protocol: 3)
Scan Results
========================================
       Name: LivingRoom 
    Address: 192.168.1.14
Identifiers:
 - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
 - XX:XX:XX:XX:XX:XX
Services:
 - Protocol: MRP, Port: 49153, Credentials: None
 - Protocol: AirPlay, Port: 7000, Credentials: None

       Name: Sonos
    Address: 192.168.1.40
Identifiers:
 - XX:XX:XX:XX:XX:XX
Services:
 - Protocol: AirPlay, Port: 7000, Credentials: None
postlund commented 4 years ago

Yeah, that looks as it should. And I assume that there were no "Auto-discovered"-lines in the Home Assistant log?

Veldkornet commented 4 years ago

Yes, there is. BUT, it has the timestamps from when I added the device manually.

So the auto-discover during the pair works, but during boot not. Or so it seems.

2019-11-28 14:11:13 DEBUG (MainThread) [pyatv] Discovering devices for 3 seconds
2019-11-28 14:11:14 DEBUG (MainThread) [pyatv] Auto-discovered Sonos at 192.168.1.40:7000 (protocol: 3)
2019-11-28 14:11:14 DEBUG (MainThread) [pyatv] Auto-discovered Woonkamer at 192.168.1.14:49153 (protocol: 2)
2019-11-28 14:11:14 DEBUG (MainThread) [pyatv] Auto-discovered Woonkamer at 192.168.1.14:7000 (protocol: 3)
2019-11-28 14:11:20 DEBUG (MainThread) [pyatv.mrp.connection] Connected to device

After this there's lots of logs of it configuring the new integration.

postlund commented 4 years ago

This is so weird. If you have debug logs enabled, the device added (but got the initial error so the device isn't discovered) and try to add a new device, does the scan find any devices then? You can just write anything into the add-dialog to trigger the scan.

Veldkornet commented 4 years ago

Okay, so now I rebooted. Device now missing.

Now instead of removing it and adding it, I did the following:

So still, the only solution is to remove it and re-add it

postlund commented 4 years ago

I cannot really explain this. The timeout is even longer when setting up the config entry (5 sec) compared to when adding a new device (3 sec). The component only try once, so if it fails then it won't recover. Maybe it will work better (and at least recover after a while) when I add support for reconnecting. Hmm, this is a really strange problem...

nikolaykasyanov commented 4 years ago

I wonder if it has something to do with timing, what if the initial scan happens too early in specific circumstances? Although as you mention trying to add a bogus device on Integrations page should've helped in this case.

@Veldkornet what do you think about sharing some details about our setups in order to find what's common? I'll start:

Veldkornet commented 4 years ago

I have Hass.io running on a RaspberryPi 4 with HA version 0.102.2. RaspberryPi is connected via WiFi Apple TV is connected via cable. Both devices are on the same VLAN. Network is Ubiquiti UniFi.

With Hass.io though, if I’m not mistaken, the device itself stays connected to the network during a HomeAssistant restart. So the Docker container should have network access right from the start.

nikolaykasyanov commented 4 years ago

Another data point: my local dev HASS installation running on Mac via venv is not affected by this issue.

@Veldkornet I also added info about VLAN

postlund commented 4 years ago

I wonder if it has something to do with timing, what if the initial scan happens too early in specific circumstances? Although as you mention trying to add a bogus device on Integrations page should've helped in this case.

@Veldkornet what do you think about sharing some details about our setups in order to find what's common? I'll start:

  • HASS runs in Docker on Raspberry Pi 3, homeassistant/raspberrypi3-homeassistant:0.102.1
  • Both Pi and Apple TV 4K are wired to a Ubiquiti EdgeRouter X router

As long as Home Assistant has network access, it should work. It must have otherwise none of the other integrations would work. You both seem to be running in docker (via hass.io or just plain docker). I haven't tested it in docker myself yet, just run in plain. So I can try that tomorrow or so and see if I can reproduce. It would be a lot easier to troubleshoot if I can reproduce it myself.

glassbase commented 4 years ago

This started for me too randomly. I was actually able to restart HA with no issues up until just a few minutes ago when I restarted. I don't reboot often but a couple times a week usually for config changes.

Edit - for clarity

postlund commented 4 years ago

This started for me too randomly. I was actually able to restart HA with no issues up until just a few minutes ago when I restarted. I don't reboot often but a couple times a week usually for config changes.

Edit - for clarity

Are you running docker/Hass.io?

glassbase commented 4 years ago

@postlund Docker

postlund commented 4 years ago

@postlund Docker

Ok, docker still seems to be a common culprit here.

aidanblack commented 4 years ago

I just managed to install the beta. I have a manual Hass install on a RPi3B+ in a Python virtual environment on Raspbian and it failed to find the Apple TV after a restart. So a non-Docker example here :)

geekofweek commented 4 years ago

I ran into the same issue, also running it via Docker.

postlund commented 4 years ago

Ok, so it seems to be a general issue then (maybe most people run in docker and that is why it's reported more often there). I don't really have any leads on this right now, if someone wants to troubleshoot and have any ideas I'm all ears. Hopefully my updates to the HA component at least will make it recover at some point. It doesn't solve the problem if this happens when running atvremote though. I'm closing in on finishing the first version with re-connect support and I hope to be able to push it during the dat, but I don't make any promises.

nikolaykasyanov commented 4 years ago

I looks like adding a 5 second delay at the beginning of async_setup_entry helps:

@@ -28,6 +28,9 @@ async def async_setup(hass, config):
 async def async_setup_entry(hass, entry):
     """Set up a config entry for Apple TV."""
     import pyatv
+    import asyncio
+
+    await asyncio.sleep(5)

     identifier = entry.data[CONF_IDENTIFIER]
     protocol = entry.data[CONF_PROTOCOL]

Is it possible to make an educated why it helps though? I.e. what HASS parts or subsystems can be not ready yet?

UPD: increasing pyatv.scan timeout also helps.

postlund commented 4 years ago

I looks like adding a 5 second delay at the beginning of async_setup_entry helps:

@@ -28,6 +28,9 @@ async def async_setup(hass, config):
 async def async_setup_entry(hass, entry):
     """Set up a config entry for Apple TV."""
     import pyatv
+    import asyncio
+
+    await asyncio.sleep(5)

     identifier = entry.data[CONF_IDENTIFIER]
     protocol = entry.data[CONF_PROTOCOL]

Is it possible to make an educated why it helps though? I.e. what HASS parts or subsystems can be not ready yet?

UPD: increasing pyatv.scan timeout also helps.

Ok, maybe a longer period of time is needed here for some reason. Seems strange but I can buy if it works!

stcbus commented 4 years ago

I am getting this issue as well. Never have had it work for me yet. Also on Docker.

(Also going to report another issue about airplay codes with leading 0s not working).

postlund commented 4 years ago

@nikolaykasyanov Did you just add timeout=5 to scan in __init__.py and it worked? Because 5s is default in pyatv, so I dont' understand how that would change anything.

postlund commented 4 years ago

(Also going to report another issue about airplay codes with leading 0s not working).

Right, probably the same thing as I fixed for MRP in #291.

Veldkornet commented 4 years ago

My AppleTV seems to have resurrected itself after installing 2f16e2f (missing after reboot initially, installed the update, rebooted, AppleTV was picked up again during the startup). Still need to check if it survives additional reboots though...

nikolaykasyanov commented 4 years ago

@postlund yes, that's what I did. That bothers me as well. I have two uneducated guesses:

postlund commented 4 years ago

@postlund yes, that's what I did. That bothers me as well. I have two uneducated guesses:

* these five second delay gives something inside HASS finish initalizing

* these five second just let us skip the most CPU-intense part of HASS launch, making the subsequent scan faster.

I mean, writing this:

atvs = await scan(loop, timeout=5)

or

atvs = await scan(loop)

Should result in the exact same thing as default value for timeout is 5. This is mainly what I'm asking if you did and it worked?

Adding a delay before is of course another matter, as that changes timing a bit. In theory it could mean that mDNS responses from Home Assistants zeroconf implementation are picked up (since it's multicast traffic). But I still don't understand why that would matter. I want to know the root cause, just adding s sleep for some random time is not something I'm gonna do as it's very fragile. Maybe it works now for some people, but the problem likely still exists and will haunt other users.

I assume that it recovers with my latest update to the component?

nikolaykasyanov commented 4 years ago

@postlund sorry for confusion, the timeout I used was 10, I meant that I added 5 to the default 5.

nikolaykasyanov commented 4 years ago

@postlund I installed the latest updated yesterday & restarted hass, it didn't recover overnight, unfortunately.

nikolaykasyanov commented 4 years ago

@postlund I've been looking through HASS code to check how other components use zeroconf and stumbled upon the Chromecast component. It seems like its backend library, pychromecast, is using another zeroconf implementation, not the one pyatv is using. As far as I can see that library doesn't support asyncio though. Could the library be the culprit?

Veldkornet commented 4 years ago

Seems to be working for me now, version I checked with was 5f64b52. I believe with this version I removed the integration and re-added it as per the release notes.

I restarted HA, it seemed to turn on the AppleTV after reboot and then turn it off again. I see it in my device list, all looks good.

With 5d0488c it also seems to be working fine (this time it didn’t turn the AppleTv on and off during boot).

postlund commented 4 years ago

@postlund I've been looking through HASS code to check how other components use zeroconf and stumbled upon the Chromecast component. It seems like its backend library, pychromecast, is using another zeroconf implementation, not the one pyatv is using. As far as I can see that library doesn't support asyncio though. Could the library be the culprit?

Home Assistant uses python-zeroconf internally for zeroconf (the one mentioned in pychromecast). I used that library as well until I couple of weeks ago, where I concluded that aiozeroconf worked a lot better for the users involved in the investigation. See issue for #214 for details. I know that there were discussion of switching to aiozeroconf in home assistant as well (since it's natively asyncio), but there were some issue left. Looking at the issues in the projects GitHub page will reveal some interesting read. Also, all zeroconf implementations in python are forks of the same project. So at their cores, they are more or less the same thing.

Nevertheless, I do believe we will continue see problems like these going forward. It is not the correct way to use zeroconf like this. It is much better to have a daemon (like Anahi) that is always running, maintaining a cache of all discovered services. Applications can then just ask that daemon and get a service list. If a service is not available at one time, it will probably be later. I see missing and delayed discoveries of services all the time when using AirPlay (in iOS). It usually takes some time until devices appear, so not even the "golden standard" of implementations is foolproof. The important part is to have mitigations. The re-connect support should improve the situation, but I hope #293 will make it even better.

postlund commented 4 years ago

@postlund sorry for confusion, the timeout I used was 10, I meant that I added 5 to the default 5.

Right, so 10s scan time works but not 5s? I can add an option to support "extended scan time" that can be enabled as a workaround. It will probably be relevant to more people.

nikolaykasyanov commented 4 years ago

@postlund thanks for the detailed elaboration.

postlund commented 4 years ago

I had some issues scanning yesterday. It was consistent several times: no results at all, despite working as expected prior to that. Took a tcpdump and enabled debug in aiozerconf. I could not see the expected answers in aiozeroconf but I could see MDNS answers from the Apple TV in my pcap-file from tcpdump. Started googling and found this:

https://github.com/python/asyncio/issues/480

which leads to this:

https://github.com/python/cpython/pull/13162

So I'm starting to suspect that this is a due to a bug in asyncio. Might be interesting to see if that patch fixes the problem.

@nikolaykasyanov Since you seem to consistently have this problem, maybe you can help out here (or anyone else for that matter, e.g. @Veldkornet)? My suggestion is to temporaily just modify asyncio/base_events.py in your python installation according to the patch:

https://github.com/python/cpython/pull/13162/files#diff-08afa52ab2b1511bee8527814ad44d80

Ignore everything else, just focus on base_events.py. Then retry again and see if it works. You should revert the change after testing, unless you feel brave.

nikolaykasyanov commented 4 years ago

@postlund interesting!

My installation has two such files:

$ docker-compose exec hass find / -name 'base_events.py'
/usr/local/lib/python3.7/asyncio/base_events.py
/usr/local/lib/python3.7/site-packages/asyncio/base_events.py

The former seem to already have this check (if not allow_broadcast:), the latter one seems to be very old, it uses yield instead of await and has fewer lines.

postlund commented 4 years ago

@postlund interesting!

My installation has two such files:

$ docker-compose exec hass find / -name 'base_events.py'
/usr/local/lib/python3.7/asyncio/base_events.py
/usr/local/lib/python3.7/site-packages/asyncio/base_events.py

The former seem to already have this check (if not allow_broadcast:), the latter one seems to be very old, it uses yield instead of await and has fewer lines.

Darn it, I was hoping this was something. Which docker image are you using?

nikolaykasyanov commented 4 years ago

@postlund I’m using this one: https://hub.docker.com/r/homeassistant/raspberrypi3-homeassistant

jayfbeast commented 4 years ago

Hey guys!

Yesterday I've been struck by an idea of what would be the probable solution and it turns out it worked for me.

I'm running Home-Assistant on Docker from within an Intel NUC. I've never been able to find the Apple TV while I was able to detect it manually from outside the Docker container.

If you are running several Docker containers on the same hardware, chances are that you actually reserved the port 8123 for HA usage. If you did, then the command line in order to create the Docker container would have the -p 8123:8123 command in it.

I figured out that would be the problem as the only to communicate with the container is by going through the port 8123. Therefore, only having 8123 as a communication won't allow any reliable discovery or anything else outside of the Docker network.

I then decided to stop all containers on my Intel NUC, deleted my HA container and recreated it --net=host and voilà, the AppleTV is now being discovered and I can control it remotely no problems!

The actual command I typed in was: sudo docker run -d --name="home-assistant" --net=host --restart=always --device=/dev/ttyACM0 --device=/dev/ttyUSB0 -v /home/jayf/Dockerconf/HomeAssistant:/config -v /var/run/docker.sock:/var/run/docker.sock -e "TZ=America/Montreal" -p 8123:8123 homeassistant/home-assistant

The --device=/dev/ttyACM0 is being used for my Z-Wave USB dongle. Obviously, the specified port in that command will be ignored as --net=host is being used.

Hopefully it will help others!

@postlund Thank you for your hard work and I'll keep investigating in order to get it work without the --net=host command!

nikolaykasyanov commented 4 years ago

@jayfbeast I’ve been using the host mode from the start because of HomeKit.

jayfbeast commented 4 years ago

@nikolaykasyanov then is the ATV in the same subnet than HA ?

postlund commented 4 years ago

Hey guys!

Yesterday I've been struck by an idea of what would be the probable solution and it turns out it worked for me.

I'm running Home-Assistant on Docker from within an Intel NUC. I've never been able to find the Apple TV while I was able to detect it manually from outside the Docker container.

If you are running several Docker containers on the same hardware, chances are that you actually reserved the port 8123 for HA usage. If you did, then the command line in order to create the Docker container would have the -p 8123:8123 command in it.

I figured out that would be the problem as the only to communicate with the container is by going through the port 8123. Therefore, only having 8123 as a communication won't allow any reliable discovery or anything else outside of the Docker network.

I then decided to stop all containers on my Intel NUC, deleted my HA container and recreated it --net=host and voilà, the AppleTV is now being discovered and I can control it remotely no problems!

The actual command I typed in was: sudo docker run -d --name="home-assistant" --net=host --restart=always --device=/dev/ttyACM0 --device=/dev/ttyUSB0 -v /home/jayf/Dockerconf/HomeAssistant:/config -v /var/run/docker.sock:/var/run/docker.sock -e "TZ=America/Montreal" -p 8123:8123 homeassistant/home-assistant

The --device=/dev/ttyACM0 is being used for my Z-Wave USB dongle. Obviously, the specified port in that command will be ignored as --net=host is being used.

Hopefully it will help others!

@postlund Thank you for your hard work and I'll keep investigating in order to get it work without the --net=host command!

I suspected docker too at first, but I run without it and see the issue.

postlund commented 4 years ago

Hey guys! Yesterday I've been struck by an idea of what would be the probable solution and it turns out it worked for me. I'm running Home-Assistant on Docker from within an Intel NUC. I've never been able to find the Apple TV while I was able to detect it manually from outside the Docker container. If you are running several Docker containers on the same hardware, chances are that you actually reserved the port 8123 for HA usage. If you did, then the command line in order to create the Docker container would have the -p 8123:8123 command in it. I figured out that would be the problem as the only to communicate with the container is by going through the port 8123. Therefore, only having 8123 as a communication won't allow any reliable discovery or anything else outside of the Docker network. I then decided to stop all containers on my Intel NUC, deleted my HA container and recreated it --net=host and voilà, the AppleTV is now being discovered and I can control it remotely no problems! The actual command I typed in was: sudo docker run -d --name="home-assistant" --net=host --restart=always --device=/dev/ttyACM0 --device=/dev/ttyUSB0 -v /home/jayf/Dockerconf/HomeAssistant:/config -v /var/run/docker.sock:/var/run/docker.sock -e "TZ=America/Montreal" -p 8123:8123 homeassistant/home-assistant The --device=/dev/ttyACM0 is being used for my Z-Wave USB dongle. Obviously, the specified port in that command will be ignored as --net=host is being used. Hopefully it will help others! @postlund Thank you for your hard work and I'll keep investigating in order to get it work without the --net=host command!

I suspected docker too at first, but I run without it and see the issue.

And also, if you want to get mDNS to work my best bet is that you should forward port 5353 to your container (as that is used by mDNS). Multicast support is needed as well, not sure if you have to do anything for that.

jayfbeast commented 4 years ago

Hey guys! Yesterday I've been struck by an idea of what would be the probable solution and it turns out it worked for me. I'm running Home-Assistant on Docker from within an Intel NUC. I've never been able to find the Apple TV while I was able to detect it manually from outside the Docker container. If you are running several Docker containers on the same hardware, chances are that you actually reserved the port 8123 for HA usage. If you did, then the command line in order to create the Docker container would have the -p 8123:8123 command in it. I figured out that would be the problem as the only to communicate with the container is by going through the port 8123. Therefore, only having 8123 as a communication won't allow any reliable discovery or anything else outside of the Docker network. I then decided to stop all containers on my Intel NUC, deleted my HA container and recreated it --net=host and voilà, the AppleTV is now being discovered and I can control it remotely no problems! The actual command I typed in was: sudo docker run -d --name="home-assistant" --net=host --restart=always --device=/dev/ttyACM0 --device=/dev/ttyUSB0 -v /home/jayf/Dockerconf/HomeAssistant:/config -v /var/run/docker.sock:/var/run/docker.sock -e "TZ=America/Montreal" -p 8123:8123 homeassistant/home-assistant The --device=/dev/ttyACM0 is being used for my Z-Wave USB dongle. Obviously, the specified port in that command will be ignored as --net=host is being used. Hopefully it will help others! @postlund Thank you for your hard work and I'll keep investigating in order to get it work without the --net=host command!

I suspected docker too at first, but I run without it and see the issue.

And also, if you want to get mDNS to work my best bet is that you should forward port 5353 to your container (as that is used by mDNS). Multicast support is needed as well, not sure if you have to do anything for that.

How to do manage to replicate the issue ? I really want to help on that issue.

postlund commented 4 years ago

How to do manage to replicate the issue ? I really want to help on that issue.

I'm not sure to be honest, it just happens at seemingless random times. Usually zeroconf stops working at all, at least for atvremote. I have seen this behavior several times when I have Home Assistant running with my component connected to ATV. When running atvremote on the same machine usually return zero result. My best guess is that it has to do with port re-using (since you have to specify that a UDP can be re-used when binding to it), but I haven't investigated that further. It's also hard to tell if it's the same issue.