home-assistant-libs / pychromecast

Library for Python 3 to communicate with the Google Chromecast.
MIT License
2.51k stars 378 forks source link

pychromecast entering spin loop when a cast device reboots [if browser.stop_discovery() was called] #866

Open neilfred opened 4 months ago

neilfred commented 4 months ago

I have a dozen or so cast devices on my network, and I've had problems where overnight a log file fills up its partition. I determined the issue seemed to be coming from pychromecast – I'm running version 13.1.0 on python 3.7.5 (and would prefer not to upgrade my python version at this point if I can avoid it, hence not running pychromecast 14.0.0).

After reading https://github.com/home-assistant/core/issues/37962 I determined that I can repro my issue (not the same as the issue described in that bug) by rebooting one of the cast devices.

When the heartbeat is lost, here's what I get in my log file:

[#Game-Cast(10.1.1.36):8009] Heartbeat timeout, resetting connection
[#Game-Cast(10.1.1.36):8009] Unhandled exception in worker thread, attempting reconnect
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 540, in run
    if self.run_once(timeout=POLL_TIME_BLOCKING) == 1:
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 563, in run_once
    if not self._check_connection():
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 696, in _check_connection
    self.initialize_connection()
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 298, in initialize_connection
    service, self.zconf
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/dial.py", line 38, in get_host_from_service
    service_info = zconf.get_service_info("_googlecast._tcp.local.", service.data)
  File "/usr/local/lib/python3.7/dist-packages/zeroconf/_core.py", line 266, in get_service_info
    if info.request(self, timeout, question_type):
  File "/usr/local/lib/python3.7/dist-packages/zeroconf/_services/info.py", line 774, in request
    assert zc.loop is not None and zc.loop.is_running()
AssertionError
[#Game-Cast(10.1.1.36):8009] Error communicating with socket, resetting connection
[#Game-Cast(10.1.1.36):8009] Unhandled exception in worker thread, attempting reconnect
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 540, in run
    if self.run_once(timeout=POLL_TIME_BLOCKING) == 1:
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 563, in run_once
    if not self._check_connection():
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 696, in _check_connection
    self.initialize_connection()
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/socket_client.py", line 298, in initialize_connection
    service, self.zconf
  File "/usr/local/lib/python3.7/dist-packages/pychromecast/dial.py", line 38, in get_host_from_service
    service_info = zconf.get_service_info("_googlecast._tcp.local.", service.data)
  File "/usr/local/lib/python3.7/dist-packages/zeroconf/_core.py", line 266, in get_service_info
    if info.request(self, timeout, question_type):
  File "/usr/local/lib/python3.7/dist-packages/zeroconf/_services/info.py", line 774, in request
    assert zc.loop is not None and zc.loop.is_running()
AssertionError

It shows up like that, once with "Heartbeat timeout, resetting connection", and then immediately with the same traceback but starting instead with "Error communicating with socket, resetting connection", and then that second error repeats over and over and over again, thousands of times per second until I kill the process. Also note that the errors continue not only while the cast device is rebooting, but even after the cast device comes back up.

Let me know if there are other details that might be helpful.

neilfred commented 4 months ago

Aha, just found this bug which looks like the same thing. But I'm not clear what the fix is. There's a patch in dial.py but that's dated well before 13.1.0 so maybe wasn't actually relevant? Maybe I just need some version of this comment?

neilfred commented 4 months ago

Solved my problem. Turns out I was calling browser.stop_discovery() after my initial setup finished, and I've now figured out that browser.zc.loop.is_running() becomes False after stop_discovery() – so I removed that call, and now overnight when cast devices reboot themselves, I get "Heartbeat timeout, resetting connection" but no AssertionError and no spin loop. 🎉

I'm going to leave this issue open since it still seems like pychromecast should be more robust to its client having this kind of bug, and at least avoid a spin loop. Maybe the assertion can be changed to fail gracefully when the loop isn't running, or maybe something higher up should be checking whether the loop is running before it tries to reestablish the connection...?