vutang50 / homeassistant-pixelblaze

This is a custom component to allow control of Pixelblaze devices in Home Assistant.
MIT License
23 stars, 7 forks

Pixelblaze integration takes down Home Assistant when Pixelblaze is unresponsive #4

Open: masto opened this issue 1 year ago

masto commented 1 year ago

I apologize up front for this not being a great bug report. If it's not immediately helpful, I can spend some time trying to reproduce it when I have a little more time free. Right now I would have to take down my HA installation and it runs the whole house.

What I know: earlier today I discovered my Home Assistant installation was not working. Specifically, connections to the web server would just hang, automations and scenes weren't responding, etc. The machine was still up, Supervisor was running, I could ssh to it and look at logs, etc. On a reboot, it would come back up briefly and then hang again within a minute or two. The log file had messages about various entity updates taking over 10 seconds, connections timing out, etc., so it wasn't obvious where the problem was. By process of elimination, I removed custom_components until I identified pixelblaze as the one which triggers the problem.

During the incident, the only log entries related to this integration I could find look like:

2023-01-11 11:08:32.051 ERROR (MainThread) [custom_components.pixelblaze.light] Failed to update pixelblaze device Basement Counter Decoration@10.1.1.76: Exception: [Errno 110] Operation timed out

It is probably very relevant that, right now, that Pixelblaze device is in a strange state: it reports itself as alive to discovery and serves up its web page, but connections to port 81 for the websocket are immediately reset. This breaks the Pixelblaze client library:

>>> from pixelblaze import *
>>> pb = Pixelblaze("10.1.1.76")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/pixelblaze/pixelblaze.py", line 243, in __init__
    self._open()
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/pixelblaze/pixelblaze.py", line 398, in _open
    self.ws = websocket.create_connection(uri, sockopt=((socket.SOL_SOCKET, socket.SO_REUSEADDR, 1), (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),))
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_core.py", line 608, in create_connection
    websock.connect(url, **options)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_core.py", line 253, in connect
    self.handshake_response = handshake(self.sock, url, *addrs, **options)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_handshake.py", line 57, in handshake
    status, resp = _get_resp_headers(sock)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_handshake.py", line 145, in _get_resp_headers
    status, resp_headers, status_message = read_headers(sock)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_http.py", line 312, in read_headers
    line = recv_line(sock)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_socket.py", line 131, in recv_line
    c = recv(sock, 1)
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_socket.py", line 108, in recv
    bytes_ = _recv()
  File "/Users/chris/src/pixelblaze-client/.venv/lib/python3.10/site-packages/websocket/_socket.py", line 87, in _recv
    return sock.recv(bufsize)
ConnectionResetError: [Errno 54] Connection reset by peer

What I don't understand is how this then results in this integration killing Home Assistant. But it does. Or at least, it did on my machine, today.

I have avoided power cycling my Pixelblaze for the moment, in case I need it to reproduce this. But I assume it should be possible to trigger the same condition by having anything answer on port 81 and then drop the connection. This is what I can try as a followup when I have time to do more.
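For anyone who wants to try that without a misbehaving device, here is a minimal standard-library sketch of such a stub. It assumes a POSIX system (Linux or macOS; the `SO_LINGER` struct layout differs on Windows), and `start_reset_server` is a hypothetical helper name, not part of any library. It accepts TCP connections and closes each one with `SO_LINGER` set to zero, which makes the kernel send a RST instead of a normal FIN, so a client waiting for the WebSocket handshake response should see a connection reset like the one in the traceback above.

```python
import socket
import struct
import threading


def start_reset_server(host: str = "127.0.0.1", port: int = 81) -> socket.socket:
    """Accept TCP connections on host:port and reset each one immediately.

    Hypothetical test helper: the TCP connect succeeds, but the peer sends
    RST before any WebSocket handshake response, which is the failure mode
    described in this issue. Binding port 81 usually needs root; pass
    port=0 to let the OS pick a free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()

    # struct linger {int l_onoff; int l_linger;} on Linux/macOS.
    # l_onoff=1 with l_linger=0 makes close() abort the connection with RST.
    abort_on_close = struct.pack("ii", 1, 0)

    def serve() -> None:
        while True:
            try:
                conn, _addr = srv.accept()
            except OSError:
                return  # listening socket is no longer usable; stop serving
            conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, abort_on_close)
            conn.close()

    # Daemon thread, so it never keeps the interpreter alive on exit.
    threading.Thread(target=serve, daemon=True).start()
    return srv
```

Run on the default port (as root, on a spare machine or in a container), it should make `Pixelblaze("<that machine's IP>")` fail with the same `ConnectionResetError`, assuming the client dials port 81 as described above. Pointing a test Home Assistant instance at that address would then show whether the hang itself reproduces.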

rbrtwtrs commented 1 year ago

I saw the same thing. I think the blame lies more with the integration than with the Pixelblaze. I gave up on using it in Home Assistant.

masto commented 1 year ago

There's not a lot of code, so whatever the problem is seems likely to be fixable. The integration just catches an exception at https://github.com/vutang50/homeassistant-pixelblaze/blob/bc4b0cbb5235a75cbfb1ff79d558eb5c161274b8/custom_components/pixelblaze/light.py#L95. I don't know anything about the Home Assistant component API, or all that much Python, but I will take a crack at it over the weekend if nobody else does.
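One way that catch could be made safer, sketched under assumptions: instead of re-dialing the device on every update, remember the failure, report the device as unavailable, and back off exponentially before the next attempt. This is plain Python, not Home Assistant's entity API; `DeviceConnection`, `connect`, and `available` are hypothetical names, and whether it helps depends on how the real light.py opens its connection. It does not by itself stop a single slow connect from blocking whichever thread calls it.

```python
import time
from typing import Any, Callable, Optional


class DeviceConnection:
    """Hypothetical wrapper around a blocking connect() call.

    A failed connect marks the device unavailable and schedules the next
    attempt with exponential backoff, so a half-dead device is not
    re-dialed (and waited on) on every polling cycle.
    """

    def __init__(
        self,
        connect: Callable[[], Any],
        base_delay: float = 5.0,
        max_delay: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._clock = clock
        self._failures = 0
        self._next_attempt = 0.0
        self.client: Optional[Any] = None
        self.available = False

    def update(self) -> None:
        if self.client is not None:
            return  # already connected; real code would poll state here and drop the client on error
        now = self._clock()
        if now < self._next_attempt:
            return  # still backing off: skip the network entirely
        try:
            self.client = self._connect()
        except Exception:  # broad on purpose: client libraries raise varied exception types
            self._failures += 1
            delay = self._base_delay * 2 ** (self._failures - 1)
            self._next_attempt = now + min(delay, self._max_delay)
            self.available = False
            return
        self._failures = 0
        self.available = True
```

With the defaults, a device that keeps resetting connections is retried after 5 s, then 10 s, then 20 s, and so on, capped at 5 minutes.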

dgmltn commented 1 year ago

I had a similar story last night after installing Pixelblaze. It's the Pixelblaze's fault for acting badly, but this integration should still be able to handle bad behavior. It seems like a connection timeout is happening on the HA main thread, when the connection should be attempted on a background thread. But I'm not a Python or HA dev, so I'm not sure if that's even possible here.
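Moving the blocking call off the event loop should be possible in principle. Home Assistant's core runs on asyncio and provides `hass.async_add_executor_job` for running blocking code in its thread pool. The sketch below shows the same idea with plain asyncio so it runs standalone: a hypothetical `slow_device_call` simulates a hung connection attempt, it runs in the default thread pool via `loop.run_in_executor`, and `asyncio.wait_for` bounds how long the caller waits. A heartbeat task stands in for the rest of Home Assistant. Whether this integration's update path actually blocks the loop in this way is an assumption, not something confirmed here.

```python
import asyncio
import time
from typing import Optional, Tuple


def slow_device_call(delay: float) -> str:
    """Hypothetical stand-in for a blocking client call, such as opening
    the Pixelblaze WebSocket against a device that never answers."""
    time.sleep(delay)
    return "ok"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Stands in for everything else the event loop has to keep doing."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def safe_update(delay: float, timeout: float) -> Optional[str]:
    loop = asyncio.get_running_loop()
    # The blocking call runs on the default thread pool, not the loop thread.
    future = loop.run_in_executor(None, slow_device_call, delay)
    try:
        return await asyncio.wait_for(future, timeout)
    except asyncio.TimeoutError:
        # The worker thread cannot be interrupted and finishes on its own;
        # the caller just stops waiting and can mark the device unavailable.
        return None


async def main(delay: float, timeout: float) -> Tuple[Optional[str], int]:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    result = await safe_update(delay, timeout)
    hb.cancel()
    return result, len(ticks)
```

With `asyncio.run(main(delay=0.5, timeout=0.2))` the update gives up after 0.2 s while the heartbeat keeps ticking. If `slow_device_call` were instead called directly inside `safe_update`, the heartbeat would not get to run while the call was blocking, which matches the "everything hangs" symptom described at the top of this issue.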

wizbowes commented 1 year ago

Sorry to say, but this project is abandoned. It was created 3 years ago and hasn't been touched since. I wouldn't waste your time with it unless you can fork it and fix it. I'm going to look into the MQTT bridge for HA integration instead, because it keeps HA and the Pixelblaze decoupled, so HA can't be brought down by it.

sevba commented 11 months ago

Experiencing the same issue. Going to try https://github.com/davyhollevoet/pixelblaze_mqtt_bridge instead.