swbova closed this issue 4 years ago
Can you explain what you mean by "this seems to have crashed my HA instance"? Also how much time did it take?
If you increment self._error_count where you indicated, the listener would eventually stop altogether and no longer work for that fan until Home Assistant is rebooted. I want it to keep trying every minute, forever if necessary. The user will see the fan as disabled while these errors are happening, and it will be re-enabled as soon as the listener reconnects.
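To make the intended behavior concrete, here is a minimal sketch of a listener that retries forever. It is not the integration's actual code: `listen_forever`, `connect`, `listen`, and `set_available` are hypothetical names, and the 60-second default only mirrors the "every minute" interval mentioned above.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

_LOGGER = logging.getLogger(__name__)


async def listen_forever(
    connect: Callable[[], Awaitable[Any]],       # hypothetical: opens the connection
    listen: Callable[[Any], Awaitable[None]],    # hypothetical: returns when the connection drops
    set_available: Callable[[bool], None],       # hypothetical: toggles the entity in the UI
    retry_delay: float = 60.0,                   # assumed "once a minute" interval
) -> None:
    """Keep reconnecting after OSError; there is no error counter and no give-up path."""
    while True:
        try:
            conn = await connect()
            set_available(True)
            await listen(conn)
        except OSError as err:
            # For example [Errno 113] No route to host while the wall switch is off.
            _LOGGER.debug("Listener error, retrying in %s s: %s", retry_delay, err)
        # Reached after an OSError or after listen() returns normally.
        set_available(False)
        await asyncio.sleep(retry_delay)
```

Because only `OSError` is caught, cancelling the task (for example on shutdown) still stops the loop.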
These debug messages are only being printed because you have changed the default logging for the Senseme integration. This can result in a lot of logged messages. Are you seeing messages more frequently than once a minute?
Hi Mike
I meant to say that my instance at port 8123 became unreachable. I logged into my HA server and found that the HA process was no longer running, although the discovery loop in device.py was. After restarting HA manually and turning on the wall switch, everything was fine. I was able to reproduce this on July 5 by having the switch on, then turning it off. This again resulted in the HA process exiting abnormally.
In any case, since then I have updated to HA 0.112.3 and now I can no longer reproduce the error.
I'm sorry for the false alarm.
Looks like it was a bug in HA. The log has this error:
2020-07-04 21:19:08 ERROR (MainThread) [homeassistant.core] Error doing job: Fatal read error on socket transport
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
OSError: [Errno 113] No route to host
There is an old open issue where this message has been seen in versions from 0.57.2 up to 0.111.4.
The light is wired to a wall switch. Apparently, when the switch is off, the IP address of the light is unreachable.
This appears to result in an infinite loop in the discovery phase. After a sufficient amount of time, this seems to have crashed my HA instance.
I looked at it a little bit. It does appear that perhaps incrementing self._error_count around line 690, when the OS_ERROR exception is caught, would fix this? Here is the first error reported:
Seven hours later, still going.
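For reference, the error-count idea can be sketched roughly as follows. This is not the code around line 690: the class, the `probe` callable, and the `MAX_ERRORS` limit are all hypothetical, and only the `_error_count` name comes from the thread. Note that once the limit is reached, the loop exits and nothing retries afterwards.

```python
import asyncio
from typing import Awaitable, Callable


class DiscoverySketch:
    """Hypothetical loop that gives up after too many consecutive OSErrors."""

    MAX_ERRORS = 10  # assumed limit; the real threshold is not given in the thread

    def __init__(self, probe: Callable[[], Awaitable[None]], retry_delay: float = 1.0):
        self._probe = probe              # hypothetical coroutine that contacts the device
        self._retry_delay = retry_delay  # assumed pause between attempts, in seconds
        self._error_count = 0

    async def run(self) -> bool:
        """Return True once a probe succeeds, False after MAX_ERRORS failures in a row."""
        while self._error_count < self.MAX_ERRORS:
            try:
                await self._probe()
            except OSError:
                # Count the failure so the loop cannot spin forever.
                self._error_count += 1
                await asyncio.sleep(self._retry_delay)
            else:
                self._error_count = 0
                return True
        return False
```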