Closed fancygaphtrn closed 5 years ago
Thanks for submitting. I'll reach out to the author of the underlying library.
Can you look in your venv and tell me what version of python-engineio
you have installed?
From \srv\homeassistant\lib\python3.6\site-packages\engineio\__init__.py: __version__ = '3.3.2'
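For anyone else asked the same question, the installed version can also be read from package metadata rather than by opening the file. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library; the venv above is Python 3.6, so it would not work there) and assuming the distribution is registered as `python-engineio`. The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the interpreter from the same venv as Home Assistant.
    print(installed_version("python-engineio"))
```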
Thank you! That gives me what I need. I'll push out a fix today.
Looking into this further I believe there is an issue with the watchdog that is causing some of these problems. Currently the watchdog is started in the on_connect and reset in the on_data.
In the normal operation of the websocket if the PONG messages are missed the websocket will automatically close and reconnect. Once reconnected the on_connect will be called.
The problem is the on_connect doesn't clear the previous watchdog, it creates a new one. Some time later the orphaned watchdog will fire and do a disconnect and reconnect, regardless of the current state. This will then happen over and over and over from that point forward.
I suggest adding something like this to the on_connect:
    if self._watchdog_listener:
        _LOGGER.debug('On_connect clear Watchdog')
        self._watchdog_listener()
    self._watchdog_listener = async_call_later(
        self._hass, DEFAULT_WATCHDOG_SECONDS, _ws_reconnect)
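The orphaned-watchdog scenario described above can be reproduced outside Home Assistant. The following is a toy model, not the integration's code: it uses plain `asyncio` timers (`loop.call_later`) in place of `async_call_later`, and the class `Client`, the flag `cancel_on_connect`, and the shortened timeout are all assumptions made for illustration. It also shows why clearing the watchdog in `on_data` alone may not be enough: `on_data` can only cancel the handle currently stored, and a reconnect's `on_connect` overwrites that handle first.

```python
import asyncio

WATCHDOG_SECONDS = 0.2  # shortened stand-in for DEFAULT_WATCHDOG_SECONDS


class Client:
    """Toy model of the connect/data handlers (hypothetical, for illustration)."""

    def __init__(self, loop, cancel_on_connect):
        self.loop = loop
        self.cancel_on_connect = cancel_on_connect
        self.watchdog = None   # only the most recent timer handle is kept
        self.expirations = 0   # how many times a watchdog actually fired

    def _expire(self):
        self.expirations += 1

    def _arm(self):
        self.watchdog = self.loop.call_later(WATCHDOG_SECONDS, self._expire)

    def on_connect(self):
        # The proposed change: cancel any existing timer before arming a new one.
        if self.cancel_on_connect and self.watchdog is not None:
            self.watchdog.cancel()
        self._arm()

    def on_data(self):
        # Cancels whatever handle is stored right now, then re-arms.
        if self.watchdog is not None:
            self.watchdog.cancel()
        self._arm()


async def simulate(cancel_on_connect):
    client = Client(asyncio.get_running_loop(), cancel_on_connect)
    client.on_connect()  # initial connection
    client.on_data()     # first data: timer from on_connect is replaced
    client.on_connect()  # automatic reconnect: stored handle is overwritten
    client.on_data()     # cancels only the timer created by the reconnect
    # Data keeps arriving well inside the timeout, so the tracked timer
    # never expires; any expiration must come from an orphaned timer.
    for _ in range(6):
        await asyncio.sleep(WATCHDOG_SECONDS / 4)
        client.on_data()
    return client.expirations


if __name__ == "__main__":
    print("without cancel in on_connect:", asyncio.run(simulate(False)))
    print("with cancel in on_connect:   ", asyncio.run(simulate(True)))
```

In this model the timer armed by the first `on_data` is the one that gets lost: the reconnect's `on_connect` replaces the stored handle without cancelling it, so it fires even though data is flowing normally.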
@fancygaphtrn After on_connect is called, on_data is called and the watchdog is properly cleared and reset there. Every subsequent on_data also clears and resets it.
I'm inclined to believe that bumping the underlying library version will help (https://github.com/miguelgrinberg/python-socketio/issues/285#issuecomment-479385145); let's try that first and if the issue still persists, we can look at this further.
I upgraded and find the problem is not resolved.
The following seems to resolve my issues.
In the meantime I moved the component to the custom_components directory and applied this change to the on_connect function of
def on_connect():
    """Define a handler to fire when the websocket is connected."""
    _LOGGER.info('Connected to websocket')
    if self._watchdog_listener:
        _LOGGER.debug('On_connect clear Watchdog')
        self._watchdog_listener()
    _LOGGER.debug('On_connect Watchdog starting')
    self._watchdog_listener = async_call_later(
        self._hass, DEFAULT_WATCHDOG_SECONDS, _ws_reconnect)
The reason is when the underlying libraries do an automatic reconnect, this on_connect function will be called and create another watchdog without removing the previous one. There will then be 2 watchdogs.
I will gladly supply any additional information you would like.
@fancygaphtrn Please read my previous comment. on_data is called after every on_connect and it cancels the old watchdog before creating a new one: https://github.com/home-assistant/home-assistant/blob/02b7fd93ed2684cca2f5cf31229a99dcca317e1d/homeassistant/components/ambient_station/__init__.py#L343-L346
What's more, this same function is called every time data is received, so the socket will be closed and recreated every time.
I'm inclined to think there is something going on with your environment. I've run the 0.91.1 version of the ambient_station integration for over 24 hours and only see one established connection (assuming the IP addresses you provided previously):
$ netstat -p | grep '104.31' | more
tcp 0 0 hub.phil.lan:41848 104.31.83.184:https ESTABLISHED 729/python
If you still have multiple open connections, can you confirm that you are receiving Ambient data in HASS at all? Perhaps you are never receiving data from the socket at all after connection.
I have attached a log file from yesterday. You can see from it that I am receiving data. Just a reminder my Internet speed is limited and this problem happens under congestion. In my case, all works well till I start watching Netflix and congest my Internet.
My observations:
All is normal until line 121, when the websocket is closed.
Line 126 indicates the socket was closed when a websocket PONG message was missed.
Line 132 shows engineio reconnecting the socket. Normal operation resumes.
Line 209 shows the watchdog being reset after data is received.
Line 210, which is 1 second later, shows a watchdog expiring message. Note this is 5 minutes after the initial line 121 engineio reconnect.
From this point forward engineio will reconnect and the watchdog will expire until Home Assistant is restarted.
Lines 1265 and 1268 show engineio creating multiple websockets.
Lines 1459 and 1458 show engineio creating multiple websockets.
Lines 1694 and 1698 show engineio creating multiple websockets.
Lines 1872 and 1875 show engineio creating multiple websockets.
There are many more showing multiples.
Interesting. Let's test your hypothesis:
Copy the homeassistant/components/ambient_station directory under /config/custom_components
Edit ./config/custom_components/ambient_station/__init__.py with your suggested change.
Attached log from last night's test.
My observations:
All is normal until line 110, when a PONG response has not been received.
Line 118: the websocket is closed.
Line 121: engineio starts to reconnect.
Line 139: reconnection complete.
The above repeats a few times during congestion.
Line 223: the congestion stopped and normal operation started.
There are more sequences of missing a PONG and the connection restarting but it recovered well
Line 1791 starts a period of Ambient not sending data, but the websocket stayed up.
Line 1816: the watchdog expired after 5 minutes of no data. The websocket was disconnected and reconnected by the watchdog, and normal operation resumed.
No instances of multiple sockets:
netstat -p | grep '104.31' | more
tcp 0 0 home:59385 104.31.83.184:https ESTABLISHED 32129/python3
I call it a success.
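The netstat check used throughout this thread can be turned into a repeatable count. This is a rough sketch, assuming the column layout seen in the output above (protocol, two queue columns, local address, foreign address, state, PID/program); the function name `count_established` is hypothetical, and the sample line is the one from the test result above.

```python
def count_established(netstat_output, remote_prefix):
    """Count ESTABLISHED TCP lines whose foreign address starts with remote_prefix."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[4].startswith(remote_prefix) and fields[5] == "ESTABLISHED":
            count += 1
    return count


if __name__ == "__main__":
    sample = "tcp 0 0 home:59385 104.31.83.184:https ESTABLISHED 32129/python3"
    print(count_established(sample, "104.31"))
```

A count that keeps growing between Home Assistant restarts would match the leak described in the original report; a steady count of one matches the patched behavior.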
Home Assistant release with the issue:
0.90.2, 0.90.1
Last working Home Assistant release (if known): Unknown; just started using the Ambient weather sensor
Operating environment (Hass.io/Docker/Windows/etc.):
arch | x86_64
os_name | Linux Debian GNU/Linux 8 (jessie)
virtualenv | true
Component/platform:
Ambient Weather Station Sensor https://www.home-assistant.io/components/ambient_station/
Description of problem: The Ambient sensor has a number of websocket disconnects. The sensor reconnects and continues to function, but the previous websocket is still open according to the operating system. Over time "netstat" will show many established connections open. I had one occurrence that had thousands.
This is an example from running about 4 hours today.
Restarting Home Assistant will clear these connections back to one
Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):
Traceback (if applicable):
Additional information: I have a slow Internet connection of 768k up/ 1500 down. This problem seems to be related to periodic congestion. The websocket misses data and the component reconnects without closing the previous socket.
I have a copy of the home assistant log with homeassistant.components.ambient_station: debug on request. This seems to be the relevant part, though: