home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.66k stars 29.95k forks source link

Ambient sensor fails to close old websocket on reconnect #22636

Closed fancygaphtrn closed 5 years ago

fancygaphtrn commented 5 years ago

Home Assistant release with the issue:

0.90.2, 0.90.1

Last working Home Assistant release (if known): Unknown just started using Ambient weather sensor

Operating environment (Hass.io/Docker/Windows/etc.):

arch | x86_64 os_name | Linux Debian GNU/Linux 8 (jessie) virtualenv | true Component/platform:

Ambient Weather Station Sensor https://www.home-assistant.io/components/ambient_station/

Description of problem: Ambient sensor has a number of websocket disconnects. The sensor reconnects and continues to function, but the the websocket is still open according to the operation system. Over time "netstat" will show as many established connections open. I had one occurrence that had thousands.

This is an example from running about 4 hours today.

netstat -p | grep '104.31' | more
tcp        0      0 home:48313              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48278              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48344              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44242              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44277              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48305              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48297              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48291              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48315              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48298              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44293              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48310              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44297              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44280              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48323              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48259              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48336              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44281              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48286              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44316              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44288              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44273              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:48321              104.31.82.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44283              104.31.83.184:https     ESTABLISHED 7650/python3
tcp        0      0 home:44321              104.31.83.184:https     ESTABLISHED 7650/python3

Restarting Home Assistant will clear these connections back to one

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

Traceback (if applicable):

Additional information: I have a slow Internet connection of 768k up/ 1500 down. This problem seems to be related to periodic congestion. The websocket misses data and the component reconnects without closing the previous socket.

I have a copy of the home assistant log with homeassistant.components.ambient_station: debug on request. This seems to be the relevant part, though:

2019-04-01 09:17:47 INFO (MainThread) [engineio.client] Received packet MESSAGE data 2["data",{"dateutc":1554124620000,"winddir":214,"windspeedmph":7.4,"windgustmph":9.2,"maxdailygust":14.8,"tempf":27,"hourlyrainin":0,"eventrainin":0,"dailyrainin":0,"weeklyrainin":0.1,"monthlyrainin":0,"totalrainin":0.15,"baromrelin":30.3,"baromabsin":27.6,"humidity":54,"tempinf":61.7,"humidityin":45,"uv":2,"solarradiation":231.58,"feelsLike":19.18,"dewPoint":12.6,"lastRain":"2019-03-31T10:14:00.000Z","date":"2019-04-01T13:17:00.000Z","macAddress":"80:7D:3A:7C:1D:5E"}]
2019-04-01 09:17:47 INFO (MainThread) [socketio.client] Received event "data" [/]
2019-04-01 09:17:47 DEBUG (MainThread) [homeassistant.components.ambient_station] New data received: {'dateutc': 1554124620000, 'winddir': 214, 'windspeedmph': 7.4, 'windgustmph': 9.2, 'maxdailygust': 14.8, 'tempf': 27, 'hourlyrainin': 0, 'eventrainin': 0, 'dailyrainin': 0, 'weeklyrainin': 0.1, 'monthlyrainin': 0, 'totalrainin': 0.15, 'baromrelin': 30.3, 'baromabsin': 27.6, 'humidity': 54, 'tempinf': 61.7, 'humidityin': 45, 'uv': 2, 'solarradiation': 231.58, 'feelsLike': 19.18, 'dewPoint': 12.6, 'lastRain': '2019-03-31T10:14:00.000Z', 'date': '2019-04-01T13:17:00.000Z', 'macAddress': '80:7D:3A:7C:1D:5E'}
2019-04-01 09:17:47 DEBUG (MainThread) [homeassistant.components.ambient_station] Resetting watchdog
2019-04-01 09:17:50 DEBUG (MainThread) [homeassistant.components.ambient_station] Watchdog expired; forcing socket reconnection
2019-04-01 09:17:50 INFO (MainThread) [homeassistant.components.ambient_station] Disconnected from websocket
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet MESSAGE data 1
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet CLOSE data None
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Engine.IO connection dropped
2019-04-01 09:17:50 INFO (MainThread) [homeassistant.components.ambient_station] Disconnected from websocket
2019-04-01 09:17:50 DEBUG (MainThread) [homeassistant.components.ambient_station] Watchdog expired; forcing socket reconnection
2019-04-01 09:17:50 INFO (MainThread) [homeassistant.components.ambient_station] Disconnected from websocket
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Attempting WebSocket connection to wss://dash2.ambientweather.net/socket.io/?api=1&applicationKey=xxxxxxx&transport=websocket&EIO=3
2019-04-01 09:17:50 WARNING (MainThread) [engineio.client] Read loop: WebSocket connection was closed, aborting
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Waiting for write loop task to end
2019-04-01 09:17:50 WARNING (MainThread) [engineio.client] Write loop: WebSocket connection was closed, aborting
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Exiting write loop task
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Attempting WebSocket connection to wss://dash2.ambientweather.net/socket.io/?api=1&applicationKey=xxxxxxxxxxx&transport=websocket&EIO=3
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Waiting for ping loop task to end
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Exiting ping task
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Exiting read loop task
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] WebSocket connection accepted with {'sid': '99eq2tPBTSeJk7idAAPU', 'upgrades': [], 'pingInterval': 25000, 'pingTimeout': 60000}
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Engine.IO connection established
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet PING data None
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Received packet MESSAGE data 0
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Namespace / is connected
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Emitting event "subscribe" [/]
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet MESSAGE data 2["subscribe",{"apiKeys":["xxxxxxx"]}]
2019-04-01 09:17:50 INFO (MainThread) [homeassistant.components.ambient_station] Connected to websocket
2019-04-01 09:17:50 DEBUG (MainThread) [homeassistant.components.ambient_station] Watchdog starting
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] WebSocket connection accepted with {'sid': '0X_UTqSSDa9WUcmHAAPW', 'upgrades': [], 'pingInterval': 25000, 'pingTimeout': 60000}
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Engine.IO connection established
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet PING data None
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Received packet MESSAGE data 0
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Namespace / is connected
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Emitting event "subscribe" [/]
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Sending packet MESSAGE data 2["subscribe",{"apiKeys":["xxxxxxxxxxxx"]}]
2019-04-01 09:17:50 INFO (MainThread) [homeassistant.components.ambient_station] Connected to websocket
2019-04-01 09:17:50 DEBUG (MainThread) [homeassistant.components.ambient_station] Watchdog starting
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Received packet PONG data None
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Unexpected error "cannot call recv() while another coroutine is already waiting for the next message", aborting
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Waiting for write loop task to end
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Exiting write loop task
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Received packet PONG data None
2019-04-01 09:17:50 INFO (MainThread) [engineio.client] Received packet MESSAGE data 2["subscribed",{"devices":[{"macAddress":"80:7D:3A:7C:1D:5E","lastData":{"dateutc":1554124620000,"winddir":214,"windspeedmph":7.4,"windgustmph":9.2,"maxdailygust":14.8,"tempf":27,"hourlyrainin":0,"eventrainin":0,"dailyrainin":0,"weeklyrainin":0.1,"monthlyrainin":0,"totalrainin":0.15,"baromrelin":30.3,"baromabsin":27.6,"humidity":54,"tempinf":61.7,"humidityin":45,"uv":2,"solarradiation":231.58,"feelsLike":19.18,"dewPoint":12.6,"lastRain":"2019-03-31T10:14:00.000Z","deviceId":"xxxxxx","date":"2019-04-01T13:17:00.000Z"},"info":{"name":"Fancy Gap","location":"Home"},"apiKey":"xxxxx"}],"method":"subscribe"}]
2019-04-01 09:17:50 INFO (MainThread) [socketio.client] Received event "subscribed" [/]
bachya commented 5 years ago

Thanks for submitting. I'll reach out to the author of the underlying library.

bachya commented 5 years ago

Can you look in your venv and tell me what version of python-engineio you have installed?

fancygaphtrn commented 5 years ago

From \srv\homeassistant\lib\python3.6\site-packages\engineio__init.py version__ = '3.3.2'

bachya commented 5 years ago

Thank you! That gives me what I need. I'll push out a fix today.

fancygaphtrn commented 5 years ago

Looking into this further I believe there is an issue with the watchdog that is causing some of these problems. Currently the watchdog is started in the on_connect and reset in the on_data.

In the normal operation of the websocket if the PONG messages are missed the websocket will automatically close and reconnect. Once reconnected the on_connect will be called.

The problem is the on_connect doesn't clear the previous watchdog, it creates a new one. Some time later the orphaned watchdog will fire and do a disconnect and reconnect, regardless of the current state. This will then happen over and over and over from that point forward.

I suggest adding something like this to the on_connect: if self._watchdog_listener: _LOGGER.debug('On_connect clear Watchdog') self._watchdog_listener() self._watchdog_listener = async_call_later( self._hass, DEFAULT_WATCHDOG_SECONDS, _ws_reconnect)

bachya commented 5 years ago

@fancygaphtrn After on_connect is called, on_data is called and the watchdog is properly cleared and reset there. Every subsequent on_data also clears and resets it.

I'm inclined to believe that bumping the underlying library version will help (https://github.com/miguelgrinberg/python-socketio/issues/285#issuecomment-479385145); let's try that first and if the issue still persists, we can look at this further.

fancygaphtrn commented 5 years ago

I upgraded and fine the problem is not resolved.

The following seems to resolve my issues. In the mean time I moved the component to the custom_components directory and applied this change to the on_connect function of \ambient_station__init__.py

    def on_connect():
        """Define a handler to fire when the websocket is connected."""
        _LOGGER.info('Connected to websocket')
        if self._watchdog_listener:
            _LOGGER.debug('On_connect clear Watchdog')
            self._watchdog_listener()
        _LOGGER.debug('On_connect Watchdog starting')
        self._watchdog_listener = async_call_later(
            self._hass, DEFAULT_WATCHDOG_SECONDS, _ws_reconnect)

The reason is when the underlying libraries do an automatic reconnect, this on_connect function will be called and create another watchdog without removing the previous one. There will then be 2 watchdogs.

I will gladly supply any additional information you would like.

bachya commented 5 years ago

@fancygaphtrn Please read my previous comment. on_data is called after every on_connect and it cancels the old watchdog before creating a new one: https://github.com/home-assistant/home-assistant/blob/02b7fd93ed2684cca2f5cf31229a99dcca317e1d/homeassistant/components/ambient_station/__init__.py#L343-L346

What's more, this same function is called every time data is received, so the socket will be closed and recreated every time.

I'm inclined to think there is something going on with your environment. I've run the 0.91.1 version of the ambient_station integration for over 24 hours and only see one established connection (assuming the IP addresses you provided previously):

$ netstat -p | grep '104.31' | more
tcp        0      0 hub.phil.lan:41848      104.31.83.184:https     ESTABLISHED 729/python

If you still have multiple open connections, can you confirm that you are receiving Ambient data in HASS at all? Perhaps you are never receiving data from the socket at all after connection.

fancygaphtrn commented 5 years ago

I have attached a log file from yesterday. You can see from it that I am receiving data. Just a reminder my Internet speed is limited and this problem happens under congestion. In my case, all works well till I start watching Netflix and congest my Internet.

My observations. All is normal until line 121 when websocket is closed. line 126 indicated the socket was closed when a websocket PONG message is missed. line 132 shows engineio reconnecting the socket. resumes normal operation line 209 shows the watchdog being reset after data received. line 210, which is 1 second later, shows a watchdog expiring message. Note this is 5 minutes after the initial line 121 engineio reconnect. from this point forward engineio will reconnect and the watchdog will expire until home assistant is restarted.

lines 1265 and 1268 shows engineio creating multiple websockets lines 1459 and 1458 shows engineio creating multiple websockets lines 1694 and 1698 shows engineio creating multiple websockets lines 1872 and 1875 shows engineio creating multiple websockets There are many more showing multiples.

home-assistant ambient 2019-04-07.log

bachya commented 5 years ago

Interesting. Let's test your hypothesis:

  1. Make a copy of the homeassistant/components/ambient_station directory under /config/custom_components.
  2. Alter /config/custom_components/ambient_station/__init__.py with your suggested change.
  3. Restart your HASS, let it run, and review the results.
fancygaphtrn commented 5 years ago

Attached log from last nights test.

My observations: All is normal until line 110 when PONG response has not been received Line 118 the websocket is closed Line 121 engineio starts reconnect Line 139 re-connection complete repeats the above a few times during congestion. Line 223 the congestion stopped and started normal operation.

There are more sequences of missing a PONG and the connection restarting but it recovered well

Line 1791 start a period of ambient not sending data, but the websocket stayed up. Line 1816 the watchdog expired after 5 minutes of no data. websocket was disconnected and connected by watchdog. Resumed normal operation

No instances of multiple sockets netstat -p | grep '104.31' | more tcp 0 0 home:59385 104.31.83.184:https ESTABLISHED 32129/python3

I call it a success.

home-assistant ambient 2019-04-09.log