home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.7k stars 30.82k forks source link

Loosing websocket connection to deconz after a few hours #29180

Closed quazzie closed 4 years ago

quazzie commented 4 years ago

Home Assistant release with the issue:

arch x86_64
dev false
docker true
hassio true
os_name Linux
python_version 3.7.4
timezone Europe/Stockholm
version 0.102.2
virtualenv false

Last working Home Assistant release (if known): 9x

Operating environment (Hass.io/Docker/Windows/etc.):

Hassio docker on Linux nas 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Integration:

deconz

Description of problem: After some hours i stop receiving events from deconz, same problem as discussed in https://github.com/home-assistant/home-assistant/issues/27531

After a restart of HA i get regular events from deconz over websocket, ex:

2019-11-28 08:48:03 DEBUG (MainThread) [pydeconz.websocket] Websocket data: {"e":"changed","id":"1","r":"sensors","state":{"dark":false,"daylight":false,"lastupdated":"2019-11-28T07:48:03","status":150,"sunrise":"2019-11-28T07:40:47","sunset":"2019-11-28T13:39:22"},"t":"event","uniqueid":"00:21:2e:ff:ff:02:45:ed-01"}

And when turning on/off a light:

2019-11-28 08:46:09 DEBUG (MainThread) [pydeconz.utils] Sending {'data': '{"on": true, "transitiontime": 0}'} to http://172.30.32.1:40850/api/F616747C41/lights/5/state
2019-11-28 08:46:09 DEBUG (MainThread) [pydeconz.utils] HTTP request response: [{'success': {'/lights/5/state/on': True}}]
2019-11-28 08:46:09 DEBUG (MainThread) [pydeconz.websocket] Websocket data: {"e":"changed","id":"5","r":"lights","state":{"alert":null,"bri":254,"on":true,"reachable":true},"t":"event","uniqueid":"00:0b:57:ff:fe:30:63:ae-01"}
2019-11-28 08:46:09 DEBUG (MainThread) [pydeconz] Unsupported event {'e': 'changed', 'id': '65520', 'r': 'groups', 'state': {'all_on': False, 'any_on': True}, 't': 'event'}
2019-11-28 08:46:09 DEBUG (MainThread) [pydeconz.websocket] Websocket data: {"e":"changed","id":"65520","r":"groups","state":{"all_on":false,"any_on":true},"t":"event"}

But after a couple of hours it stops, but no errors in log (either HA or deconz).

Then when it's broken i can still turn on/off the light in HA and the state in deconz changes but the state in HA reverts. So the rest api call works but HA expects a websocket event to change the state but because the websocket connection is dead it never receives this event.

DEBUG (MainThread) [pydeconz.utils] Sending {'data': '{"on": true}'} to http://172.30.32.1:40850/api/F616747C41/lights/6/state
DEBUG (MainThread) [pydeconz.utils] HTTP request response: [{'success': {'/lights/6/state/on': True}}]

Maybe irrelevant but at startup of HA i get:

2019-11-28 08:36:47 DEBUG (MainThread) [pydeconz.utils] Sending {} to http://172.30.32.1:40850/api/F616747C41
2019-11-28 08:37:01 ERROR (MainThread) [homeassistant.components.deconz] Error connecting to deCONZ gateway at 172.30.32.1
2019-11-28 08:37:01 WARNING (MainThread) [homeassistant.config_entries] Config entry for deconz not ready yet. Retrying in 5 seconds.

Traceback (if applicable):

Additional information:

logger:
  default: warning
  logs:
    pydeconz: debug
    homeassistant.components.deconz: debug
probot-home-assistant[bot] commented 4 years ago

Hey there @kane610, mind taking a look at this issue as its been labeled with a integration (deconz) you are listed as a codeowner for? Thanks!

quazzie commented 4 years ago

Got a complete HA-log from today when it happened again. This time it was i think less than an hour before it borked. https://pastebin.com/9BFsCQAp and here is the deconz log (no extra debug flags, only standard) from just before HA connects: https://pastebin.com/61DyycDF

pergolafabio commented 4 years ago

hi , yes, i am thread owner of that issue here : https://github.com/home-assistant/home-assistant/issues/27531 seems you have exact same issue as me my system was stable for 3-4 days, but a few hours ago, websocket was dead again in phoscon, everything works, but the states are just not coming back to HA... a HA restart is needed to make it functional again

when i enable debug, i dont see any error at all, its just looks likes its dead, also no websocket close message or something like that

pergolafabio commented 4 years ago

when it happened 2 days ago, i saw this in my .log file, not sure if its related, but i dont see that message normally

2019-11-29 15:01:55 WARNING (MainThread) [hass_nabucasa.iot] Connection closed: Received non-Text message: 8
2019-11-29 15:02:10 WARNING (MainThread) [hass_nabucasa.iot] Unable to connect: 504, message='Invalid response status', url='wss://cloud.nabucasa.com/websocket
quazzie commented 4 years ago

For me it is not stable for a whole day even. I have to restart HA 1-2 times a day... And that is only because i notice the problem of wrong state in HA or that my automation does not work because the wrong state in HA.

Infuriating. Is there a way to migrate from deconz to ZHA or do i have to re-pair everything ?

pergolafabio commented 4 years ago

I had the same, almost every 12 hours... What custom components have you installed? Try to disable some to rule out...

In my case, not 100% sure, but by disabling the Audi component, it's stable again

Kane610 commented 4 years ago

There are only a few people who has got this issue leading me to believe this is due to some other component in the system. What other integrations are you running and are you running any custom components?

quazzie commented 4 years ago

No custom components

hassio addons: Almond, ESPHome, Grafana, Node-RED, Portainer, Sparsnäs Decoder, Visual Stdio Code, deConz

Integrations UI: Almond: Almond Google Cast: Google Cast Swedish weather service (SMHI): Home Mobile App: POCOPHONE F1 Transmission: Transmission Lots of ESPHome MQTT: localhost, lots of sensors/devices UniFi Controller: quazzino Plex: quazzino

configuration.yaml

homeassistant: (...)

google_assistant: (....)

device_tracker:
  - platform: google_maps ...

ffmpeg:

stream:

person:

google:
  client_id: !secret google_id
  client_secret: !secret google_secret

speedtestdotnet:
  scan_interval:
    hours: 3

frontend:
  themes: !include_dir_merge_named themes

config:

mobile_app:

http:
  base_url: xxx

hassio:

conversation:

history:
  exclude:
    domains:
      - weblink
      - updater
      - timer
      - group

recorder:
  db_url: !secret db_url
  purge_keep_days: 10
  purge_interval: 1
  exclude:
    domains:
      - weblink
      - updater
      - timer
      - group
      - timer
    entities:
      - sun.sun

logger:
  default: warning
  logs:
    urllib3: fatal
    urllib3.connectionpool: fatal
    homeassistant.components.camera: fatal
    requests.packages.urllib3.connectionpool: fatal
    pydeconz: debug
    homeassistant.components.deconz: debug 
    pychromecast.socket_client: fatal

system_health:

graphite:
  host: localhost

sun:

tts:
  - platform: google_translate
    service_name: google_say

notify:
   ### a couple of notifygroups and html5

zone: !include zones.yaml (4 zones)
weblink: !include weblinks.yaml (nothing)
group: !include groups.yaml (2 standard groups)
automation: !include automations.yaml (6 automations)
Kane610 commented 4 years ago

It could still be some other integrationt fault I recommend you trying running only deconz with nothing else enabled to single out the cause

Kane610 commented 4 years ago

I suggest we close this, most likely this issue is due to another integration behaving poorly

pergolafabio commented 4 years ago

No, don't close it, I have totally different integrations and I also have the issued from time to time, I want to know the root cause

Kane610 commented 4 years ago

@pergolafabio didn't you find the root cause in the Audi integration?

pergolafabio commented 4 years ago

i am not 100% sure, i have disabled that component, but still have it, but less frequently instead of every 12 hours i have it like once a week...

and this thread owner , his config/intregrations is totally different

pergolafabio commented 4 years ago

Updated to 103.0, arfghh! Problem back while I had it stable for a week... Now again after x hours, no states coming in from deconz... This really is annoying

pergolafabio commented 4 years ago

it was 9 hours ago when it happened gonna start with disabling some discovery components like zeroconf: ssdp:

i dont think another intregration is causing this, i thought so first, that audi custom was the culprit, but its already disabled

also my config looks totally different that the theadowner i normally see the issue after about 12 h, then i need to restart HA , so gonna wait a bit

anyway, when i had the issue, i had enabled debug below, but it didnt give any usefull info, cause when the issue is present, there is no log at all, its just dead what else can i enable to debug websockets or something? cause i only enable deconz logging , maybe i need to enable websocket logging in HA instead?

# pydeconz: debug
# homeassistant.components.deconz: debug  

also another note, the issue seems to comeback to me when i do an update, the 102.x releases were stable for me, i had issues in .95. .98 ... , now back in .103

maybe the update precdure is causing an issue? maybe some corrupt file or something? i am working on an ESXI server, i created a snapshot thank god, for 102.3 , so i can easy rollback if needed, allthough i really want to find the culprit, its so frustating that i need to restart HA every 12 hours

pergolafabio commented 4 years ago

ok, happened again, so disabled now :

zeroconf: & ssdp:

lets see what happens in next 12 hours can anyone help me what i need to turn on for debug this doesnt help, there is no extra information when the issue occured

 pydeconz: debug
 homeassistant.components.deconz: debug  
pergolafabio commented 4 years ago

ok, 24 hours further, still stable now with zeroconf removed offcourse not sure if thats the culprit

pergolafabio commented 4 years ago

seems stable again, i have no idea why, but removing zeroconf made is stable offcourse no idea if that was the cause, i have it random, especially after a new update

really strange

quazzie commented 4 years ago

I have tried disabling and enabling lots different integration back and forth with no good result. But after upgrading to 0.103.5 and deconz addon to 5.0 it SEEMS stable. Have not lost connection for 3 days now... But as the time to disconnect is random i'm not sure its gone.

Kane610 commented 4 years ago

Hey guys! I found something in the web socket library worth trying out, but I need someone of you to add a few characters in the pydeconz code.

Anyone up for a trial?

quazzie commented 4 years ago

What's involved? And as of my last post I have not had the problem anymore. But if it helps anyway I could try.

Kane610 commented 4 years ago

You need to go into the the hass container and change a file

diff --git a/pydeconz/websocket.py b/pydeconz/websocket.py index c9190ab..b4d3bef 100644 --- a/pydeconz/websocket.py +++ b/pydeconz/websocket.py @@ -59,11 +59,12 @@ class AIOWSClient: url = f"http://{self.host}:{self.port}"

     try:
pergolafabio commented 4 years ago

Mine is also stable again, so when issue occurs again, I try it out...

Bookmarked!

Kane610 commented 4 years ago

Allright, Im closing this in the meantime. We can open it up again if it occurs in the future.

pergolafabio commented 4 years ago

perfect, thnx in advance