home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Tradfri lights stop working after a couple of hours #14386

Closed winterscar closed 3 years ago

winterscar commented 6 years ago

Home Assistant release with the issue:

0.68.1

Last working Home Assistant release (if known):

..

Operating environment (Hass.io/Docker/Windows/etc.):

Docker on Ubuntu. Using the homeassistant/home-assistant:latest image.

docker compose file:

home-assistant:
    container_name: home-assistant
    restart: unless-stopped
    image: homeassistant/home-assistant
    expose:
      - "8123"
    volumes:
      - ${CONFIG_ROOT}/hass/config:/config
      - /etc/localtime:/etc/localtime:ro
      - ${CONFIG_ROOT}/hass/media:/media

Component/platform:

Ikea Tradfri https://www.home-assistant.io/components/tradfri/

Description of problem:

Tradfri lights stop working in Home Assistant after a couple of hours.

Steps to reproduce:

Controlling the lights from Google Assistant (connected to Home Assistant, not to the Tradfri hub) still works.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

tradfri:
  host: 192.168.1.129
  allow_tradfri_groups: false

Traceback (if applicable):

Additional information:

No log information is produced about the issue.

DanNixon commented 6 years ago

I've been having the same issue.

I started looking into it and found it to be an issue with state observation: the lights can still be controlled via the light.* services, but the actual state of the light is never updated in HA.
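To make the symptom concrete: a command can succeed while no observation update ever comes back. Below is a minimal, hypothetical sketch of a watchdog that flags a device as stuck when a command was sent but no observe callback followed within a timeout. `ObservationWatchdog`, its methods and the timeout value are illustrative assumptions, not part of Home Assistant or pytradfri; the idea is to call `command_sent()` after a light.* service call and `update_observed()` from the observe callback.

```python
import time
from typing import Callable, Optional


class ObservationWatchdog:
    """Hypothetical helper: detects commands that never get an observed state update."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout  # seconds to wait for an observe callback after a command
        self._clock = clock     # injectable so the logic can be tested without sleeping
        self._pending_since: Optional[float] = None

    def command_sent(self) -> None:
        # Remember the oldest unconfirmed command; later commands don't reset it.
        if self._pending_since is None:
            self._pending_since = self._clock()

    def update_observed(self) -> None:
        # Any observed state update counts as proof that observation is alive.
        self._pending_since = None

    def is_stuck(self) -> bool:
        if self._pending_since is None:
            return False
        return self._clock() - self._pending_since > self.timeout
```

Tying the check to a sent command, rather than to plain silence, avoids false alarms for lights nobody is touching, since an idle light may legitimately produce no updates.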

winterscar commented 6 years ago

I've managed to work around it by putting the container in host network mode, so I suspect it is port-related?

DanNixon commented 6 years ago

Odd, I would have expected that to be the solution to it not working at all, rather than a somewhat intermittent issue.

Plus I already have my container in host network mode.

Mariusthvdb commented 6 years ago

Another strange thing: Tradfri groups (created in the app) only work in one direction. Switching a group on/off switches all the lights it contains, but switching the lights in the group doesn't flip the group's state (in Hass.io 0.68.1).

Creating the groups manually in the Hass.io config files, containing the same lights, works just fine in both directions.

sveip commented 6 years ago

Mine gets slow to react after a while, but if I then toggle one light (and it takes up to 10 sec to react) it's fast again after that.

DanNixon commented 6 years ago

So my issues also seemed to be network related.

I originally had my gateway on a powerline Ethernet adapter positioned roughly in the middle of the property; I've recently moved it so that it is connected directly to the same switch as the server running HA. Due to some legacy mains wiring, powerline adapters are pretty hit and miss in this house, so I'm assuming frequent network dropouts were the cause previously.

Right now all my lights have been working fine for a couple of days without having to restart either HA or the Tradfri gateway.

winterscar commented 6 years ago

I'm not sure we're experiencing the same issue, as my network configuration is (Hass server) --> ethernet --> (switch) --> ethernet --> (switch) --> tradfri hub. So I've got a pretty much direct connection between the server and the hub.

Is there a good way to monitor what a Docker container is doing network-wise? Something like Wireshark?

DanNixon commented 6 years ago

I agree, it looks to be something different. It was just the observed effect that was very similar.

I think Wireshark may be the easiest option; I'm not sure whether Docker has anything built in for network monitoring.

ngdio commented 6 years ago

Can confirm I'm also having this issue. Not using Docker. Looking to help if possible.

IVI053 commented 6 years ago

I provided my debug log for this here: https://github.com/home-assistant/home-assistant/issues/14577#issuecomment-392935265

max-te commented 6 years ago

I am experiencing the same since 0.55 and have described it in #9822. I have a very hacky workaround for this issue, which has been working reliably for me: https://github.com/home-assistant/home-assistant/issues/9822#issuecomment-357539835

ngdio commented 6 years ago

I have not experienced this issue since I switched from Arch without a venv to openSUSE with a venv. It might be a dependency issue; that would still be a bug, though, so I'll keep this open.

morberg commented 6 years ago

Seeing this issue as well on macOS, no Docker container. It typically takes 10+ hours after a hass restart before I see it. My workaround is even cruder than @max-te's: I restart hass daily with a script...

dirkam commented 6 years ago

This issue still persists. Did someone happen to find a workaround (besides restarting HA frequently)?

sveip commented 6 years ago

I gave up and switched to Deconz, which is more stable.

dirkam commented 6 years ago

@sveip Can you please elaborate on this? How did you make it work?

sveip commented 6 years ago

I'm not using the IKEA gateway anymore. I bought the ConBee USB Zigbee stick (https://www.dresden-elektronik.de/conbee/) and installed the Deconz software. There is an add-on for Deconz, so it should be easy to install; I run it stand-alone. You then pair all the IKEA lights and switches with Deconz instead of the IKEA app.

dirkam commented 6 years ago

I see, thanks. I hope this issue can be fixed for the IKEA gateway, too. It seems to be a problem a lot of people have.

ngdio commented 6 years ago

How did you install Home Assistant? The issue disappeared for me when I switched from the Arch AUR package to a virtualenv installation.

dirkam commented 6 years ago

Tried hassio and hassbian.

IVI053 commented 6 years ago

@ngdio I'm running a virtualenv setup and still experiencing this problem :-(

ngdio commented 6 years ago

It might also be related to the Linux distribution (and its packages) you're running Home Assistant on. I switched from Arch ARMv7 to openSUSE aarch64 and the problems were gone.

winterscar commented 6 years ago

For reference, I was running Ubuntu Server 18.04.1 LTS and seeing no problems.

IVI053 commented 5 years ago

I'm running Debian Stretch with this problem.

TaroAM commented 5 years ago

@sveip How have the states been with the Deconz setup for you? I have the same issue and am contemplating doing the same.

dirkam commented 5 years ago

I ended up using the workaround from @max-te described in #9822. It works fine, though it requires a reboot every few days, so I added an automation rule that reboots the host when memory usage is above 80 percent.
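The rule itself isn't shown in the thread. As an illustration of the same threshold check only, here is a minimal plain-Python sketch that computes memory usage from Linux's /proc/meminfo text as (MemTotal - MemAvailable) / MemTotal and compares it with 80 percent. The function names are hypothetical; inside Home Assistant the percentage could instead come from a system-monitoring sensor.

```python
from typing import Dict

# Hypothetical helpers, not part of Home Assistant.


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines such as 'MemTotal:  1000 kB' into {key: kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(text: str) -> float:
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]  # kernel's estimate of memory usable without swapping
    return 100.0 * (total - available) / total


def should_reboot(text: str, threshold: float = 80.0) -> bool:
    # Same rule as described above: reboot when usage is above the threshold.
    return memory_used_percent(text) > threshold
```

On a Linux host you would pass it the contents of /proc/meminfo.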

sveip commented 5 years ago

Deconz works well for lights and switches. I haven't had great success with the dimmer.

Peter

alexhardwicke commented 5 years ago

I'm running Ubuntu Server 18.04.1 LTS as my host and then using the official docker container to run Home Assistant.

I get the same problem, typically after 30-60 minutes. Tried @max-te's fix and it worked although I have had a few stability issues in terms of RAM/CPU usage since I added it, so I'm still testing.

comatose-tortoise commented 5 years ago

Also having this problem: Docker on Ubuntu, 0.82.1. I have to restart to get the Tradfri lights responding/updating again.

magma1447 commented 5 years ago

Same issue here. The feedback from the Trådfri hub stops working. Setting lights via scripts from Home Assistant keeps working; the problem is just the UI, which gets stuck in the wrong state (it can't turn on a light it believes is already on).

I have tried setting duration=3600 (and 600) in light/tradfri.py and switch/tradfri.py. It did not help. I didn't manage to get the Tradfri module to start with the full patch from @alex3305, but unless I missed something, it only changed three values from 0 to 3600, apart from introducing a constant for it.

Example of my updated light/tradfri.py

--- /srv/homeassistant/lib/python3.5/site-packages/homeassistant/components/light/tradfri.py    2018-11-17 10:51:27.324283838 +0100
+++ custom_components/light/tradfri.py  2018-11-24 19:57:53.246888134 +0100
@@ -126,7 +126,7 @@
         try:
             cmd = self._group.observe(callback=self._observe_update,
                                       err_callback=self._async_start_observe,
-                                      duration=0)
+                                      duration=600)
             self.hass.async_create_task(self._api(cmd))
         except PytradfriError as err:
             _LOGGER.warning("Observation failed, trying again", exc_info=err)
@@ -345,7 +345,7 @@
         try:
             cmd = self._light.observe(callback=self._observe_update,
                                       err_callback=self._async_start_observe,
-                                      duration=0)
+                                      duration=600)
             self.hass.async_create_task(self._api(cmd))
         except PytradfriError as err:
             _LOGGER.warning("Observation failed, trying again", exc_info=err)

alex3305 commented 5 years ago

@magma1447 According to @max-te, you also have to set up a repeat trigger manually. I'm still unsure whether that is the case, but your experience suggests it is.
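A "repeat trigger" in this sense would mean periodically tearing down the observation and starting a fresh one. The asyncio sketch below shows only that scheduling pattern and is not @max-te's actual workaround; `start_observation` is a hypothetical stand-in for whatever sends the observe command (in the patched files above, the call wrapped by `self._api(cmd)`), and the interval is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def _stop(task: "asyncio.Task[None]") -> None:
    # Cancel the task and wait for it to finish, ignoring its outcome
    # (CancelledError or any error the observation itself raised).
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)


async def reobserve_periodically(
    start_observation: Callable[[], Awaitable[None]],
    interval: float,
    rounds: Optional[int] = None,
) -> None:
    """Re-arm a hypothetical observation coroutine every `interval` seconds.

    `rounds=None` loops forever; a finite value is handy for testing.
    """
    task: Optional["asyncio.Task[None]"] = None
    started = 0
    try:
        while rounds is None or started < rounds:
            if task is not None:
                await _stop(task)  # tear down the old observation first
            task = asyncio.ensure_future(start_observation())
            started += 1
            await asyncio.sleep(interval)
    finally:
        if task is not None:
            await _stop(task)
```

Cancelling the previous observation before starting the next keeps observations from accumulating when the interval is short.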

I'm currently wondering whether it is related to the recently added switch support in either Home Assistant or pytradfri.

Edit: I also created a dirty workaround:

- id: restart_home_assistant
  alias: Home Assistant restart
  trigger:
    platform: state
    entity_id: script.sleeping
    to: 'off'
  condition:
    condition: or
    conditions:
      - condition: state
        entity_id: switch.tradfri_1
        state: 'on'
      - condition: state
        entity_id: light.tradfri_2
        state: 'on'
  action:
    service: homeassistant.restart

When script.sleeping turns off, the automation checks whether any of these lights/switches are still on. If so, the Tradfri observations are most likely stuck, and it restarts Home Assistant, something I would otherwise do manually.

ngdio commented 5 years ago

Could you please provide all of your system details (operating system, docker/venv/os package)? This issue most likely occurs only in certain environments but right now it's not really clear when exactly this is the case.

magma1447 commented 5 years ago

@ngdio Debian Stretch x64 (KVM instance), Python virtual environment. I am happy to provide more information in case someone knows what to ask for. While I have been working with Linux for almost 20 years, I am not too familiar with Python and its virtual-env.

Home Assistant 0.82.1, if I recall correctly I was running 0.78 before this, with the same issue.

Adding some more information after reading @alex3305's post (the one after this): my Trådfri hub is quite new (summer 2018). I have 12 spotlights, 2 normal bulbs, 4 panels and no switches. I always have the UI open on my workstation. I tend to turn the 12 spotlights on/off at the same time. I don't know whether that is what breaks it, but it seems @alex3305 suspects so.

@alex3305 Thanks, I will try to implement it the way @max-te did. I'll report back once I know whether it worked, if only as information for others who have the issue.

alex3305 commented 5 years ago

@ngdio HASS.io manual installation here on 0.82.1 (edit: on a RasPi). I bought the Tradfri hub within the last month and have both lights and sockets connected to it.

Clues I could find regarding this issue so far are:

Things that I could think of that can go wrong:

Anyway, quite hard to figure this one out. But glad to help!

morberg commented 5 years ago

Could this be related to the amount of updates sent to the lights? I see this problem when using the Flux component (which updates the lights quite often), but when I turn off Flux I can’t reproduce the issue.

magma1447 commented 5 years ago

@morberg I don't have any technical insight. But apart from switching my 12 spotlights at the same time now and then, we seldom turn our lights on or off at all.

Right now I am testing the patch by @max-te. But after that I could very well pull the power from my 12 spots to see if that helps.

max-te commented 5 years ago

@ngdio The issue has persisted for me across several (x86) machines running Docker, most recently under Arch Linux. It also persisted through the replacement of multiple pieces of network equipment to the point where I'm confident to say that the only constants were my Tradfri hub and the fact that I'm using Docker on an x86 system.

alex3305 commented 5 years ago

@magma1447 It's only my observation that it (mostly) breaks when operating multiple lights at once. Of course I could be wrong.

For troubleshooting, you can always add some extra debug logging or Python print() calls inside the _observe_update and/or _async_start_observe functions. But I suspect those will no longer be called once this issue occurs.

@morberg Sure, because of how the current observation model works: for every update a controller (i.e. Home Assistant) sends, the observe callback is triggered. But after some testing and debugging, I found that a single operation could easily trigger three observation events, and those events are all passed through to the operator.

That's why I've submitted PR ggravlingen/pytradfri#208. At the very least, it checks whether the state has actually changed: if it has, the update is passed back to the operator; if it hasn't, the event is silently dropped. That should eliminate multiple identical updates.
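A minimal sketch of the idea as described here, not the code from ggravlingen/pytradfri#208: wrap the observe callback so that a state is forwarded only if it differs from the last one forwarded. Modelling the state as a plain dict and the name `make_dedup_callback` are assumptions.

```python
import copy
from typing import Any, Callable


def make_dedup_callback(callback: Callable[[Any], None]) -> Callable[[Any], None]:
    """Hypothetical wrapper: `callback` only sees states that differ from the last one it saw."""
    sentinel = object()  # marks "nothing forwarded yet"
    last_state: Any = sentinel

    def dedup(state: Any) -> None:
        nonlocal last_state
        if last_state is not sentinel and state == last_state:
            return  # identical to the previous update: drop it silently
        # Keep a deep copy so a later in-place mutation of the same object
        # is still recognised as a change.
        last_state = copy.deepcopy(state)
        callback(state)

    return dedup
```

With this wrapper, three identical observation events for one operation collapse into a single callback.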

alex3305 commented 5 years ago

Small update for all the watchers: I've opened a new pull request that adds async locks in the places where multiple threads could manipulate the state of the Tradfri objects. I've tested this change for more than an hour with multiple browser sessions while operating almost all the lights in my home at the same time.

Since the change, I haven't been able to reproduce my issues. Would some of you be willing to test these changes? It's quite easy, since you can just drop the Python files into the custom_components directory.
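A minimal sketch of the locking idea, not the code from the pull request: an `asyncio.Lock` serializes read-modify-write updates so that two coroutines cannot interleave between reading and writing shared state. Note that `asyncio.Lock` coordinates coroutines on one event loop rather than OS threads. `LightState` and `apply_update` are hypothetical.

```python
import asyncio
from typing import Dict


class LightState:
    """Hypothetical shared state for one light, updated by several coroutines."""

    def __init__(self) -> None:
        # Construct this inside a running event loop: on Python < 3.10 an
        # asyncio.Lock created outside one can be bound to a different loop.
        self._lock = asyncio.Lock()
        self.state: Dict[str, int] = {"brightness": 0}

    async def apply_update(self, delta: int) -> None:
        async with self._lock:
            current = self.state["brightness"]
            # Stand-in for a round trip to the gateway; without the lock another
            # coroutine could run here and its update would be overwritten.
            await asyncio.sleep(0)
            self.state["brightness"] = current + delta
```

Without the `async with`, fifty concurrent `apply_update(1)` calls lose writes and finish far below 50, because each coroutine reads the old value before the others write back.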

Mariusthvdb commented 5 years ago

Just a small but cautionary side note: I'm not experiencing any of these issues at all, and Tradfri has been rock solid in my setup forever, both in the old setup and now with the integration. I use many lights of all available types, the outlet switches, 3 types of remotes, and the motion sensors.

Not sure what you're fixing here, but I sure hope you're not fixing anything that isn't broken... If I can assist by checking anything in my setup, please let me know; I'd be glad to.

alex3305 commented 5 years ago

@Mariusthvdb It would be great if you can run the above modifications on your own installation, just for testing.

I have been running my modifications for about 12 hours now. It has been working great so far and I haven't had this issue at all anymore, even when operating lights and switches extensively. Last night, when I went to bed and operated all my switches and lights at the same time from an automation, one light got stuck, but that resolved itself after about half an hour. Also, my unavailable lights now correctly show up as unavailable instead of always being 'turned on', so it seems my change may resolve another issue as well.

cc. @magma1447 @winterscar could you also test this change?

Mariusthvdb commented 5 years ago

Well, tbh, before testing any modifications I'd need to have symptoms of some sort... My Tradfri hub is rock solid and never hangs, and has been for almost a year now, with many bulbs and, lately, sensors and switches. So testing this wouldn't improve anything, would it? I'm afraid I would have no way to notice. That's why I asked not to change code that is actually working just fine. Might it be something else in your configuration?

About the unavailable lights: it would indeed be nice if Tradfri marked lights as unavailable automatically. I now have a very simple automation that takes care of that for me, but native support would be preferable.

magma1447 commented 5 years ago

@alex3305 I will definitely test it. Currently running the other patch and it has worked fine over night. I will replace that patch with yours later today. I just want to run the other one a bit more to be (more) sure of my result.

Given what you wrote about duplicate packets arriving, adding locks sounds reasonable. I have high hopes that you managed to figure it out.

Maybe it's more or less likely to happen depending on the latency of the Tradfri network. If so, though, I'd expect it to happen to everyone sooner or later.

krito commented 5 years ago

I have the same issue: after some hours, lights that have been turned off no longer turn back on unless I change the brightness. A reboot helps.

Running the latest Home Assistant in Docker with the Trådfri gateway, on a Raspberry Pi 3 with Raspbian Stretch Lite.

alexhardwicke commented 5 years ago

I've been having the same problems (typically after 30-60m). Plus, as I mentioned before, when using @max-te's fix I get massive CPU spikes fairly frequently, and every few hours home-assistant consumes 100% CPU until the process eventually ends (I've tested running without the fix, and the spikes disappear again). Honestly, I'm not sure I wouldn't rather just auto-restart every 30m than have the CPU fan come on at full speed and HASS freeze up for about 5-10 minutes before restarting.

I've been running @alex3305's fix for about 16 hours now and everything is working flawlessly, and the CPU spiking is completely gone. It's fairly obvious from the attached picture when I moved from @max-te's to @alex3305's fix. Seems pretty flawless for now.

[Attached image: CPU usage graph]

alex3305 commented 5 years ago

@Mariusthvdb You can also test without any symptoms if you would like to. That way we can verify that the PR doesn't have any side effects... But if you don't want to, I completely understand.

@alexhardwicke Great to hear!

magma1447 commented 5 years ago

@alexhardwicke The fix from @max-te was quite aggressive with its 2-minute timer. While I didn't notice any CPU spikes while running it, shutting Home Assistant down after almost 24 hours took quite some time; it seemed to be cleaning up a lot of lost objects (or similar). It took 2-3 minutes even on my virtual machine on a dual Xeon server.

I have been running @alex3305's latest patch for an hour now. So far, no issues. And the concept of his patch really makes sense to me, if the assumptions behind it are true (I am definitely no expert). The patch from @max-te was more of a quick, hackish workaround, I would say.

alexhardwicke commented 5 years ago

@magma1447 Yeah. I've been running in a VM too, on my gaming PC rather than a traditional server, so I've had a fairly beefy CPU available (although I only assigned it one physical core).

I did try with 1 hour instead of 2 minutes and still had frequent spiking. Very strange. I suppose it doesn't matter why now that there seems to be a more "correct" fix.

max-te commented 5 years ago

In my instance, @alex3305's patch does not prevent the problem; I just had it happen again.

alex3305 commented 5 years ago

@max-te Are you sure you are using the correct version? Yesterday I saw the same behaviour, but it sorted itself out after about half an hour. So if you can just wait it out, that would be great.