Closed mspinolo closed 1 year ago
I've just released 2023.1.0b3 of the LIFX Beta, which should do two things: first, hopefully make bulbs fall offline less often and, second, provide more information when they do.
Does this resolve the issues reported in https://github.com/Djelibeybi/ha-lifx-beta/issues/29? I ended up having to roll back the beta install after we last spoke as it was just too unreliable in my scenario.
I'm not sure, however I've pretty much rewritten how the lifespan of each connection (such as it is) to each device is managed so it's certainly worth taking it for a test.
Thanks. I'll give it a shot in the next day or two
Thanks for the new beta. I have installed it now so will give it a whirl. Thanks again.
Edit: OK, two hours later. It hasn't fully solved every cause of the lights going unavailable. I have had 4 short unavailabilities, 3 on one strip and 1 on another. Each time the strip went unavailable and was back within 10 seconds. It might take days to confirm that none stay unavailable for hours.
Out of interest, is it worth turning your own networks into a sort of worst case, for example by not disabling power saving on the UniFi gear, to see whether the issues can be reproduced there? Similarly with WMM: re-enable it, see what issues come up, and see whether any tweaks in the integration can get around them.
Logger: custom_components.lifx
Source: helpers/update_coordinator.py:168
Integration: LIFX (documentation)
First occurred: 08:08:50 (4 occurrences)
Last logged: 09:50:38

Error fetching Living Room TV Strip (192.168.8.29) data: Failed to update Living Room TV Strip: No response from Living Room TV Strip (192.168.8.29) [redacted]
Error fetching Sink Floor Strip (192.168.8.30) data: Failed to update Sink Floor Strip: No response from Sink Floor Strip (192.168.8.30) [redacted]
I was going to try disabling the integration's discovery mechanism to see if it helps, as I won't be adding new devices for some time. Do I now need to uncheck that option for each bulb, since each bulb shows up as its own instance of the integration with its own settings?
There is currently no way to disable discovery of new bulbs. Disabling at the bulb level means any change to the bulb's IP address will not be detected by Home Assistant.
AHH ok no problem
Also, the timeout values in the Beta appear to be a bit low for my ODROID N2+ in real-world testing while being just fine on my AMD Ryzen 9 5950 development machine. Go figure. :) I have some updates to tune the timeout coming later.
Thanks a lot. I noticed yesterday while everyone was away from the house, and not googling/browsing/using the network, that the slow trickle of unavailable strips stopped, they seem very sensitive to noise around them. Pity it's UDP / best effort but maybe retries / increased timeouts is all we can do to get around it. Thanks again.
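To make the "retries / increased timeouts" idea concrete, here is a minimal sketch of retrying a request over a best-effort transport. It is not how aiolifx or the integration does it; `request_with_retries` and the `send` callable are hypothetical names, and the attempt count and timeout are arbitrary.

```python
import asyncio


async def request_with_retries(send, attempts=3, timeout=1.0):
    """Await the hypothetical coroutine function `send` up to `attempts` times.

    Returns the first response that arrives within `timeout` seconds,
    or None if every attempt timed out.
    """
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(send(), timeout)
        except asyncio.TimeoutError:
            # No reply in time: UDP gives no delivery guarantee, so just ask again.
            continue
    return None
```

A caller would only treat the device as unresponsive when this returns None, so a single lost packet does not count as a failure.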
@mark007 Can you post a callgrind.out.XXX file from the profiler.start service? https://www.home-assistant.io/integrations/profiler/
I'm quite meticulous about making sure nothing is lagging my event loop so that might be the difference about why I don't see any issues.
I'm quite curious as to why the device has to turn off when it goes unavailable. If it were just connectivity, it would lose comms and resume once connectivity is back. Instead, once it's back the device literally turns off and then back on, which is quite irritating. Not only that, but it also tends to lose previous states, almost as if the device actually resets itself.
I'm quite curious as to why the device has to turn off when it goes unavailable.
If your bulb is flashing and reverting back to a previous state, then it's rebooting, not turning off. That's due to a reason external to the Home Assistant integration.
You can test this by hitting the Reboot button in Home Assistant to see if it does the same thing.
I can trigger it with Home Assistant, but I cannot trigger it through the LIFX app.
Right, Home Assistant exposes a Reboot button which literally reboots the onboard processor inside the bulb. Most folks should never need to press this button.
They reset whenever I try to for example dim the light. Seems to mostly concern strips.
This should not be happening. Can you please open a new issue for this and include some debug logs that cover the period during which you perform the activity and the light(s) reboot?
My bulb outages seem to be mostly on the LIFX Minis. I don't have strips, and the other bulbs, which are A60s, seem to be fine. I am monitoring them.
On/Off also triggers the unavailability. Turn the light on and off a few times and it goes unavailable after a few seconds. Do that through the LIFX app 50 times and there is no problem.
Here's the profile @bdraco, I ran it for 120s. My HA runs on my old PC, a Core i7 6700K. Could it be too fast, haha, and triggering the timeout earlier than a slower machine might (if it was busy processing something else before it processed the timeout)?
Seems to only affect the Z Strips, bulbs work fine.
@x3style have you tried my LIFX Beta component: https://github.com/Djelibeybi/ha-lifx-beta? I'd be very interested to know whether you experience the same effect with that.
Although I have not yet proven in any way that it's related, here is some information on my home network. I currently have a wired LAN throughout the house (limited by Netgear gigabit switches). I'm using an Orbi RBR750 with two RBS750 satellites. We have a gigabit ISP connection (1 Gbit down and 100 Mbit up), and with the Orbi network I can see that any device, at least on the 5 GHz network, can max out the 1 Gbit connection (via fast.com, speedtest apps or Steam, and probably any other app if the destination server is fast enough). In theory any app or software update on any device could completely saturate our network, even if only for a burst of a few seconds.
It could be completely unrelated, but my theory is that response times from the strips might be slowed or delayed if any device like a PC, TV, phone or tablet maxes out the 1 Gbit network, even for a few seconds per hour.
My Nest Wifi would max out at about 650 Mbit/s on any given wireless device (it would of course hit 1 Gbit on wired), and I didn't see as many unavailable strips with that setup. With the current setup, individual wireless devices can saturate the entire 1 Gbit network (both WAN and LAN have a 1 Gbit maximum).
It's one more reason why, if I had the ability to test the LIFX code in a worst-case scenario, I would enable WMM, enable power saving, and if possible artificially limit network performance to simulate a sort of super worst case. I might also artificially slow down the HA machine, although it's hard to know whether that helps, as it might just mean timeouts are processed later rather than earlier as on a faster HA machine. Then I would see whether the strips can somehow be kept connected and available.
@x3style have you tried my LIFX Beta component: https://github.com/Djelibeybi/ha-lifx-beta? I'd be very interested to know whether you experience the same effect with that.
Seems a bit more stable, as in I can't make it die via the on/off button, but it still dies within 20-30 seconds of the dim automation starting.
It would be very much appreciated if you could capture debug logging of this and post it as an issue in the LIFX Beta repo.
FYI, here's an image upload of a graph of a sensor I set up to increment every time any light or strip goes unavailable. It resets at midnight.
You can see that at night when the network is idle, and during the day when (today, for example) everyone was out of the house, the graph goes flat, showing no unavailable strips at all. https://ibb.co/z6ngM2q
At least it's consistent: yesterday there were about 38 occurrences of unavailable strips, today 43. (Note: the two large jumps on the graph were HA reboots, so ignore those.)
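For anyone wanting to reproduce that kind of counter outside of Home Assistant, the logic might look roughly like the sketch below. `DailyUnavailableCounter` is a hypothetical name, not part of Home Assistant or the LIFX integration. It counts only transitions into "unavailable" and resets when the date changes.

```python
from datetime import datetime


class DailyUnavailableCounter:
    """Count transitions into 'unavailable' per day, resetting at midnight."""

    def __init__(self):
        self.count = 0
        self._day = None        # date the current count belongs to
        self._last_state = {}   # entity_id -> last state seen

    def record(self, entity_id: str, state: str, when: datetime) -> int:
        # First event of a new day: start counting from zero again.
        if when.date() != self._day:
            self._day = when.date()
            self.count = 0
        # Only a change *into* unavailable counts, not repeated reports.
        if state == "unavailable" and self._last_state.get(entity_id) != "unavailable":
            self.count += 1
        self._last_state[entity_id] = state
        return self.count
```

Feeding it every state change of every light gives one number per day that can be compared across days, as in the graph above.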
I have not edited configuration.yaml to enable logging; I just enabled debug logging on the integration through the interface. It might be wrong, but I don't have time at the moment. I opened an issue where you suggested and pasted the only line from the log that was relevant.
The issue occurs with both integrations. As far as I can tell it is triggered by commands to the lights; the lights work flawlessly as long as they don't receive commands. Or maybe it's the discovery running nonstop that is doing something, no idea.
Does anyone know why both bulbs and strips randomly fail to respond to pings? Is this related to the power saving feature? I have set up ping sensors for wired and wireless devices in the house. All are perfectly stable except the LIFX bulbs and strips. They seemingly randomly don't respond to pings every few minutes, but I can't see an exact pattern, which is annoying.
Intermittent LIFX dropouts have been a problem forever. There is IMO no way to fully fix this in Home Assistant because the bulbs will just stop responding for a short while. So technically they are unavailable, but it is too painful to have that reported and then reversed ten seconds later.
We introduced the UNAVAILABLE_GRACE timeout to ignore intermittent dropouts. It feels like this is no longer being used, perhaps because the data update coordinator (and not aiolifx) now handles unavailability?
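The idea behind a grace timeout can be sketched as follows. This is an illustration of the concept only, not the integration's code: `GraceTracker` is a hypothetical name and the 90-second value is an assumption for the example.

```python
import time

UNAVAILABLE_GRACE = 90  # seconds; assumed value for illustration


class GraceTracker:
    """Report a device as unavailable only after a sustained silence."""

    def __init__(self, grace=UNAVAILABLE_GRACE, clock=time.monotonic):
        self._grace = grace
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()

    def response_received(self):
        """Call whenever the device answers any request."""
        self._last_seen = self._clock()

    @property
    def available(self):
        # A missed reply alone changes nothing; only silence longer than
        # the grace period flips the device to unavailable.
        return self._clock() - self._last_seen < self._grace
```

With this shape, a ten-second dropout never reaches the UI, while a bulb that has genuinely lost power still goes unavailable once the grace period runs out.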
@amelchio you're absolutely right. I have a beta version of the LIFX integration where I bypass the data update coordinator by never throwing an exception. So far, my bulbs feel (subjectively) way more stable. I'm hoping to get feedback from other users to see what the broader experience is like before implementing something similar (though perhaps not as extreme) in the core.
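As a rough illustration of what "never throwing an exception" could mean (this is a sketch under assumptions, not the beta's actual code), a poller can swallow timeouts and keep serving the last data it received. `QuietPoller` and its `fetch` callable are hypothetical names.

```python
import asyncio


class QuietPoller:
    """Poll a device but never raise on timeout; keep the last known data."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical coroutine function returning state
        self.data = None      # last successfully fetched state
        self.missed = 0       # consecutive polls without an answer

    async def poll(self, timeout=1.0):
        try:
            self.data = await asyncio.wait_for(self._fetch(), timeout)
            self.missed = 0
        except asyncio.TimeoutError:
            # Swallow the timeout so the caller never sees a failed update.
            self.missed += 1
        return self.data
```

The `missed` counter is there so that a less extreme variant could still mark the device unavailable after some number of consecutive misses.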
Does disabling the strips' ability to go into power saving mode help with these issues? No matter what some of us do, the strips randomly lose connectivity. If it does help, it'd be great if there was some way we could stop them going into that mode (without router-level changes).
@mark007 If those are the original LIFX Z strips (i.e. no HomeKit), they are probably the worst at connectivity of all the LIFX products. I actually had one replaced (with a HomeKit version) under warranty so it seems that LIFX agrees that there is nothing that can be done to make them behave.
LIFX agrees that there is nothing that can be done to make them behave.
This is not entirely true, but getting the original Z strip to be stable can be extremely tedious. Essentially it involves the realisation that the wifi antenna in the control box is extremely directional (in all three dimensions), so you have to pivot it around all three of its axes to determine the optimal placement for reception from the closest AP.
If yours are anything like mine, this optimal placement will be the one most out of alignment with any close flat surface, making affixing it in that orientation near impossible. But using this technique I've at least made my original Z pretty rock solid these days.
I seem to lose connection to the lights below. They are Minis and they go offline around 2000 times a day for just a few seconds each. My colour lights don't seem to have the issue at all.
Logger: homeassistant.components.lifx
Source: helpers/update_coordinator.py:168
Integration: LIFX (documentation, issues)
First occurred: 9:11:44 AM (138 occurrences)
Last logged: 9:51:56 AM

Timeout fetching Office Light (192.168.0.169) data
Timeout fetching Kitchen Mini (192.168.0.155) data
Timeout fetching Guest Hallway Light (192.168.0.196) data
The new beta is showing 0 unavailable lights, which looks nice. I guess it might be for the best if it means the integration then has a chance to perform retries when performing actions towards the bulbs.
I have many hundreds of new log entries from the latest beta. Should we log new bugs, or are you already aware of them?
One relates to:
has_sat = self.bulb.color[HSBK_SATURATION]
TypeError: 'NoneType' object is not subscriptable
Another shows:
Platform lifx does not generate unique IDs. ID 00:00:00:00:00:00 already exists - ignoring light.lifx_00_00_00_00_00_00
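The first of those errors suggests that the bulb's colour was still None when the code tried to index it, presumably because no state had been received yet. A defensive version of that lookup might look like the sketch below. `saturation_or_none` is a hypothetical helper, and the index value is an assumption based on HSBK being ordered hue, saturation, brightness, kelvin.

```python
HSBK_SATURATION = 1  # assumed index into an (hue, saturation, brightness, kelvin) tuple


def saturation_or_none(color):
    """Return the saturation component, or None if no colour is known yet."""
    if color is None:
        # Nothing received from the bulb yet: report "unknown" instead of
        # raising TypeError: 'NoneType' object is not subscriptable.
        return None
    return color[HSBK_SATURATION]
```

Whether returning None is the right behaviour for the integration is a separate question; the sketch only shows how the TypeError itself could be avoided.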
Please log any issues (like that one) here: https://github.com/Djelibeybi/ha-lifx-beta
You may want to use HACS to rollback to the previous version if that continues. I may not get a chance to look at it for a couple of days.
No worries
Quick question. With the latest beta, I have almost 0 unavailable strips, as expected. However, one did go unavailable earlier and came back 10 seconds later, and this was in the logs:
Error fetching Hob Floor Strip (192.168.x.x) data: Failed to update Hob Floor Strip: Timed out waiting for response.
Is this a case where the strip is online but drops out midway through a piece of communication with it?
Is the intention for these to also not put the strip into the unavailable state, and instead go through the retries / grace period as well?
Is this an edge case that needs a tweak so it behaves like the other unavailable cases?
It means that the strip didn't reply to one of our requests, but it has replied before and we hope it'll reply again. So I log it, and if it never responds, it will eventually go offline.
I've kind of run out of ideas with how to improve the stability given the current structure so I'm going to play with returning to a single config entry with multiple devices model to see if I can get that more stable.
As an interim/alternative option, I may publicly release my custom integration that leverages the Photons Interactor add-on, which does all the heavy lifting of handling the LIFX stuff while the integration just presents that to Home Assistant. I need to bring that up to date with all the new stuff the current LIFX core integration does first.
Update on my issue. I fixed it by accident.
So I found 3 lights going offline pretty constantly and tried logs as well as observation to see if there was a pattern. I had about given up when I decided to replace one light, swapping a Mini for a full colour light. I got distracted during the process and had left the wireless settings open on my iPhone. When I looked down I saw that the Guest Hallway Mini was advertising itself to my iPhone as available to be connected to the wifi network, even though it was already connected. It seems it had gone into wifi setup/discovery mode by itself and had been doing so for ages, popping into that mode randomly and offering itself as a wifi network. The other Minis and even my TP-Link smart plugs were then trying to connect to it for some reason, and all went unavailable in HA before reconnecting themselves to the correct network. I left it like that for a while and observed it happening over and over, and it all tied in with the unavailability in HA and also in the LIFX app.
So I deleted the hallway Mini and replaced it with the new colour light and, lo and behold, not a dropout since. The logs are clean and my air purifier is fixed as well. It was going offline a lot, but I didn't put 2 and 2 together.
So I suggest, if you can, opening the wifi settings on your phone and watching to see if your LIFX lights suddenly show up in there without reason. If so, you may need to replace the light, or perhaps do a factory reset and reinstall to get it out of that mode.
That's an interesting observation. So the lights are going into access point mode when they shouldn't? I will try and keep an eye on mine and see if I notice anything similar.
@mark007 I used to have insane issues with my LIFX bulbs (including several Z strips that were the worst at staying online) going unavailable, and as far as I know it was due to a bug in UniFi's mDNS functionality. Once they fixed that, and I had a dedicated SSID for IoT devices where I disabled every feature that could cause trouble with IoT devices, I have not had any noticeable issues. I do have lots of timeout log entries for my 20 or so LIFX bulbs, but I never noticed them offline in HA (they are offline for about 10 seconds according to the log), and never noticed any issues controlling them in HA using the LIFX integration.
Errors I see:
My IoT SSID settings:
Thanks @alexruffell for sharing those details. Yeah, it's a pity that these bulbs and strips need very specific network tweaks to work reliably. Unfortunately many of us don't have the ability to make certain changes. With my Nest Wifi almost every 'advanced' wifi setting was hidden. On my new Netgear Orbi mesh setup I can at least disable things like WMM on the 2.4 GHz band, which helped a lot, but it doesn't allow separating the 2.4 and 5 GHz bands into two separate SSIDs.
I wonder whether there is some way, maybe via contacts the LIFX integration devs have in the LIFX team itself, to see if they can make further improvements to the firmware to make the devices more reliable. I know they released a firmware update recently with reliability improvements, so if they are in that frame of mind, maybe it's a possibility. Maybe they could even give us HA users some way (even one not exposed via the LIFX app) to disable things like the bulb/strip power saving features. It'd be interesting if they responded with what is and is not possible from a firmware point of view, and what they would be willing to improve. If it's not a physical limitation, I would love for them to make the bulbs perform more reliably on non-tuned wifi networks, which I would guess is what 95% of the population has.
@mark007 Ah, I thought you had mentioned being a UniFi user as well. I can't say what the issues are with your networking equipment, if any, but I would not hold my breath regarding improvements on the LIFX side. I've always had the impression they were extremely slow with updates and that they may be limited by the chipset they use... also they recently went bankrupt and were acquired by Feit. I wonder whether that will help or not (with the current bulb hardware)...
@alexruffell the lights are not actually going offline. They're just not responding within the timeout defined by Home Assistant, so it marks the device as unavailable until the next time it responds.
@Djelibeybi That explains why I never noticed any issues even though I have tons of those errors in my logs.
Investigation update.
After my previous post and the replacement of my Guest Hallway light, everything seemed stable until this morning. The 3 lights were becoming unavailable again, and this included the new light.
So I went to the Integrations page and clicked on the new light, and it said "2 devices" for some reason. Weird. It seemed the integration thought there were 2 devices for Guest Hallway.
I checked a few more lights and, sure enough, some of them had 2 devices and some even had 3. Somehow HA was grouping them together, so I deleted all the lights and let HA rediscover them by itself.
It put everything back together and each entry is a single device now. The errors have stopped and the lights seem to be stable.
Let's see how it goes, but it's worth checking I guess.
Yep, I've had that experience as well. Essentially what's happening is that a different serial number is getting the IP address previously used by another LIFX device so Home Assistant starts to group them together. This is caused by the config entries using the IP address as the identifier instead of the serial of the device. To fix this would require another config entry format migration, which is currently beyond my scope of expertise (and available time, to be honest).
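A toy illustration of why keying on the IP address groups unrelated devices: the serial numbers and address below are made up, and `group_by` is a hypothetical helper, not Home Assistant's registry code.

```python
def group_by(devices, key):
    """Group device serials by the chosen identifier field."""
    groups = {}
    for device in devices:
        groups.setdefault(device[key], []).append(device["serial"])
    return groups


# Two different bulbs that were handed the same DHCP address at different times.
history = [
    {"serial": "d073d5000001", "ip": "192.168.0.50"},
    {"serial": "d073d5000002", "ip": "192.168.0.50"},
]

by_ip = group_by(history, "ip")          # one entry holding two devices
by_serial = group_by(history, "serial")  # one entry per physical device
```

Keyed by IP, both bulbs land under the same entry, which matches the "2 devices" symptom described above; keyed by serial, they stay separate regardless of which address they hold.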
So I best set them to fixed IP addresses I guess to stop that happening.
The problem
Since the latest integration update I have had a lot of occurrences of LIFX lights becoming "not available". This is happening on most (but not all) of them (I have 20+ lights).
The behavior is not consistent during the day, which makes me suspect there is some relation to the wifi environment (I have 3 APs broadcasting the same SSID on channels 1, 6 and 11), but according to the AP logs it doesn't seem to be related to LIFX devices disconnecting from one AP and reconnecting to another.
I also see that they usually become unavailable for 10s and then come back online. I wonder whether this has something to do with the polling cycle of the integration, as I can see that the integration's discovery interval is 10 (seconds?). Could it be that discovery, in my environment, simply can't keep pace and drops connections?
What version of Home Assistant Core has the issue?
2022.9.5
What was the last working version of Home Assistant Core?
the one before LIFX integration update
What type of installation are you running?
Home Assistant OS
Integration causing the issue
LIFX
Link to integration documentation on our website
https://www.home-assistant.io/integrations/lifx/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response