home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

0.108.0 WLED causing strange LED strip behaviour #33921

Closed raoulteeuwen closed 4 years ago

raoulteeuwen commented 4 years ago

The problem

Before 0.108.0 I was very happily using the WLED integration on 2 LED strips. With 0.108.0, both strips (each with its own, different, ESP8266 board (NodeMCU/Wemos), both running the latest stable release of WLED) now flash when switched on and are unreachable, as if something is flooding the ESP8266. When I delete the integration from HA, restart HA, and power on the LED strips, the normal WLED web UI is usable again. So I am pretty sure it is HA that is causing this somehow...

Environment

Problem-relevant configuration.yaml

Traceback/Error logs

Additional information

probot-home-assistant[bot] commented 4 years ago

Hey there @frenck, mind taking a look at this issue as it's been labeled with an integration (wled) you are listed as a codeowner for? Thanks!

raoulteeuwen commented 4 years ago

I have updated to HA 0.108.1 and will test with that version ... will know more in 1 or 2 days, I hope.

raoulteeuwen commented 4 years ago

Tried with 0.108.1 and 0.108.2 as well. Can reproduce. With the WLED integration added, the strips flash (every x seconds, 4 < x < 10); with the WLED integration removed and HA restarted, no flashes. It was OK up to 0.107.7.

otiswrx commented 4 years ago

I can confirm this error as well. Updated from 107.7 to 108.3 and WLED (4 separate deployments) all became unresponsive. Deleting the integration and restarting Home Assistant allows me to access the web UI to control the WLED device. There is an error in the log before deleting the integration - "Error fetching wled data: Invalid response from API: Error occurred while communicating with WLED device."

THATDONFC commented 4 years ago

I’m experiencing similar behavior. When I upgraded to 0.108.0, my WLED device experienced a severe boot loop. It would reset every 10 seconds or so. The device was unusable. I had to delete the integration and forget WLED from discovery for the issue to go away. I captured some packets on the WLAN and saw tons of failed requests from my HA IP to WLED IP. The integration is not usable for me right now.

Mattjet27 commented 4 years ago

I can also confirm that I am seeing this issue after upgrading to 108.0. I have two 8266 boards set up as WLED devices through HA (running in Docker on an RPi3), both with the latest 0.9.1 WLED build.

guedeaux commented 4 years ago

I am also having issues with this error:

Logger: homeassistant.components.wled
Source: helpers/update_coordinator.py:143
Integration: wled (documentation, issues)
First occurred: 9:45:32 AM (2 occurrences)
Last logged: 10:18:37 AM

Error fetching wled data: Invalid response from API: Error occurred while communicating with WLED device.

HassOS, 0.108.3, WLED 0.9.1, ESP8266 (NodeMCU)

r100gs commented 4 years ago

I have 2 lights: 1 with 5 and 1 with 7 LEDs. Both stopped working with HA 108.x. I discovered that WLED devices which are connected to WiFi while HA starts work fine. If I turn them off, then the problem starts.

SawKyrom commented 4 years ago

I reported the issue on Aircoookie's GitHub repository yesterday.

https://github.com/Aircoookie/WLED/issues/843

Same error.

[Image: Screenshot (179)_LI]

Kosh42 commented 4 years ago

Same here. Issue since 108. Had to delete the WLED integration from HA to make them usable again.

luci84tm commented 4 years ago

I have the same WLED boot loop issue after updating to HA 0.108. Can someone take a look at what exactly broke in this version?

luci84tm commented 4 years ago

Any news here?

triggerx commented 4 years ago

I'm seeing this behavior on WLED 0.9.1, and I'm not even running HA.

SawKyrom commented 4 years ago

I don't know the source of the conflict, but I can tell you what I did to make it work again. I needed a new NodeMCU and a fresh installation of WLED 0.9.1.

I install WLED on all my nodes with the Arduino .ino file, and despite attempting to re-flash multiple times with various Arduino core versions, there was never a good WiFi connection on the original after I set up the MQTT feature. I'm guessing that there is some data written deep in the memory that is retained and causing conflicts. Supporting this statement is the fact that after removing WLED and writing a simple ESP WiFi connection program to the node, it would fail to connect or stay connected to my router every time. This happened with the original node and a new one, so I doubt it's a hardware issue. I also posted illustrations here: https://github.com/Aircoookie/WLED/issues/843

I plan on trying to fix these by downloading esptool and writing a blank bin to the original devices, but it was just easier for me to load a new NodeMCU. If you DON'T have an extra ESP8266 lying around, I would suggest attempting the blank flash first, prior to reinstalling WLED. I don't think the Arduino IDE completely erases all data with a new flash???

After the new install, DO NOT use MQTT or attempt to set a static IP in the AP setup menu. Also, I would not write any variables to the const.h file, such as SSID, password, etc. You will need to use the AP webpage interface for settings, sans MQTT (again, don't use!!!). These two features would break my device every time. I was able to load the HA Integration for control rather than only having the webpage interface. Until there is a fix, this is what I needed to do to get a working version again. I get the impression that they are moving away from MQTT control. Good luck!

frenck commented 4 years ago

I see 2 different issues mixing up here.

The flashing of the WLED strip after the upgrade: I've seen that happen once during development, on a single WLED light. I've not been able to reproduce it ever since. Hence, I dismissed it as a glitch during the original development.

The upgraded WLED integration in 0.108 uses other API calls, which are lighter and faster. How that can cause a crash is unknown to me. Honestly, the device should not be able to crash from an API call, IMHO.

Another reported issue is the timeouts. The timeout has been lowered in 0.108, to allow faster updates of the WLED state. Looking at the reports here and upstream, I guess it makes sense to revert that part. However, the slowness of response is generally the result of a weak WiFi connection or other network issues.

Kosh42 commented 4 years ago

The individual flash at power on has always been there and I live with it. What's happened since 108 (WLED 0.9.1 beta and full) is continual flashes every 15s or so, as WLED is constantly restarting, which also causes the lack of device availability in both HA and WLED as it drops straight off the network.

Removing the integration and restarting HA solves this, and the WLED device behaves normally and stays connected. Full control via the WLED app or device IP in a browser.

Kosh42 commented 4 years ago

@frenck - quick Q. You don't have this issue, but are your WLED devices ESP32 or 8266 based? Mine are 8266 for reference. Same for all devices.

frenck commented 4 years ago

@Kosh42 I have several devices, ESP32 & ESP8266 based, with different firmware versions (0.8.6 - 0.9.1 and master builds), to ensure compatibility.

triggerx commented 4 years ago

I'd be interested in knowing what everyone is using as their WiFi AP. I've been struggling with this problem (with WLED) for a few days now... and I've been able to confirm it only happens on my home network which is a Unifi-AP based network. Still trying to figure out where the problem lies, and how to fix... but have at least been able to isolate it to WLED 0.9.x and my Unifi network.

raoulteeuwen commented 4 years ago

I currently, for coverage (lots of concrete, not a castle size ;-)), have a mix of AP's in my house: an asus rt-ac86u, an Apple Time Capsule and a box from my ISP (zte H369A v1.00).

Using Wemos D1 mini and a NodeMCU, both have been doing pretty fine, also combined with HA.

Since 0.108 the WLED integration has not been working for me, also not after removing WLED integration/reboot/add integration again. Haven't taken the time to analyze further.

ras434 commented 4 years ago

I’m using OpenMesh access points. I have two of them that broadcast the same SSID and are part of a WiFi mesh network.

I’m having trouble with my WLED. It seemed to happen sometime after either a HA or WLED upgrade. Looked on my router and saw over 7,000 connections per minute going to my two WLED strings from HA. Disabled HA integration with WLED and that stopped all of the ESP8266 reboots.

I can control the lights from my WLED app. I’m using DHCP (with reservations). Occasionally the strips will flash green and go back to the color I have set.

Are there logs accessible from the WLED controllers?

-- Rob Schmidt

frenck commented 4 years ago

7000 should be impossible. With the current settings, Home Assistant should debounce any excessive requests; if it doesn't, that would be a bug. The current timer is set to run every 5 seconds, which should cause at most 2 requests (each now a lighter variant of what it was before the upgrade: it no longer requests effects and palettes on every request).

That said, I can add some additional logging to the underlying library, which can be enabled in Home Assistant to monitor what is going on.
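As a rough sanity check of the numbers in this comment, the expected request rate can be worked out by hand. The sketch below assumes the "max 2 requests" applies to each 5-second timer run and that one request corresponds to one connection on the router; both are readings of the comments, not confirmed behaviour. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate of how many requests Home Assistant
# should send to WLED devices per minute, using the figures quoted above.

POLL_INTERVAL_S = 5      # from the comment: the timer runs every 5 seconds
REQUESTS_PER_POLL = 2    # from the comment: at most 2 requests (assumed: per timer run)


def requests_per_minute(devices: int,
                        interval_s: float = POLL_INTERVAL_S,
                        per_poll: int = REQUESTS_PER_POLL) -> float:
    """Upper bound on requests per minute across all polled devices."""
    polls_per_minute = 60 / interval_s
    return devices * polls_per_minute * per_poll


expected = requests_per_minute(devices=2)   # two WLED strings, as reported
reported = 7000                             # connections per minute seen on the router

print(f"expected upper bound: {expected:.0f} requests/min")
print(f"reported rate is about {reported / expected:.0f}x higher")
```

Under these assumptions two devices should see at most 48 requests per minute in total, so a reported rate in the thousands points at something other than the regular polling timer (for example retries, discovery traffic, or connections being re-opened after device resets).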

SawKyrom commented 4 years ago

@frenck Thanks for the reply. Big fan of HA and I appreciate everything you do to keep it stable and offer us an open source home automation platform. What means does the HA integration use to communicate with WLED? Can you add custom commands/automations as is possible with MQTT?

I am still having issues with WLED connectivity. I had a stable setup for a few days, and then yesterday I activated an older NodeMCU with a pool controller program - the whole thing went crazy again causing the WLED node to drop connection, auto reset, auto on solid colors, etc. This older NodeMCU was a previous WLED ESP8266 with different IP and unchanged MAC. This device was later found to be unstable also.

I noticed that the CWMODE was either AP or both (despite me not setting this function in my sketch) on my pool controller when I discovered it broadcasting an SSID. I did not set up the device to be in softAP mode and thus I presume it to be a carry-over from the WLED setup. I had also completely flashed a blank bin, which must have no effect on the Espressif ESP8266 WiFi settings. Others have mentioned this finding as well. https://github.com/Aircoookie/WLED/issues/843

It is also all but impossible for me to find the NodeMCU 8266 firmware bin from China. I want to completely restore factory settings and remove any other deep-level settings initiated by WLED. If anyone wants to point me in this direction, it would be much appreciated.

I added to my pool controller NodeMCU wifi setup the "WiFi.mode(WIFI_STA);" command to get the device to stop broadcasting an SSID. I'm testing this today for stability. I still don't know if this would be causing the connectivity issues, but I'm trying everything. BTW, my RSSI on this device is mid 70's.

@raoulteeuwen Are you still having issues with WLED?

I also am looking for a simple FastLED MQTT sketch to replace WLED. If anyone knows of one, please send it my way. Cheers!

SawKyrom commented 4 years ago

@Kosh42 Is your WLED setup working still?

Kosh42 commented 4 years ago

I left the integrations out for a few days to ensure the system was stable in all other ways, which it has been. Yesterday I re-added two WLED instances, and so far, so good.

raoulteeuwen commented 4 years ago

@SawKyrom after having done the "problem HA<>WLED > remove integration from HA > restart > WLED working > add WLED integration" cycle a couple of times (before reporting here), I haven't taken the time to try again, since it feels like nothing has changed afaik, so I would just be repeating steps. I don't know whether specific setup details make a difference. For instance, I have a relay kind of device that I use from within HA to switch off power to both my LED strips and the NodeMCUs when I don't have the LED strips on. Maybe the fact that every night my NodeMCUs get powered up again together with the LED strips does something to the WLED integration in HA and its communication with the NodeMCU running WLED? I wonder whether people reporting they had the problem, but now it is fixed, leave their NodeMCU powered on at all times?

ras434 commented 4 years ago

@SawKyrom I wonder whether people reporting they had the problem, but now it is fixed, leave their NodeMCU powered on at all times?

I leave mine powered on all the time and only cycle the power when I have problems and I'm trying to fix them. I'm using two ESP8266 to run two different strips. I seem to also have one or both of the devices go offline fairly regularly. Since I've disabled the HA integration they haven't been flashing like they were before.

Perhaps there might be a different firmware that would work better. I just use this for under cabinet lighting in the kitchen and don't need all of the animated sequences for this application.

If there is particular information we can gather like logs, version details, dumps, etc. -- could someone let us know so we can help further with the troubleshooting?

eoncire commented 4 years ago

I'd be interested in knowing what everyone is using as their WiFi AP. I've been struggling with this problem (with WLED) for a few days now... and I've been able to confirm it only happens on my home network which is a Unifi-AP based network. Still trying to figure out where the problem lies, and how to fix... but have at least been able to isolate it to WLED 0.9.x and my Unifi network.

I can chime in here in regards to Unifi-AP possibly being a conflict. I had a board go berserk on me this weekend. It just so happened that I was installing a Unifi AP at the time. I have a WLED nightlight in my son's room that has been running WLED 0.9.1 for a couple of months without issue (NodeMCU, 40 LEDs run off the VIN and GND). It was nap time in the middle of my Unifi install and the WiFi was offline. I unplugged the board from the USB wall wart to turn it off. Later in the evening, once everything was back online, I plugged the board back in and it instantly went into a boot loop. I just sat down to see what's going on with it. I re-flashed it with 0.9.1 (downloaded from GitHub) and it came up. It connected to my WiFi and was fine; it grabbed a DHCP address. I logged into the board and set a static IP to what it was before (x.x.x.51) and it instantly went back into a boot loop. I just removed the WLED integration for that specific board, rebooted HA, and it's now stable again. I have the exact same setup in my daughter's room for a nightlight and it has been fine since the Unifi AP install. Maybe an oddball combo of HA and Unifi AP?? @frenck

raoulteeuwen commented 4 years ago

I don't have Unifi though, and do have the problem. So maybe there is something with Unifi as well, but it also seems to happen without it. @eoncire you said you unplugged the board before the problem started. What was the normal state of that board: was it always (24/7) powered? (So could it be that from the moment you updated HA from 0.107.x to 0.108.x, the board was always powered on?) I wonder whether powering off/on has anything to do with the problem.

THATDONFC commented 4 years ago

I'd like to give an update on my situation. I immediately removed the WLED integration when I noticed this erratic behavior from my only connected node. After I removed the integration and gave HA a restart, the issue went away. I didn't have time to add the WLED integration back for a week or more. When I finally did, the issue was gone. No more boot loop, no more issues.

I recommend that anyone running the WLED integration remove and re-add it. I don't know the success rate, but I have a feeling this should solve 99% of everyone's issues. @frenck maybe an update to the docs suggesting removing and adding a fresh instance of the WLED integration is what we need to solve this. I have no idea where the issue came from; I only know how I fixed it. And that's how I fix most of the issues I run into with HA integrations. Not that it happens often.

THATDONFC commented 4 years ago

I suspect this issue stems from PR #33608, but I cannot confirm this.

SawKyrom commented 4 years ago

It is definitely related to HA; whether it is the integration for sure, I don't know. NodeMCU WiFi loops every 10 seconds with HA running. Integration removed no impact. See below:

[Image: WLED Log]

Once HA is turned off, the NodeMCU with WLED connects and stays connected (stable). I reboot HA, again no integration, and WLED node remains stable. I can reboot the NodeMCU and it will connect without issue to router and remain connected. The issue appears when I attempt to connect the WLED node to HA. I can access/control the node via webpage without issue. @frenck Assuming it is integration related, any idea why the integration would be sending it in a boot/wifi connect loop? What means does HA communicate with WLED within the integration? What logger id/line do I add to Config file to get more debug information? Where can I find the WLED integration code to review? Thank you much!

I will attempt to use only the MQTT connection within HA for a day or so to determine if the issue is directly related to the HA/WLED integration or more broadly a general HA connection issue. @ras434 My NodeMCU stays powered on constantly, but I do frequently reboot HA for updates.

frenck commented 4 years ago

@frenck Assuming it is integration related, any idea why the integration would be sending it in a boot/wifi connect loop?

I'm sorry, I've been staring at this problem for hours now, and fail to see the issue.

In the end, let me be really clear on this: WLED being able to crash because of an API call is NOT a Home Assistant or integration issue. An external API call should NEVER crash a device.

SawKyrom commented 4 years ago

@frenck Thank you for spending the time to investigate. I can only tell you that my issue and the data strongly suggest WLED integration involvement, specifically related to the API. I hate that you can not replicate the reported problem.

[Image]

Supporting my statement, this behavior is also consistent with what many others have observed and reported. In all cases where a resolution was obtained, it required removal of the integration and/or re-installation. In my case, performing this action only resulted in a temporary fix. I have had a stable connection to the ESP8266 since I removed the integration yesterday. I have only been using MQTT calls for control. I personally don't think it's a coincidence, but if MQTT does eventually result in the same failure, it will certainly support your statement. As you are adamant that it is NOT related to the HA WLED Integration, I'll take that at face value and just maintain the status quo for now.

Thank you again for your time and investigating this phenomenon. Also, I really appreciate the Home Assistant platform - great job. Cheers!

frenck commented 4 years ago

but if MQTT does eventually result in the same failure, it will certainly support your statement

? Those are not the same things. This is a JSON API being used for the integration.

As you are adamant that it is NOT related to the HA WLED Integration

That is not what I said, I said the crashing/boot loop is not an HA problem.

However, I'm still keen on finding the issue from the HA side of things. The number of reports shows there is one; however, I fail to see it. Any help is welcome.
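To make the "JSON API" point concrete: the integration polls the device over HTTP rather than MQTT. The following is only a minimal sketch of such a poll with an explicit timeout, not the integration's or python-wled's actual code. It assumes WLED's `/json/state` endpoint; the helper names and the default timeout value are hypothetical.

```python
import json
import socket
import urllib.error
import urllib.request


def state_url(host: str) -> str:
    """Build the URL of the WLED JSON state endpoint (assumed: /json/state)."""
    return f"http://{host}/json/state"


def parse_state(body: bytes) -> dict:
    """Decode a JSON response body, rejecting empty or non-object replies."""
    if not body.strip():
        raise ValueError("empty response from device")
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("unexpected response shape")
    return data


def fetch_state(host: str, timeout: float = 5.0) -> dict:
    """Poll the device once; the timeout value here is illustrative only."""
    try:
        with urllib.request.urlopen(state_url(host), timeout=timeout) as resp:
            return parse_state(resp.read())
    except (urllib.error.URLError, socket.timeout) as err:
        # A slow or unreachable device surfaces here as a connection error.
        raise ConnectionError(f"could not reach {host}: {err}") from err
```

A call such as `fetch_state("192.168.0.119")` would need a reachable device. With a short timeout and a weak WiFi link, the timeout branch is the one that would produce "Timeout occurred" style errors like those reported in this thread.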

SawKyrom commented 4 years ago

? Those are not the same things. This is a JSON API being used for the integration.

Right, I do realize that... and that's my point. Since they are totally different, MQTT vs JSON API, failure of my current MQTT control (my current work around) would absolve the JSON API HA WLED Integration as the causative factor. If my MQTT fails, then it must NOT be API related. I hope I'm not restating your comment. I'm not using the integration right now. My test is to see how long I can maintain stable connection to this device with only MQTT control.

I apologize if I somewhere erroneously mentioned crashing/boot loop. I don't know if that's accurate terminology for my device's behavior and I don't remember making that comment. Maybe I did??? The device cycles through a WiFi connection attempt every 10 seconds. The reports indicate that the WLED device is recognized by the network, is offered a DHCP address, connects, gets an invalid response from the API, a timeout occurs, it disconnects from the network, and the whole process starts over in a continuous cycle. I don't personally think the WLED device is rebooting or frozen. When observing serial debug, it is continuously active.

You mentioned in a previous post:

The timeout has been lowered in 0.108, to allow faster updates of the WLED state. Looking at the reports here and upstream, I guess it makes sense to revert that part. However, the slowness of response is generally the result of a weak WiFi connection or other network issues.

My device does have a marginal connection, with an RSSI of about -70. The combination of the lowered "timeout" and a weak signal may be the issue, and also why it doesn't appear to be a global bug for all users (yourself included). It may very well be environmental in nature. Sorry! Thanks again.

SawKyrom commented 4 years ago

I can test this theory by placing a device at the router (maximum transmission fidelity) with WLED HA Integration active and see if I can replicate the flaky connection.

triggerx commented 4 years ago

FWIW.... I was able to fix my WLED bootlooping issue. I know it became intertwined with another issue (or two) in this thread.... but if anyone here is experiencing the bootloop issue with WLED 0.9.x or 0.10.0.... try disabling Alexa Emulation in the Sync Interfaces. Cleared it right up for me.... and hopefully helps others as well.

frenck commented 4 years ago

@triggerx That is an interesting conclusion. That would mean another integration interfering.

raymajor commented 4 years ago

@triggerx @frenck

Thanks, I had the same report (285 occurrences): "Error fetching wled data: Invalid response from API: Timeout occurred while connecting to WLED device."

I have now also disabled "Emulate Alexa device" to see if that solves the problem as well.

I'll let you know

raymajor commented 4 years ago

@frenck

For me, turning off the Alexa option does not work: 58 occurrences since last night. I hope you can take this a little further.

[Image]

[Image]

frenck commented 4 years ago

I've just implemented some additional checks in the upstream library to handle/detect empty responses from WLED.

Furthermore, I've added retry and backoff logic. If a chip doesn't respond for whatever reason (or spits out an empty response), it will simply try again 3 times, with exponential backoff.

This should cover most cases where WLED is too busy or is in flux.
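The general shape of retry-with-exponential-backoff can be sketched as follows. This is an illustration of the technique only, not the code in the upstream library; the function name, the base delay, and the choice to treat any falsy result as an "empty response" are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(request: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying up to `retries` times with doubling delays.

    Network errors (OSError, which includes timeouts) and empty responses
    both count as failures. The last failure is re-raised once the retry
    budget is used up.
    """
    attempt = 0
    while True:
        try:
            result = request()
            if not result:
                # Assumption: an empty body/dict means the device was too busy.
                raise ValueError("empty response")
            return result
        except (OSError, ValueError):
            if attempt >= retries:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting. In practice `request` could be any zero-argument callable that performs one poll of the device.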

I'm adding some other logic to the library upstream, which will be added to the Home Assistant Core after that.

I was not able to break it after this, even with Alexa and MQTT enabled, heavy effects at full speed, and Home Assistant artificially modified to hammer the integration for updates every second. This was done with an ESP8266 on WLED 0.9.1 (as 0.10 is faster and more efficient, I picked the weaker one).

With all the above, I was still able to control the WLED device from both the WLED web interface and the Home Assistant interface.

frenck commented 4 years ago

In https://github.com/frenck/python-wled/pull/84, I'm adding support for a new API endpoint that WLED 0.10.0 introduced. Support for 0.8.4 and newer is still in place.

This will reduce the number of requests that Home Assistant fires at the WLED device by 50%, thus reducing the risk of these issues. However, it is only supported by devices running 0.10.0 or newer.

frenck commented 4 years ago

Additionally: Please note, the original cause of this is mainly the Alexa stuff and can be triggered by other things as well.

If you run WLED and don't use Alexa, disable the Alexa things. The next version of WLED (the first one after 0.10.0) will disable it by default, as it caused boot loop issues...

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

perseus177 commented 3 years ago

HA 2021.2.3

Logger: homeassistant.components.wled
Source: helpers/update_coordinator.py:171
Integration: WLED (documentation, issues)
First occurred: 10:25:46 (1 occurrences)
Last logged: 10:25:46

Error fetching wled data: Invalid response from API: Timeout occurred while connecting to WLED device at 192.168.0.119