ph1p / ikea-led-obegraensad

ESP32/Arduino hack for the ikea OBEGRÄNSAD led wall lamp
MIT License
578 stars, 78 forks

Freezes after some hours running #112

Open koturbash opened 2 months ago

koturbash commented 2 months ago

Mine always freezes after a few hours of running: the LEDs stay on and keep showing whatever was last displayed. Also, Ticking Clock just does not work; the display stays dark and no LEDs light up, whether I select it with the button or through the web interface.

Otherwise, everything works until it freezes. Once it has frozen, the reset button on the ESP32 does not help. It's super annoying that it needs a power reset every time =(

Over the last month I have tested several clean installs (build/upload) of the latest original project. I tested 3 units of the AZDelivery ESP32 Dev Kit (CP2102), all with the same behaviour. I also tested several different high-output power supplies and do not think it's a power issue.

Any ideas about what might be the problem?

Thanks!

jaal2001 commented 2 months ago

I had a similar problem with the clock (didn't try it with the other displays) when the wireless connection was weak: the clock just stood still.

Maybe you can check the signal strength of the ESP32 in your router. That was the only clue I had as to why my device suddenly stopped working.

In the end, it turned out my device was connecting to the access point farthest away from it. I guess roaming does not work (properly). Since I managed to connect the device to an AP literally within sight of it, everything has worked fine.

kohlsalem commented 2 months ago

I have to confess I have the same problem, which kills the project a bit.

For me it worked well for several weeks, but now it does not last 24 h. Did yours get stuck from the beginning?

How hot do your devices get? I assume mine was killed by the heat...

Maybe a new ESP32 with a heat sink would do...

koturbash commented 2 months ago

Thanks for the comments!

It has behaved as described from the very beginning. Over the last few days I tried a piece of aluminum as a heat sink; it still freezes, although it worked a bit longer, a little more than 24 h.

I do not think it's WiFi, as it happens with all the other plugins too, which (I guess) do not require the Internet. But I will try to test this lead as well.

jekkos commented 2 months ago

Having the same problem here! It might be related to WiFi; the AP might disappear at night. The clock seems to freeze every now and then, and then I need to power-cycle it. This was not the case previously.

Phrunky commented 1 month ago

I had the same issue. I suspected an unavailable NTP server, due to the ISP resetting the connection overnight. So I set up my router as a time server and pointed the variable "NTP_SERVER" in "constants.h" to the router. Maybe a coincidence, but since then the clock hasn't frozen anymore.

koturbash commented 1 month ago

Thanks for your reply! That looks like it might be the source of the problem. Unfortunately, I cannot use this solution, as it would require custom firmware for the router, and not all routers can do that.

I wonder if anyone else has encountered such behaviour and has come up with a solution in the code itself?

Phrunky commented 1 month ago

A short follow-up: I installed new router firmware (Fritz!Box v. 07.81) and the old problems returned. However, one thing is for sure: as soon as a little more wireless traffic occurs, the clock freezes. This seems to be reproducible: copy large amounts of data over WiFi, or alternatively disconnect the Internet connection, and the clock freezes. So basically the firmware is very sensitive to (Internet) connectivity. To make a long story short: I think the issue is not limited to the NTP protocol. To me it seems like a general connection issue.

jaal2001 commented 1 month ago

@Phrunky Are you just using one FritzBox or also some access points for a mesh?

Phrunky commented 1 month ago

I also tried a mesh repeater in my first attempts. It didn't help. I can give it another try now with the new router firmware and will let you know.

Phrunky commented 1 month ago

Wrong! Easy to reproduce: the clock froze immediately, no matter whether there was a mesh repeater or not.

I need to update my previous statement: the clock didn't connect to the AP with the better WiFi conditions. After setting up the repeater first and starting the clock afterwards, I was able to copy larger amounts of data over WiFi and the clock didn't freeze.

jaal2001 commented 1 month ago

Seems to be the same behaviour I experienced: 3 APs, one FritzBox, and the LED panel near the FritzBox (see https://github.com/ph1p/ikea-led-obegraensad/issues/112#issuecomment-2068484137). For whatever reason, the ESP connected to one of the APs, in the worst case to the one with the weakest signal. I assume the wireless library does not handle WiFi steering properly. I disconnected all APs and started the ESP so it could only connect to the FritzBox. Since then I have never had the problem again; the ESP always connects to the FritzBox. The clock seems to freeze when the WiFi signal is poor and the ESP can't receive a proper time signal. I assume some error handling is missing, but that is a task for someone with more coding experience :-)
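Related to the roaming problem above, here is a minimal host-side C++ sketch (an assumption, not code from this project) of picking the strongest access point that broadcasts the saved SSID. `ScanResult` and `pickStrongestAp` are hypothetical names. On arduino-esp32 the fields could be filled from `WiFi.scanNetworks()` via `WiFi.SSID(i)`, `WiFi.RSSI(i)`, `WiFi.BSSID(i)` and `WiFi.channel(i)`, and the chosen channel and BSSID could then be passed to `WiFi.begin(ssid, password, channel, bssid)` to pin the connection to that AP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One entry of a WiFi scan (hypothetical type for this sketch).
struct ScanResult {
  std::string ssid;
  int32_t rssi;      // signal strength in dBm; less negative means stronger
  uint8_t bssid[6];  // MAC address of the access point
  int32_t channel;
};

// Returns the index of the strongest AP broadcasting savedSsid,
// or std::nullopt if no scanned AP matches.
std::optional<std::size_t> pickStrongestAp(const std::vector<ScanResult>& results,
                                           const std::string& savedSsid) {
  assert(!savedSsid.empty() && "saved SSID must not be empty");
  std::optional<std::size_t> best;
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (results[i].ssid != savedSsid) continue;  // ignore other networks
    if (!best || results[i].rssi > results[*best].rssi) best = i;
  }
  return best;
}
```

Pinning a BSSID trades away roaming: if that AP goes down, the device will not fall back to another one on its own, so a rescan after a connection loss would still be needed.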

Phrunky commented 1 month ago

Let me know if you share my opinion: these might be two separate issues, one being that the client does not choose the best connection (which would be a general issue), and the other that the clock freezes without a proper connection.

jaal2001 commented 1 month ago

Yes, these are 2 separate issues.

jekkos commented 1 month ago

Does the device still react to button presses during the clock freeze? Mine is completely unresponsive. We could try enabling the hardware watchdog to make it reboot automatically.
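The watchdog idea can be illustrated with a host-side C++ model (hypothetical `WatchdogModel` class, not the ESP-IDF API): the main loop feeds the watchdog on every healthy iteration, and if something blocks the loop for longer than the timeout, the watchdog fires. On the ESP32 that would be a hardware reset through the ESP-IDF task watchdog (`esp_task_wdt_*`), whose configuration API differs between ESP-IDF versions and is therefore not shown. Times are assumed to be a `millis()`-style `uint32_t` uptime.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of a watchdog timer: if feed() is not called for longer
// than timeoutMs, expired() reports true (on real hardware: a reset).
class WatchdogModel {
 public:
  explicit WatchdogModel(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
    assert(timeoutMs > 0);
  }

  // Called from the main loop on every healthy iteration.
  void feed(uint32_t nowMs) { lastFeedMs_ = nowMs; }

  // True once the loop has not fed the watchdog for longer than the timeout.
  // Unsigned subtraction keeps this correct across a 32-bit counter wraparound.
  bool expired(uint32_t nowMs) const { return nowMs - lastFeedMs_ > timeoutMs_; }

 private:
  uint32_t timeoutMs_;
  uint32_t lastFeedMs_ = 0;
};
```

A 32-bit millisecond counter wraps after about 49.7 days, which is why the elapsed time is computed with unsigned subtraction rather than by comparing absolute timestamps.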


jaal2001 commented 1 month ago

> Does the device still react to button presses during the clock freeze? Mine is completely unresponsive.. we could try to enable hardware watchdog to make it reboot automatically.

The device freezes completely and does not recover. In my opinion a reboot should not be necessary, as it does not solve the problem. Instead, the code should accept the missing wireless connection and just carry on. This would probably lead to some time drift at some point, but in the end that is more acceptable than rebooting every few minutes when the connection is bad.
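The "just carry on" approach can be sketched as a clock that remembers the last successful sync and extrapolates from the uptime counter in between, accepting drift until the next sync succeeds. This is a minimal host-side C++ sketch with a hypothetical `FallbackClock` class, not the project's code. On the ESP32, the system time set via SNTP generally keeps running between syncs as well, so on the device the more important part is likely that nothing blocks the display loop while a sync is pending.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Keeps time when NTP is unreachable: remembers the last synced epoch and
// the uptime (ms) at which it was received, then extrapolates from there.
class FallbackClock {
 public:
  // Record a successful sync: epoch seconds, received at uptime nowMs.
  void onSync(std::time_t epoch, uint32_t nowMs) {
    syncedEpoch_ = epoch;
    syncedAtMs_ = nowMs;
    hasSync_ = true;
  }

  bool hasSync() const { return hasSync_; }

  // Current time estimate; drifts with the local oscillator until the next sync.
  std::time_t now(uint32_t nowMs) const {
    assert(hasSync_ && "no time source yet");
    return syncedEpoch_ + static_cast<std::time_t>((nowMs - syncedAtMs_) / 1000);
  }

 private:
  std::time_t syncedEpoch_ = 0;
  uint32_t syncedAtMs_ = 0;
  bool hasSync_ = false;
};
```

A failed sync simply does not call `onSync()`, so the previous estimate stays in use instead of the code waiting for the network.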

jekkos commented 1 month ago

Perhaps WiFi and NTP sync could be started in a background task that sets some global state when the sync succeeds. The plugins should then check this state and adapt their functionality accordingly. Currently there are some preprocessor directives that remove code when the server option is not enabled. These directives should be replaced with a runtime check of the global WiFi and NTP sync state.
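A minimal sketch of this proposal in portable C++, with all names hypothetical (on the ESP32 the background side could run as a FreeRTOS task): the WiFi/NTP side only publishes flags into a shared state, and a plugin reads them at runtime instead of relying on compile-time directives. `std::atomic<bool>` keeps reads and writes from different tasks well-defined.

```cpp
#include <atomic>
#include <string>

// Hypothetical global connectivity state, written by a background
// WiFi/NTP task and read by the display plugins.
struct ConnectivityState {
  std::atomic<bool> wifiConnected{false};
  std::atomic<bool> timeSynced{false};
};

ConnectivityState g_state;

// Background task side: publish results instead of blocking the render loop.
void onWifiStatus(bool connected) { g_state.wifiConnected.store(connected); }

// Once synced, the time stays usable offline (it only drifts), so a failed
// later sync does not clear the flag.
void onNtpSync(bool ok) {
  if (ok) g_state.timeSynced.store(true);
}

// Plugin side: a clock plugin decides at runtime what it can show.
std::string clockPluginMode(const ConnectivityState& s) {
  if (s.timeSynced.load()) return "clock";              // time is known
  if (s.wifiConnected.load()) return "waiting-for-ntp"; // online, no time yet
  return "offline";                                     // neither available
}
```

Keeping `timeSynced` set after a later connection loss reflects the idea in the thread: once the time has been synced, the clock can keep running offline and only drift.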


koturbash commented 1 week ago

Any updates on this one? On the one hand, it was good to hear that it wasn't only my problem, but on the other hand, it looks like it is not an easy fix. Unfortunately, my (LabVIEW) background does not allow me (read: does not give me any chance of success) to delve into the original code and fix it.

Phrunky commented 1 week ago

More experiences than hard facts: I understand that the router manufacturer (AVM) confirmed a WiFi-related bug in the latest stable firmware release, which affects my router model (7590ax). This bug has not been fixed in a stable release yet. As soon as I updated to this 'buggy' firmware, the clock crashed frequently. It seems more and more obvious to me that the clock is very sensitive to WiFi connectivity issues. Even if it doesn't help, maybe someone else could try to confirm or refute my statements. I was able to reproduce crashes by copying large files over WiFi. Let us know your experiences.

jekkos commented 1 week ago

The bug only crept in after a couple of updates. I did have quite a stable clock at some point.

It might have started after the WiFiManager code was added. I could try to revive an older build to see if that resolves the issue.

Basically, it might be worth trying to revert this commit: https://github.com/ph1p/ikea-led-obegraensad/commit/72cc87f5657efba3d17b61f4f5e27c3b743b1863

luedi128 commented 1 day ago

I have the same issue, so I can reproduce the situation. I will try to fix it in the next few days.

luedi128 commented 22 hours ago

I was able to reproduce the situation (more or less reliably) by putting the ESP/lamp between the router and a repeater and then turning on the microwave (which is close to the repeater). :) Meanwhile, the ESP was attached to the dev IDE and I monitored the serial output. It looks like there is a connection loss and then the device switches to WiFi setup mode. This freezes the clock and looks like a complete freeze of the device.

```
Lost connection to Wi-Fi. Reconnecting...
*wm:AutoConnect
*wm:Connecting to SAVED AP: XXXXXXX
E (644180) wifi:sta is connecting, return error
[635121][E][WiFiSTA.cpp:317] begin(): connect failed! 0x3007
*wm:connectTimeout not set, ESP waitForConnectResult...
*wm:AutoConnect: FAILED for  869 ms
*wm:StartAP with SSID:  Ikea Display Setup WiFi
*wm:AP IP address: 192.168.4.1
*wm:Starting Web Portal
```

Not sure yet how to fix or improve that, but maybe this information is already helpful. If your device is in the frozen state, can you please check whether WiFi setup mode is active and the SSID "Ikea Display Setup WiFi" shows up in a WiFi scan?

I will try some potential fixes in the next few days.

jekkos commented 21 hours ago

This confirms my suspicion that the issue is due to the WiFiManager changes. Before those changes, the setup used to run fine for days.

luedi128 commented 10 hours ago

I was able to debug the situation without the microwave :) by walking around with the device attached to my laptop. That caused connection losses and the observed behaviour. I have now found a fix by changing/adding some more WiFiManager configuration. In main.cpp, at line 82, I added these lines:

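```cpp
  // Retry connecting to the saved AP up to 10 times before giving up
  wifiManager.setConnectRetries(10);
  // Wait up to 10 seconds for each connection attempt
  wifiManager.setConnectTimeout(10);
  // Let the ESP32 WiFi stack reconnect on its own after a drop
  wifiManager.setWiFiAutoReconnect(true);
```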

This is working fine and the device is reconnecting to my local wifi after a connection loss.

During a longer connection loss, the device will still switch to the WiFiManager configuration mode. If you don't want that, or want the device to recover from that state once your local WiFi is up again, you can additionally add `wifiManager.setConfigPortalTimeout(180);`. This stops the WiFiManager configuration mode after 3 minutes and runs another connection attempt. During that time the clock will not work, so the clock still relies on a permanent WiFi connection. An offline mode would be possible but would be a larger code change.
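The three settings above (connect retries, connect timeout, config portal timeout) can be modelled as a non-blocking state machine, which would keep the display loop running during reconnects and while the setup portal is open. This is a host-side C++ sketch with hypothetical names, not WiFiManager's implementation; on the device, the state transitions would be the points where the WiFi stack or the portal gets called. WiFiManager also has a non-blocking portal mode (`setConfigPortalBlocking(false)` together with calling `process()` from `loop()`), which may be worth checking in the version the project uses.

```cpp
#include <cassert>
#include <cstdint>

enum class NetState { Connected, Reconnecting, Portal };

// Non-blocking model of: N reconnect attempts with a per-attempt timeout,
// then a setup portal that closes again after a timeout and retries.
class ReconnectPolicy {
 public:
  ReconnectPolicy(uint8_t maxRetries, uint32_t retryTimeoutMs, uint32_t portalTimeoutMs)
      : maxRetries_(maxRetries), retryTimeoutMs_(retryTimeoutMs),
        portalTimeoutMs_(portalTimeoutMs) {
    assert(maxRetries > 0);
  }

  // Call once per loop iteration with the current link status and uptime.
  NetState tick(bool linkUp, uint32_t nowMs) {
    if (linkUp) {  // any successful (re)connect resets the policy
      state_ = NetState::Connected;
      retries_ = 0;
      return state_;
    }
    switch (state_) {
      case NetState::Connected:  // link just dropped: start the first attempt
        state_ = NetState::Reconnecting;
        retries_ = 1;
        sinceMs_ = nowMs;
        break;
      case NetState::Reconnecting:
        if (nowMs - sinceMs_ >= retryTimeoutMs_) {  // current attempt timed out
          if (retries_ >= maxRetries_) {
            state_ = NetState::Portal;  // give up for now, open the setup portal
          } else {
            ++retries_;  // start the next attempt
          }
          sinceMs_ = nowMs;
        }
        break;
      case NetState::Portal:
        if (nowMs - sinceMs_ >= portalTimeoutMs_) {  // nobody reconfigured it
          state_ = NetState::Reconnecting;  // close the portal and retry
          retries_ = 1;
          sinceMs_ = nowMs;
        }
        break;
    }
    return state_;
  }

 private:
  uint8_t maxRetries_;
  uint32_t retryTimeoutMs_;
  uint32_t portalTimeoutMs_;
  NetState state_ = NetState::Connected;
  uint8_t retries_ = 0;
  uint32_t sinceMs_ = 0;
};
```

With `ReconnectPolicy(10, 10000, 180000)`, roughly matching the values above, the model keeps retrying for about 100 seconds before opening the portal and closes it again after 3 minutes, while `tick()` itself never blocks.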

Do you want me to add this as a pull request or something?