Closed patricks closed 3 years ago
go to http://DEVICE/debug/core in your browser. if the firmware crashed, you'll get a core dump. please send it to rojer@rojer.me, i'll take a look.
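For anyone who would rather script this than use a browser, here is a minimal sketch in Python. Only the `/debug/core` path comes from the comment above. The function names, the timeout and the output file are made up for illustration, and what the device returns when there is no core dump is not covered here.

```python
import urllib.request


def core_dump_url(device: str) -> str:
    """Build the /debug/core URL for a device host name or IP address."""
    return f"http://{device}/debug/core"


def save_core_dump(device: str, path: str, timeout: float = 10.0) -> int:
    """Download whatever /debug/core returns and write it to `path`.

    Returns the number of bytes written. The response is saved as-is,
    without checking whether it actually contains a core dump.
    """
    with urllib.request.urlopen(core_dump_url(device), timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Example (hypothetical address and file name):
# save_core_dump("192.168.1.50", "shelly-core.txt")
```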
@patricks Is that a Shelly 2.5 in garage door mode?
It rebooted a few times today (I can see it via the uptime in the web UI) but there is no core dump.
No, it is in switch mode.
reboot with no crash dump likely points to a hardware issue: voltage instability (sags, spikes) or some such.
possibly overheating too?
overheating just disables HAP service, it doesn't reboot the device
Ok good to know thanks
Hi guys! I have 4 Shelly 2.5s set up as light switches (each only as a single switch), and I have the exact same problem. I thought I had defective units, which seemed unlikely since multiple units face the same issue. It’s reassuring to know others are seeing the same problem, so hopefully we can solve it.
Some more details about my specific issue. Sometimes, the Shelly will run “stable” for many days. Sometimes, they will restart every few hours. A couple of times, the Shelly seemed to be in a restart loop, rebooting every few seconds. It was bizarre. I rolled back to the stock firmware and set those affected units up on a timer (sunset till 11pm), and the lights seemed to stay on consistently. When the 2.4.0 firmware came out, I thought I would try HomeKit again but the issue persists.
Separately, I’ve also noticed that an automation in HomeKit to turn off all my lights in the evening had failed on one of the Shellys. When I try to access that Shelly by (static) IP, the web server is completely unresponsive until I power down and back up the Shelly.
Also, the WiFi signal is not strong to the Shellys that exhibit these issues, but it’s not terrible either (-63 to -72). I’m not sure if that is playing any part in the restart issue.
Please let me know if you’d like any more details, tests, or logs from me. Keep it up 👍 I love your work!
hm. naturally, i'd like to see logs. i've recently made a change to make acquiring the latest logs easier. it hasn't been released yet, so please install a beta from here - http://rojer.me/files/shelly/2.5.0-beta1/ - then go to http://DEVICE/debug/log and leave the page open, it will be tailing the log. when the device reboots, please take the last dozen lines or so from that page.
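If keeping a browser tab open is inconvenient, the same idea can be sketched in Python: read the log until the connection drops and keep only the last dozen lines. This assumes `/debug/log` streams plain text lines over a long-lived HTTP response, which the thread does not confirm. The function names and the timeout are hypothetical.

```python
import collections
import http.client
import urllib.request


def last_lines(lines, n=12):
    """Return the last `n` items of any iterable of lines."""
    return list(collections.deque(lines, maxlen=n))


def tail_until_disconnect(device, n=12, timeout=60.0):
    """Read http://DEVICE/debug/log until the connection fails.

    Assumption: the endpoint streams text lines. A reboot is expected to
    show up as a dropped connection or a read timeout, at which point the
    last `n` lines seen so far are returned.
    """
    buf = collections.deque(maxlen=n)
    try:
        with urllib.request.urlopen(f"http://{device}/debug/log", timeout=timeout) as resp:
            for raw in resp:
                buf.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
    except (OSError, http.client.HTTPException):
        pass  # connection lost or timed out; keep what was collected
    return list(buf)
```

Note that a quiet log for longer than `timeout` seconds would also end the read, so the value may need tuning.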
Hey @rojer I've emailed you the core dump files. I'm installing the beta FW on 2 of my Shellys (the most problematic ones). I also wanted to point out that the temperatures of the Shellys (as per the UI) are: 74°C, 95°C, 92°C, and 86°C. I will also note my electrical wiring has no N wire at the switch, so my Shellys are installed at the light end (inside the housing, so maybe that's also why it's getting hot in there).
@patricks Can you share more details about your setup? Is it the standard wiring behind the light switch? How's your temp?
More notes 😊
Some more bizarre behavior. I check in the UI and see the Shelly has recently rebooted, reporting "Uptime: 0:00:00:31". A few moments later, I wanted to check if the Shelly rebooted again, but now (less than 10 minutes later), it's reporting "Uptime: 0:01:13:07". I recall this weird behavior occurring in the previous firmware as well, before I installed 2.5.0-beta1.
A little while later, one of the Shellys started behaving as if possessed. Within 4 minutes, it rebooted 10 times. I used the stopwatch on my phone, and here's the duration between each reboot: 34s 6s 12s 5s 22s 1m01s 18s 10s 4s 1m10s
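As a quick check of those stopwatch figures, a small Python sketch can add them up. The parser is a hypothetical helper that only handles the `NNs` and `NmNNs` forms used above.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '34s' or '1m01s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+)s", text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + int(m.group(2))


intervals = "34s 6s 12s 5s 22s 1m01s 18s 10s 4s 1m10s".split()
seconds = [parse_duration(t) for t in intervals]
total = sum(seconds)

print(f"{len(seconds)} intervals, {total} s in total, "
      f"{total / len(seconds):.1f} s on average")
```

The ten intervals sum to 242 seconds, so roughly 4 minutes with one reboot about every 24 seconds on average.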
I've now switched them off, I'll try them again shortly. But super weird behavior. I'm wondering if anyone else has had this issue. The only other / last time this happened was about 3 weeks ago.
@rudyemm i've taken a look at the core dumps - looks like stack is smashed, i'm not getting a meaningful stack trace out of it right away, will need more digging.
Sure, let me know if I can provide any other dumps, logs, etc and I’m happy to help you experiment 👍
i've taken a closer look at the dumps. the reason i'm not getting a backtrace is not because of stack smashing but because the crash happens in binary libs supplied by espressif and those don't have the debug symbols necessary to find function entrypoints. anyway, the firmware enters an endless loop and gets reset by the WDT. as far as i can tell from the disassembly, it prints "mac 985" as the reason:
0x40102612: l32r a2, 0x4010214c
0x40102615: l32r a3, 0x40102150
0x40102618: movi a4, 0x3d9
0x4010261b: l32r a0, 0x4010115c
0x4010261e: callx0 a0
=> 0x40102621: j 0x40102621
0x4010214c is the format string, %s %u; 0x40102150 is "mac" and 0x3d9 is 985.
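The arithmetic can be reproduced in a couple of lines of Python, whose `%` operator accepts the same `%s %u` conversions as C's printf. This only illustrates how the message comes out of the three values in the disassembly. It does not call any Espressif code, and the variable names are made up.

```python
# Values read from the disassembly above: format string, tag and immediate.
fmt = "%s %u"
tag = "mac"
reason_code = 0x3D9  # the value loaded by `movi a4, 0x3d9`

message = fmt % (tag, reason_code)
print(message)
```

0x3d9 is 3 * 256 + 13 * 16 + 9 = 985, so the formatted message is "mac 985".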
i see other people reporting it as well, but without any reason or resolution... possibly some device on your network generates some traffic that confuses the esp's wireless stack, this happened before. unfortunately, there's very little i can do, this is all in closed code.
Oh nice, thank you so much for your investigation. This makes some sense, at least I feel confident there is no malfunction with the Shelly. This could be a result of the 2 most problematic Shellys connecting to my WiFi repeater (they’re located in the garden) while the other 2 “stable” Shellys connect straight to my UniFi network inside the house and do not exhibit so much of the restart problem. I restarted my WiFi repeater, and the Shellys connected to it have been stable for a full day without restarting. I will keep testing and perhaps extend my UniFi to these Shellys that keep restarting, and I’ll report back with my findings.
@patricks can you confirm how the Shellys are configured in your home network? Do you have any WiFi repeaters?
Separately @rojer do you think the temperatures I’m seeing with my Shellys of up to 95°C could be causing any stability issues, and more importantly is this normal or is it dangerous to be running that hot?
Thanks again for everyone’s help, and keep up the great work 👍👍
This sounds interesting, @rudyemm it looks like I have a very similar WiFi setup. UniFi Amplifi Router + WiFi repeater. The problematic shelly is also the only one which is connected via the repeater. I have already tried to reboot my router and then the Shelly works for a few days, but after a few days I have the same problems. The temperature on my Shellys is about 65°C
@rudyemm are your problems gone since the router restart? I have the same problems again.
Hi, at what voltage do you run your Shellys? There are problems with 24 volts.
@konagar mine is on 230v
@patricks after a lot of testing, my conclusion is that a weak WiFi signal is causing the FW to crash and reboot – even after completely removing the repeater. My network setup is enterprise Unifi, not the Amplifi. I have moved my access point closer to the Shelly although the signal is still weak (they are in my garden). The web server is still slow to respond and the Shelly is still exhibiting the reboot behavior.
There is still a scenario when it falls in a reboot loop – I’m not sure what causes this. I dunno if the high temperature has any effect on this behavior. I have ordered another access point to provide better WiFi to the Shellys, and will report back with my findings of the Shellys once they have a good WiFi signal.
@rojer perhaps it’s worth testing the reliability/stability of the FW with a weak signal (RSSI: -80 or lower)?
I've been experiencing the same issue with a Shelly 2.5 I had installed. I have over 15 other Shelly 1's and 1PM's running the firmware that have been running perfectly for over 6 months now on the same network config. I purchased some more 2.5's and have just done some more testing. The 2.5 I started having this issue with is next to a 1PM behind a switch and is about 3m away from the closest Nano HD access point. I have 2 Nano HD's and I have a 2.4G SSID set up on each one, which makes sure the Shellys only connect to the AP physically closest to them and don't hop between AP's. RSSI is -52 so I don't think signal strength is an issue. Reverted to stock firmware and ran for a week with no issues on the same network config. I set up a second new 2.5 yesterday on the bench and it ran for about 8hrs before dropping off. I've got some other devices on that SSID that I'm going to move off and test again; failing that, I'm going to try to set up a VLAN on the Unifi Controller with only the 2.5 on it and see if that has any impact.
this is interesting, there must be something to the 2.5 that causes it... but at the moment i have no idea what it could be. if someone could capture serial logs continuously from a running device, that would be great. warning: you HAVE to use an isolated serial to usb adapter, or you will regret it.
If you could point me in the direction of the right kind of serial usb adapter and how to capture the logs I'll have a crack at it. I'd love to get these working as reliably as the 1's and PM's as they're probably more reliable than some of my genuine homekit stuff ha ha.
i wish i could... i have a custom thing i use. but you need an isolated TTL level serial converter to USB
i use this https://www.amazon.co.uk/gp/product/B07BBPX8B8/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
@andyblac it's not isolated, never connect it while the device is connected to mains, you know what happens if you do :)
ye, i always remove all wires 1st 😄
@andyjp80 hi there 👋 great to hear your feedback. Are you facing (a) reboots only, (b) unresponsive web UI, or (c) both? It’s good to have more people involved in this, hopefully we can get to the bottom of what is going on. It’s also reassuring to hear from you that your Shelly 1 devices are operating normally.
@rudyemm the core dump from my unit pointed to the "mac 985" issue. When it happens it shows "no response" in HomeKit, it disappears from my network and the web UI isn't accessible. If I power the circuit off at the breaker in the switch board it comes back to normal operation. Yesterday I moved the 2.5 onto its own 2.4G SSID and VLAN on a different IP range from the rest of my network (it's the only device on it) and so far it's been up for 24hrs without issue. I'll report back in a few days to see if it's still going.
@rojer something like this? https://www.amazon.com.au/KNACRO-Isolated-Serial-Module-Fully/dp/B07L2MT5QJ
yes, that's more like it
i plan to update Espressif SDK to latest version in the near future, hopefully this will help or at least give us an opportunity to report issues to Espressif as we'll be running their latest and greatest.
Hi, I'm adding this comment from #310.
I have installed 3 Shelly 2.5 to control the roller shutters at home. 2 of them are working without problems, but the third one only works for a couple of days and then it loses the connection to the Wi-Fi (and HomeKit of course), with no reboots as far as I know (I will check logs next time). The physical switches are still working without problems, but the only solution to recover the device is to cut off the power to force a restart. Then, it reconnects to the Wi-Fi immediately (I will try to restart the router next time).
All devices are on the same Wi-Fi (no repeaters) and running firmware 2.6.1. Temperature is around 50 °C in all devices. The Wi-Fi signal is -70 and -76 for the devices working fine and -83 for the one with intermittent failures (so this is the biggest difference between them).
Quick update - have had the 2.5 on its own SSID (tied to a single NanoHD AP) and VLAN for 2 days now and it is still working well - as mentioned above, if I had it on my main network with the Shellys and a couple of other devices (Ring Doorbell and Garage Door Opener) it would drop out after 3-12hrs consistently and I would need to cut power, then it would become responsive again for another 3-12hrs. Seems to be a viable workaround for now. Now I'm going to add 2 more 2.5's to the VLAN and see if they all stay stable.
Spoke too soon - it just rebooted itself and turned the lights on. Back to the drawing board. At least it's still responsive without a hard power off I guess..
Damn, but at least we’re narrowing it down 🤣 what’s the WiFi signal strength for the Shellys? I’ve anecdotally noticed (roughly) that the worse the signal, the more often the reboot.
thanks everyone for your efforts, i'm watching this closely. i am also working on Espressif SDK update, that will hopefully fix this. i'll let you know when i have something to test.
It happened again a few days ago. I restarted the router and all the devices reconnected fine except that one. I cut off its power to force the restart and I moved the device a little to improve the Wi-Fi signal (now it is -72 instead of -83). It has been working fine for 5 days, so it seems that my main problem is fixed. If it doesn't lose the connection everything is right...
Hi @andyjp80 👋 how was your experience so far? Were you able to get the serial adapter to retrieve logs of the crash/reboot? For me, the 2.7 beta still exhibits the same crash/reboot behavior.
please test 2.7 beta and let me know your experience - https://github.com/mongoose-os-apps/shelly-homekit/issues/330
@rudyemm thanks for providing the dumps. by the looks of it, something related to dns-sd is leaking memory - heap autopsy shows a lot of active allocations with dns-sd advertising data. will keep looking.
some more investigating today: it's a connection leak, related to dns-sd. somehow connections get left behind... will continue.
ok, it's not a connection leak but a leak of pending pbufs when closing UDP connections. i think i've found the reason, @rudyemm please update to beta3 and let me know if it helps.
I'm still facing crashes/reboots with beta 3 – how's the rest of your guys' experience?
I've updated to the stable 2.7.0 and will report back any new findings. I assume not much has changed from beta to stable that addresses this topic @rojer ?
Maybe if we can perform a full, clean, wipe of the device #308 we can determine if the issues we're facing are a result of a configuration issue
@rudyemm 2.7.0 was getting stale on the cooker, and i did fix a couple issues that should improve things, so i decided to push it out. 2.7.0 is just a rebuild of the same code as beta3, so no change is expected for you. i understand that you are still facing issues, and we will continue to investigate them. i think next step is to enable remote logging to my server and see if anything comes along that way. i will give you instructions on how to do it soon.
@rudyemm please use the following url:
http://shelly25-test.local/rpc/Config.Set?config=%7b%22debug%22%3a%7b%22udp_log_addr%22%3a%2235.205.201.239:13001%22%7d%7d&reboot=true
replace shelly25-test.local with the names of the devices that experience issues.
this will send logs to my server so i can hopefully see what's wrong.
please give me device IDs (available in the system section of the web ui) so i can know which is which.
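For several devices it may be easier to generate these URLs than to edit them by hand. The sketch below builds the same `Config.Set` request in Python. The `/rpc/Config.Set` path, the `debug.udp_log_addr` key and `reboot=true` are taken from the URL above, while the function name is hypothetical. Python escapes more characters and uses upper-case hex digits, so the string will not match the URL above byte for byte, but it decodes to the same JSON.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def remote_log_url(device: str, udp_log_addr: str) -> str:
    """Build a Config.Set URL that sets debug.udp_log_addr and reboots."""
    config = {"debug": {"udp_log_addr": udp_log_addr}}
    # Compact separators keep the JSON free of spaces before URL-encoding.
    payload = json.dumps(config, separators=(",", ":"))
    return (f"http://{device}/rpc/Config.Set"
            f"?config={quote(payload, safe='')}&reboot=true")


url = remote_log_url("shelly25-test.local", "35.205.201.239:13001")
print(url)

# Round-trip check: decode the query string back into the config object.
query = parse_qs(urlsplit(url).query)
decoded = json.loads(query["config"][0])
print(decoded)
```

Opening the printed URL in a browser should have the same effect as the hand-written one, assuming the device accepts either encoding.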
Happy New Year everyone 🥳
I hope the remote logs have provided valuable info @rojer – were you able to conclude any issues?
2.7 seems to have done the trick for me.. have been online for 9 days now with no issues.. longest it lasted before was a few days. Looking good. Thanks!
@rudyemm i see strange behavior by shellyswitch25-1A4A17. does it have a core dump?
Hi,
I have had 3 Shelly 2.5s installed for a few months now. One of them reboots every few minutes. I figured this out because the light turns off for a few seconds and the uptime in the web UI is also reset. Is there a way to figure out what's going wrong? I tried to enable debug logging, but it looks like the log gets cleared after a reboot?
I am always running the latest firmware (currently 2.4.0), but this problem also appeared with older ones.