Closed FSorrow closed 4 years ago
could be either of those. we'll need more details.
Hi, so... 2 of the Shellys are in the same room as the router (TP-Link MR200; I don't have a copper/fiber connection available), and this is a rural area, so I don't have congestion on the 2.4 GHz band. In this setup I use an Apple TV as the hub with a static IP, an IKEA gateway on DHCP with an IP reservation, 2 Yeelight strips connected the same way as the IKEA gateway, and some iPhones / Macs / iPads on 5 GHz, all in DHCP mode. The only thing I did during the setup of the Shellys was to give them static IP addresses outside the range of the DHCP server. Now the issue is that many times during the day the Shelly devices appear in the iPhone Home app with a "No Response" label. [A weird thing is that this label differs between iPhones: e.g. on my iPhone the Shelly 1 appears online, on my wife's iPhone it doesn't, while both iPhones are connected to the same Wi-Fi network.] I did some investigation and noticed that while pinging those devices (the Shellys) I lose some packets... this makes me think that the issue is in the network... but where?
thank you so much ..
can you measure / track the signal strength of the Shelly? the antenna is on the bottom of the device; try pointing the bottom of the Shelly upwards, that has helped mine a lot, as the Shellys only have a small antenna.
Where can I check the signal strength on Shellys with the Mongoose firmware? I'm doing some tests... I discovered that when the devices go offline in HomeKit they remain reachable via their web page... so I suppose that my problem lies between HomeKit and the Shelly devices.
no, in your router gui.
Sorry. My router only shows packets for wireless devices.
New update. At this moment the Shelly 1 is pingable with some packets lost, and it isn't reachable via the web interface or HomeKit. (I've just performed a HomeKit settings reset.)
you have some long ping times there, looks to me like wifi issues, here's mine
PING 192.168.3.22 (192.168.3.22): 56 data bytes
64 bytes from 192.168.3.22: icmp_seq=0 ttl=255 time=25.630 ms
64 bytes from 192.168.3.22: icmp_seq=1 ttl=255 time=12.580 ms
64 bytes from 192.168.3.22: icmp_seq=2 ttl=255 time=2.180 ms
64 bytes from 192.168.3.22: icmp_seq=3 ttl=255 time=2.361 ms
64 bytes from 192.168.3.22: icmp_seq=4 ttl=255 time=14.459 ms
64 bytes from 192.168.3.22: icmp_seq=5 ttl=255 time=13.910 ms
64 bytes from 192.168.3.22: icmp_seq=6 ttl=255 time=2.373 ms
64 bytes from 192.168.3.22: icmp_seq=7 ttl=255 time=2.357 ms
64 bytes from 192.168.3.22: icmp_seq=8 ttl=255 time=2.039 ms
64 bytes from 192.168.3.22: icmp_seq=9 ttl=255 time=4.615 ms
64 bytes from 192.168.3.22: icmp_seq=10 ttl=255 time=14.150 ms
64 bytes from 192.168.3.22: icmp_seq=11 ttl=255 time=14.012 ms
--- 192.168.3.22 ping statistics ---
12 packets transmitted, 12 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.039/9.222/25.630/7.307 ms
PING 192.168.3.17 (192.168.3.17): 56 data bytes
64 bytes from 192.168.3.17: icmp_seq=0 ttl=128 time=2.581 ms
64 bytes from 192.168.3.17: icmp_seq=1 ttl=128 time=12.543 ms
64 bytes from 192.168.3.17: icmp_seq=2 ttl=128 time=5.637 ms
64 bytes from 192.168.3.17: icmp_seq=3 ttl=128 time=8.542 ms
64 bytes from 192.168.3.17: icmp_seq=4 ttl=128 time=3.645 ms
64 bytes from 192.168.3.17: icmp_seq=5 ttl=128 time=5.723 ms
64 bytes from 192.168.3.17: icmp_seq=6 ttl=128 time=2.761 ms
64 bytes from 192.168.3.17: icmp_seq=7 ttl=128 time=2.938 ms
64 bytes from 192.168.3.17: icmp_seq=8 ttl=128 time=3.714 ms
64 bytes from 192.168.3.17: icmp_seq=9 ttl=128 time=5.811 ms
64 bytes from 192.168.3.17: icmp_seq=10 ttl=128 time=6.845 ms
64 bytes from 192.168.3.17: icmp_seq=11 ttl=128 time=8.814 ms
64 bytes from 192.168.3.17: icmp_seq=12 ttl=128 time=2.548 ms
64 bytes from 192.168.3.17: icmp_seq=13 ttl=128 time=6.619 ms
64 bytes from 192.168.3.17: icmp_seq=14 ttl=128 time=2.355 ms
64 bytes from 192.168.3.17: icmp_seq=15 ttl=128 time=2.919 ms
64 bytes from 192.168.3.17: icmp_seq=16 ttl=128 time=7.990 ms
--- 192.168.3.17 ping statistics ---
17 packets transmitted, 17 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.355/5.411/12.543/2.804 ms
Uhm... that is what I was looking for... the times seem too long, and with loss.
64 bytes from 192.168.1.51: icmp_seq=29 ttl=128 time=16.032 ms
64 bytes from 192.168.1.51: icmp_seq=30 ttl=128 time=5.058 ms
64 bytes from 192.168.1.51: icmp_seq=31 ttl=128 time=81.119 ms
64 bytes from 192.168.1.51: icmp_seq=32 ttl=128 time=12.107 ms
64 bytes from 192.168.1.51: icmp_seq=33 ttl=128 time=8.131 ms
64 bytes from 192.168.1.51: icmp_seq=34 ttl=128 time=8.564 ms
64 bytes from 192.168.1.51: icmp_seq=35 ttl=128 time=16.095 ms
64 bytes from 192.168.1.51: icmp_seq=36 ttl=128 time=4.197 ms
64 bytes from 192.168.1.51: icmp_seq=37 ttl=128 time=124.035 ms
64 bytes from 192.168.1.51: icmp_seq=38 ttl=128 time=3.451 ms
64 bytes from 192.168.1.51: icmp_seq=39 ttl=128 time=103.719 ms
64 bytes from 192.168.1.51: icmp_seq=40 ttl=128 time=10.945 ms
64 bytes from 192.168.1.51: icmp_seq=41 ttl=128 time=4.144 ms
64 bytes from 192.168.1.51: icmp_seq=42 ttl=128 time=7.062 ms
64 bytes from 192.168.1.51: icmp_seq=43 ttl=128 time=12.202 ms
64 bytes from 192.168.1.51: icmp_seq=44 ttl=128 time=13.681 ms
64 bytes from 192.168.1.51: icmp_seq=45 ttl=128 time=11.072 ms
64 bytes from 192.168.1.51: icmp_seq=46 ttl=128 time=14.965 ms
64 bytes from 192.168.1.51: icmp_seq=47 ttl=128 time=10.995 ms
64 bytes from 192.168.1.51: icmp_seq=48 ttl=128 time=49.387 ms
64 bytes from 192.168.1.51: icmp_seq=49 ttl=128 time=11.314 ms
--- 192.168.1.51 ping statistics ---
50 packets transmitted, 49 packets received, 2.0% packet loss
round-trip min/avg/max/stddev = 2.123/28.810/209.645/45.255 ms
maybe @rojer could try updating to latest libraries for the chipset, not sure what else I could suggest.
I just tried changing my wifi channel... for this particular device the situation seems to be a little bit better... I don't know if I was only lucky.
PING 192.168.1.51 (192.168.1.51): 56 data bytes
64 bytes from 192.168.1.51: icmp_seq=0 ttl=128 time=10.841 ms
64 bytes from 192.168.1.51: icmp_seq=1 ttl=128 time=2.486 ms
64 bytes from 192.168.1.51: icmp_seq=2 ttl=128 time=5.885 ms
64 bytes from 192.168.1.51: icmp_seq=3 ttl=128 time=5.756 ms
64 bytes from 192.168.1.51: icmp_seq=4 ttl=128 time=11.167 ms
64 bytes from 192.168.1.51: icmp_seq=5 ttl=128 time=10.338 ms
64 bytes from 192.168.1.51: icmp_seq=6 ttl=128 time=4.687 ms
64 bytes from 192.168.1.51: icmp_seq=7 ttl=128 time=8.127 ms
64 bytes from 192.168.1.51: icmp_seq=8 ttl=128 time=12.653 ms
64 bytes from 192.168.1.51: icmp_seq=9 ttl=128 time=5.486 ms
64 bytes from 192.168.1.51: icmp_seq=10 ttl=128 time=10.180 ms
64 bytes from 192.168.1.51: icmp_seq=11 ttl=128 time=5.996 ms
64 bytes from 192.168.1.51: icmp_seq=12 ttl=128 time=120.933 ms
64 bytes from 192.168.1.51: icmp_seq=13 ttl=128 time=50.036 ms
64 bytes from 192.168.1.51: icmp_seq=14 ttl=128 time=8.616 ms
64 bytes from 192.168.1.51: icmp_seq=15 ttl=128 time=7.829 ms
64 bytes from 192.168.1.51: icmp_seq=16 ttl=128 time=11.415 ms
64 bytes from 192.168.1.51: icmp_seq=17 ttl=128 time=4.903 ms
64 bytes from 192.168.1.51: icmp_seq=18 ttl=128 time=4.797 ms
64 bytes from 192.168.1.51: icmp_seq=19 ttl=128 time=10.635 ms
64 bytes from 192.168.1.51: icmp_seq=20 ttl=128 time=6.610 ms
64 bytes from 192.168.1.51: icmp_seq=21 ttl=128 time=6.763 ms
64 bytes from 192.168.1.51: icmp_seq=22 ttl=128 time=4.445 ms
64 bytes from 192.168.1.51: icmp_seq=23 ttl=128 time=8.259 ms
64 bytes from 192.168.1.51: icmp_seq=24 ttl=128 time=1.886 ms
64 bytes from 192.168.1.51: icmp_seq=25 ttl=128 time=2.927 ms
64 bytes from 192.168.1.51: icmp_seq=26 ttl=128 time=5.779 ms
64 bytes from 192.168.1.51: icmp_seq=27 ttl=128 time=54.327 ms
64 bytes from 192.168.1.51: icmp_seq=28 ttl=128 time=6.699 ms
64 bytes from 192.168.1.51: icmp_seq=29 ttl=128 time=236.993 ms
64 bytes from 192.168.1.51: icmp_seq=30 ttl=128 time=2.077 ms
64 bytes from 192.168.1.51: icmp_seq=31 ttl=128 time=8.590 ms
64 bytes from 192.168.1.51: icmp_seq=32 ttl=128 time=4.923 ms
64 bytes from 192.168.1.51: icmp_seq=33 ttl=128 time=9.538 ms
64 bytes from 192.168.1.51: icmp_seq=34 ttl=128 time=2.215 ms
64 bytes from 192.168.1.51: icmp_seq=35 ttl=128 time=4.933 ms
64 bytes from 192.168.1.51: icmp_seq=36 ttl=128 time=10.651 ms
64 bytes from 192.168.1.51: icmp_seq=37 ttl=128 time=8.884 ms
64 bytes from 192.168.1.51: icmp_seq=38 ttl=128 time=2.068 ms
64 bytes from 192.168.1.51: icmp_seq=39 ttl=128 time=5.491 ms
64 bytes from 192.168.1.51: icmp_seq=40 ttl=128 time=6.071 ms
64 bytes from 192.168.1.51: icmp_seq=41 ttl=128 time=5.229 ms
64 bytes from 192.168.1.51: icmp_seq=42 ttl=128 time=2.612 ms
64 bytes from 192.168.1.51: icmp_seq=43 ttl=128 time=85.510 ms
64 bytes from 192.168.1.51: icmp_seq=44 ttl=128 time=156.390 ms
64 bytes from 192.168.1.51: icmp_seq=45 ttl=128 time=5.569 ms
64 bytes from 192.168.1.51: icmp_seq=46 ttl=128 time=7.327 ms
64 bytes from 192.168.1.51: icmp_seq=47 ttl=128 time=2.609 ms
64 bytes from 192.168.1.51: icmp_seq=48 ttl=128 time=5.219 ms
64 bytes from 192.168.1.51: icmp_seq=49 ttl=128 time=4.239 ms
--- 192.168.1.51 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.886/19.752/236.993/42.591 ms
However, all devices have the same problem (as seen in these pings... every 14-15 pings the latency spikes). Still a wifi network issue?
Hi guys. I've just upgraded my network for testing purposes: I added a Cisco WAG320N as a 2.4 GHz AP... fast response times but still the same issue: some packets are lost. HomeKit still takes a long time to obtain status information from the accessories ("Updating" label), or shows "No Response". So if my network is OK... why don't I have stable communication with my Shellys?
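To check whether the latency spikes really recur at a regular spacing, the ping output can be scanned for slow replies and for missing sequence numbers. The following is a minimal sketch, not part of the firmware or of any tool mentioned in this thread; the function names and the 40 ms threshold are assumptions chosen for illustration, and the sample lines are copied from the ping run above.

```python
import re
from typing import List, Tuple

# Matches the reply lines printed by the BSD/macOS ping shown in this thread.
PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def find_spikes(ping_output: str, threshold_ms: float = 40.0) -> List[Tuple[int, float]]:
    """Return (icmp_seq, time_ms) for every reply slower than threshold_ms.

    The 40 ms default is an arbitrary illustrative cut-off.
    """
    return [
        (int(seq), float(ms))
        for seq, ms in PING_LINE.findall(ping_output)
        if float(ms) > threshold_ms
    ]


def missing_seqs(ping_output: str) -> List[int]:
    """Return sequence numbers absent between the first and last reply seen."""
    seen = {int(seq) for seq, _ in PING_LINE.findall(ping_output)}
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# A few consecutive replies copied from the ping run in this thread.
SAMPLE = """\
64 bytes from 192.168.1.51: icmp_seq=10 ttl=128 time=10.180 ms
64 bytes from 192.168.1.51: icmp_seq=11 ttl=128 time=5.996 ms
64 bytes from 192.168.1.51: icmp_seq=12 ttl=128 time=120.933 ms
64 bytes from 192.168.1.51: icmp_seq=13 ttl=128 time=50.036 ms
64 bytes from 192.168.1.51: icmp_seq=14 ttl=128 time=8.616 ms
"""

if __name__ == "__main__":
    print("spikes:", find_spikes(SAMPLE))
    print("missing:", missing_seqs(SAMPLE))
```

Running it over a full capture and looking at the gaps between the reported sequence numbers would show whether the spikes are periodic or random.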
thank you for helping.
Hi, sorry for my "hard-posting" (is that a real word??) LOL. However, the investigation goes on. I downgraded my Shelly 1 to version 1.7.0 and nothing changed... but I discovered that if a device is connected through the router's DHCP server, I can see its status on the traffic monitor page. Now I can confirm that those little bastards connect and disconnect several times during the day. So I re-uploaded the original firmware and... with the original firmware the same device in the same place etc. never loses its WiFi connection (WiFi signal is between -40 and -50 dB, so perfect). For future releases, is it possible to show the signal strength on the web page? This seems similar to another open issue. Given that, and in your experience, in order to have a stable setup would I be better off waiting for future releases of this firmware, or reverting to stock and going through HOOBS? Thanks
thanks for the info. this is the part where we have to put our faith into SDK update, as connectivity is fully in the binary part, we don't do much apart from just saying "connect". it may be a regression in the upstream SDK, original firmware uses a pretty old version of mos (and hence upstream sdk). nevertheless, thank you for reporting, once i update the SDK i will ask you to test new firmware.
If it helps in solving this issue: I noticed that the connection drops (in most cases) when I open the Home app. It seems that when Home requests status from the devices, they lose the connection.
oh, interesting. this could be due to missing beacons while the device is doing handshake crypto. this is very useful, thanks. as a workaround, see if you can increase beacon interval in your AP.
So, with a higher beacon interval (the default is 100 ms), the devices flashed with this firmware stopped the connect/disconnect cycle. However, they continue to appear offline in the Home app.
try rebooting ios device or redoing the homekit setup. i seem to recall that i had apple device "give up" on a "flaky" device once. use the HAP reset button in web ui and re-add it to home.
Guys, I really don't know what to think. 2 Days running without issues. No changes in network or other. Still waiting and investigating.
First of all you are doing great job! It only needs a little bit of tweaking. That being said I have similar problem as Fsorrow. It was working great with shelly stock fw + MQTT + HA with homekit bridge. With this FW they all (I have 10 Shelly1) randomly disconnect and I have to reboot them to come online and accessible through home app again. When they disconnect automation also stop working (like motion sensor automation through home app). Response time is also much slower than it was with MQTT + HA. When I pressed a button on my Hue switch it was almost instant with MQTT + HA, now it takes a second or even more for light to come on. It would be great if you found the problems and fixed it :) Thanks!
I guess I found out the solution. I use UNIFI APs and DTIM for my IOT network was set to default (whatever that is). I’ve manually set it to 1 and it works MUCH better now. There is still slight delay but no random drops or disconnects. I hope this will help someone :)
thanks for reporting. the root cause of the problem is understood, but unfortunately the solution is not going to be simple. the problem is that the cpu-intensive crypto used during the homekit handshake hogs the CPU for extended periods of time and doesn't allow the wireless stack to do what's necessary to maintain association with the AP. unlike all other ports, which use FreeRTOS, the esp8266 port of Mongoose OS does not use a preemptive task management core (due to memory constraints). i've heard that the RTOS SDK for esp8266 has seen a lot of work and is in much better shape now, but switching to it is not going to be easy. i think the time has come to try to do it, but it's not going to happen very soon i'm afraid.
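The effect described above can be illustrated with a toy model of cooperative scheduling. This is only a sketch under invented assumptions: the 1500 ms handshake cost, the 50 ms slice and the 1 ms radio turn are made-up numbers, and the function names are hypothetical; none of it is taken from Mongoose OS or the ESP8266 SDK. The point it shows is that, without preemption, the radio task can only run when the crypto work gives up the CPU, so the longest uninterrupted slice bounds how long the radio is starved.

```python
from typing import Iterator


def crypto_slices(total_ms: int, slice_ms: int) -> Iterator[int]:
    """Yield the CPU time of each uninterrupted slice of a crypto computation."""
    remaining = total_ms
    while remaining > 0:
        step = min(slice_ms, remaining)
        remaining -= step
        yield step


def max_radio_gap(total_ms: int, slice_ms: int, radio_ms: int = 1) -> int:
    """Longest stretch (simulated ms) during which the radio task cannot run.

    Cooperative model: the radio task only gets a turn between crypto slices.
    """
    clock = 0
    last_radio = 0
    worst = 0
    for step in crypto_slices(total_ms, slice_ms):
        clock += step                      # crypto holds the CPU for the whole slice
        worst = max(worst, clock - last_radio)
        clock += radio_ms                  # radio task finally gets its turn
        last_radio = clock
    return worst


if __name__ == "__main__":
    beacon_interval_ms = 100  # the AP default mentioned earlier in the thread
    for slice_ms in (1500, 50):  # hypothetical: one big block vs. frequent yields
        gap = max_radio_gap(total_ms=1500, slice_ms=slice_ms)
        print(f"slice={slice_ms} ms -> radio starved up to {gap} ms, "
              f"about {gap // beacon_interval_ms} beacon intervals")
```

In this model a single 1500 ms block spans many beacon intervals, while yielding every 50 ms keeps the gap below one interval; a preemptive scheduler achieves a similar effect without the crypto code having to yield explicitly.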
Thank you @rojer for your work. I'm currently running without issues on a Pi running HOOBS with the Shelly plugin. I hope you can fix this issue for setups without bridges.
Is there a plan to move it from Mongoose OS to RTOS SDK for esp8266 ?
there is definitely a plan, and it is to rebase Mongoose OS onto RTOS SDK instead of non-OS. but it's a big change, so unfortunately still no ETA.
So for now there is no way to fix the issue with randomly disconnecting Shelly devices?
unfortunately, i have no short-term solution to suggest.
OK, happy that I finally found the cause of my Shellys not working reliably. When connected, they work very well and the light goes on and off within milliseconds. However, when I come home after a few days, they take minutes after the first switch press to reach a responsive state again.
The solution that works reliably is to set up a Raspberry Pi with https://github.com/homebridge/homebridge/ and install https://github.com/alexryd/homebridge-shelly on it, keeping the stock firmware on the devices.
However, I would strongly prefer not to have a Raspberry Pi involved in switching on the light... Thanks a lot for this great project!
Very much looking forward to test this and to provide feedback.
Hi,
Today I've also noticed that I'm experiencing these timeouts periodically on all my shelly devices, which I think is leading Homekit to report the shellys as unresponsive.
On the latest stock firmware, I can report that I don't have any of these issues so it isn't a wifi/device problem.
I prefer to be on mongoose and get rid of the demand on Homebridge. Any help would be appreciated.
@Marfre888 can you confirm you are using the latest firmware (2.0.4)? you can check the version on the web ui. if not, please update as there have been changes that should improve connection stability.
@rojer Yes I'm on the latest. (2.0.4)
The ping test, for the most part, runs fine and the pings are no higher than 10ms. For some reason every 1-3minutes I get 1-3 pings that timeout. The ping before the one that times out is usually much higher (~100ms)
Like I said before, on stock firmware this doesn't happen.
Edit: Looked through the router logs, no signs that it disconnected/reauthorized, leading me to believe that the issue might not be caused by wifi but by something that is interrupting communication, like the Shelly going to sleep.
I have upgraded to 2.0.4 as well, and my Shelly 2.5 still frequently does not react. My Shellys are used in an automation, e.g. I have one Shelly behind a switch and another one in the ceiling, and they are linked by automations in the Apple Home app.
I have discovered that opening the Home app on my iPhone makes the switches work in nearly all cases. In the Home app I can pretty much always use the switch. I just have the reliability problem on the physical switch.
As a result, I believe I may have a different issue than network connectivity?
Pinging the Shelly (sample size 800) i got 7 ping timeouts.
In my case, the time outs might not even be the cause of the problem at all.
Cause if I constantly keep turning the light on and off the light will answer accordingly. However, as soon as I let it idle for a couple of minutes, and go to trigger it via a motion sensor it won't turn on. If I then try to turn it on manually via the Home app, it will take a couple of seconds to respond but eventually, turn on. However, I do get a '!' top right of the tile. From then on, it will behave.
I have the same case as @Marfre888, running on firmware 2.0.4.
If the motion sensor is triggered, it takes almost 3-5 seconds to turn on the light via the Shelly, or in other cases it just doesn’t work at all. If I use the Shelly from the Home app and then trigger the motion sensor, it works better. Working with the wall switch has no delays or change in behavior from stock firmware.
FYI: I have more Shelly switches in the house running stock + HOOBS and they work a lot better and more reliably.
can you deploy this debug firmware to a couple devices on your network? i really need an "inside look" on one of these networks.
Hi @rojer I have installed your new firmware on both of my switches.
I might have the number of devices problem as well, here are:
@rojer Sure, it would be my pleasure. I would really be thankful if you could find the solution to this and fix it cause currently, the shellies are barely usable for me in this state.
I would run stock for the time being but, homebridge shelly doesn't discover devices on a mesh network.
So I will set up two shellies with this firmware. The first one is called 'Stairs Light', that is on the mesh network permanently. It is connected to a motion sensor that turns on when light levels are low (around 8pm)
The other one is called 'Wall Unit Light'. It isn't on any timer or motion sensor but still exhibits the same problem. It is also directly connected to the main router, not a mesh node.
I think that gives you as much diversity as possible. Let me know if these were helpful.
EDIT: I reverted 'Stairs Light' back to 2.0.4 since the compiled firmware seems to be for a Shelly 1, not a 1PM and as a result the shelly stopped responding. The other one is still running the firmware for the time being, in case it might be helpful.
please test firmware posted here - https://github.com/mongoose-os-apps/shelly-homekit/issues/30#issuecomment-662669722
Hi @rojer , thanks for the update. I have installed the debug version on both of my switches. One did somehow not come back after the firmware update. But after cutting the power to it, the switch booted successfully again.
I could use the switch without the home app open just now, which was never possible recently. I will report back with findings the next days.
Now running 20200723-232504/2.0.6-2-g51ca9d1-debug on shellyswitch25-686DF3 and shellyswitch25-691739.
@rojer happy to report back that 2.0.7 seems to be a significant step forward. It works very well in my environment. I have switched my light on and off many times now -- it always works. The automations are triggered successfully. I would consider this issue solved for me.
One thing that would be interesting for me is which IPs currently have active connections. I see four on one of my switches and am wondering why the other one only has three ...
@fmms that's great to hear! connection management by apple devices is a bit of a mystery for me too. opening home app seems to trigger reconnection reliably. firmware upgrade does too (because configuration number changes, i think). otherwise they seem to be ok with connection disappearing and do not actively try to reconnect, which may explain why notifications get lost. thus, the only reliable way to keep things connected is to not drop connections in the first place.
i think i can get a couple connections by optimizing output buffer management, i'll get to it soon. we should be able to get the required 8 and maybe 9 or 10, we'll see.
One thing that would be interesting for me is which IPs currently have active connections.
i will add a debug page to show internal stats like active connections in the future.
I have just updated to 2.0.8 the /debug/ works great. The output I get is:
Config number: 5
HAP connections:
192.168.178.49:62343 last_io 1595764067
192.168.178.27:65009 last_io 1595764084
192.168.178.51:49159 last_io 1595764088
192.168.178.45:49157 last_io 1595764085
192.168.178.52:54597 last_io 1595764085
192.168.178.39:63666 last_io 1595764085
Total: 6
To really make any sense of the output I had to run ping -a on all IP addresses to figure out which devices they are ...
In addition, I had to convert the Unix timestamps to readable time information.
It would be perfect if the output would be like:
Config number: 5
HAP connections:
homepod-1 (192.168.178.49:62343) last seen 07/26/2020 11:47am UTC (5 min ago)
...
Total: 6
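The manual conversion described above can be scripted. Below is a minimal sketch that parses the connection lines of the /debug/ output and renders the last_io value as a UTC time, treating it as a Unix timestamp the way this comment does (the reply that follows explains that the value is not meant to be an absolute timestamp). The function names are hypothetical, the line format is assumed to be exactly as pasted here, and the sample contains only two of the six lines shown above.

```python
import re
from datetime import datetime, timezone
from typing import List, Tuple

# "IP:port last_io N", as printed on the /debug/ page quoted in this thread.
CONN_LINE = re.compile(r"^(\d+\.\d+\.\d+\.\d+):(\d+) last_io (\d+)$")


def parse_connections(debug_text: str) -> List[Tuple[str, int, int]]:
    """Return (ip, port, last_io) for every HAP connection line found."""
    rows = []
    for line in debug_text.splitlines():
        match = CONN_LINE.match(line.strip())
        if match:
            rows.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return rows


def as_utc(value: int) -> str:
    """Format a value as UTC, assuming it is a Unix timestamp in seconds."""
    moment = datetime.fromtimestamp(value, tz=timezone.utc)
    return moment.strftime("%m/%d/%Y %H:%M:%S UTC")


# Two connection lines copied from the output above.
SAMPLE = """\
Config number: 5
HAP connections:
192.168.178.49:62343 last_io 1595764067
192.168.178.27:65009 last_io 1595764084
"""

if __name__ == "__main__":
    for ip, port, last_io in parse_connections(SAMPLE):
        print(f"{ip}:{port} last seen {as_utc(last_io)}")
```

Host names could be added with a reverse lookup such as Python's `socket.gethostbyaddr`, but whether that returns anything useful depends on the local network's DNS, so it is left out of the sketch.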
@fmms last_io is actually a difference between now() and last_io timestamp. the reason it's so big is that time is set by SNTP and that is not handled properly. ip -> hostname lookup is not actually easy to do, i think my time would be better spent elsewhere. that said, i am adding some more debug info to help understand internal state of connections.
i'm going to close this one in favor of #30
Hi everybody, I have 2 Shelly 2.5 and 1 Shelly 1 in my setup, running version 1.7.1. All of them are randomly unreachable from HomeKit; I tried to ping them and sometimes they lose packets. Is this issue due to my network setup, or could it be caused by the Mongoose firmware?
Sorry for my bad English. Thank you