Closed mahikeulbody closed 1 year ago
Which MQTT Broker are you using? Also, can you define "many" ... is it several per minute, or every few minutes or ... ?
Mosquito (hass addon) I got these messages 5 times in the last hour.
These messages stopped this night. Now I have this kind of messages (I replaced my user by xxx)
2023-01-21 06:57:05: Client nukihub disconnected: Success. 2023-01-21 06:57:05: New connection from 192.168.0.227:58182 on port 1883. 2023-01-21 06:57:05: New client connected from 192.168.0.227:58182 as nukihub (p2, c1, k60, u'xxx')
2023-01-21 07:01:16: Client nukihub disconnected due to protocol error. 2023-01-21 07:01:16: New connection from 192.168.0.227:58183 on port 1883. 2023-01-21 07:01:16: New client connected from 192.168.0.227:58183 as nukihub (p2, c1, k60, u'xxx').
2023-01-21 07:09:55: Client nukihub disconnected. 2023-01-21 07:09:55: New connection from 192.168.0.227:58184 on port 1883. 2023-01-21 07:09:55: New client connected from 192.168.0.227:58184 as nukihub (p2, c1, k60, u'xxx').
2023-01-21 07:20:59: Client nukihub disconnected due to protocol error. 2023-01-21 07:20:59: New connection from 192.168.0.227:58185 on port 1883. 2023-01-21 07:20:59: New client connected from 192.168.0.227:58185 as nukihub (p2, c1, k60, u'xxx').
I didn't restart nuki hub by myself.
Edit : finally not, sorry 2023-01-21 08:17:56: New connection from 192.168.0.227:53433 on port 1883. 2023-01-21 08:17:56: Client nukihub already connected, closing old connection. 2023-01-21 08:17:56: New client connected from 192.168.0.227:53433 as nukihub (p2, c1, k60, u'xxx')
I too have the same messages (I replaced my user by xxx): 2023-01-21 05:45:15: Client nukihub disconnected due to malformed packet. 2023-01-21 05:45:15: New connection from 192.168.1.84:51856 on port 1883. 2023-01-21 05:45:15: New client connected from 192.168.1.84:51856 as nukihub (p2, c1, k15, u'xxx').
They are not the same messages, yours are about "malformed packets". I have no more "malformed packets" since I am using nuki hub v7. Are you using v7 ?
For me, the malformed packets are gone, but there seems to be an issue with reconnects. Besides just reconnecting often, I now also see these errors every few hours (looks like rate limiting by the broker):
2023-01-21 07:15:35: Bad client <nukihub> sending multiple CONNECT messages.
2023-01-21 07:15:35: Client <nukihub> disconnected due to protocol error.
Me too
2023-01-21 10:42:12: Bad client nukihub sending multiple CONNECT messages.
2023-01-21 10:42:12: Client nukihub disconnected due to protocol error.
They are not the same messages, yours are about "malformed packets". I have no more "malformed packets" since I am using nuki hub v7. Are you using v7 ?
I'm stuck at 6.11, since I'm leaving for a week's vacation, I don't feel like updating and leaving.
I'm stuck at 6.11, since I'm leaving for a week's vacation, I don't feel like updating and leaving.
That's fine, but then your log is not relevant to this issue.
I'm stuck at 6.11, since I'm leaving for a week's vacation, I don't feel like updating and leaving.
You should upgrade, v7 is much better because it solves the malformed topics issue that causes reboots. Upgrade asap.
I am logging the nuki/maintenance/log
topic and I see several apparent reboots, but also lots of MQTT reconnects that are not related to reboots during the night (i.e. several MQTT connected.
messages with nothing else in between, spaced over an hour maybe).
I am logging the
nuki/maintenance/log
topic and I see several apparent reboots, but also lots of MQTT reconnects that are not related to reboots during the night (i.e. severalMQTT connected.
messages with nothing else in between, spaced over an hour maybe).
You can log your uptime, it's easiest to track the duration between reboots.
mqtt:
sensor:
- name: "Nuki Entrée Uptime"
state_topic: "nuki_entree/maintenance/uptime"
device_class: duration
unit_of_measurement: min
I have a lot of reboots without restart messages so I don't know the root cause. Scheduled restart is disabled on both ESP. I will see if not sending RSSI + QoS=1 + CS=False from https://github.com/technyon/nuki_hub/issues/51#issuecomment-1400664596 helps.
Thanks, yeah, the reboot messages might also simply be lost if the MQTT connection is not reestablished in a timely manner.
I'm using the 7.0-cs-false
fw, I don't have reboots, luckily:
This is my config:
I'm using the
7.0-cs-false
fw, I don't have reboots, luckily:
Argh, I have this behavior with 7.0-cs-false
. :-/
I will test 7.1 with RSSI disabled before testing anything related to CS / QoS.
Strange that we observe totally different results. I'm starting to question the broker...you both use Mosquitto right?
Actually, I don't any more. But I've no adapted my restart settings to yours, so we can compare tomorrow.
Actually, I don't any more. But I've no adapted my restart settings to yours, so we can compare tomorrow.
I was replying to Julien actually. Yes I know you're using EMQX now, welcome onboard. :)
Did you see reconnection improvements since you switched to EMQX?
Ah, I thought you meant as both (because, well "both"). No, the reconnect behavior has not improved. Here's the most current log. The first MQTT connected
at 19:13 is after the last reboot (me changing the settings). Everything else MQTT reconnects where the ESP did not reboot (and some buffer corruptions).
2023-01-23 19:13:27.652456+01:00: MQTT connected
2023-01-23 19:13:58.478196+01:00: 1 nuki/lock/r
2023-01-23 19:13:58.673789+01:00: MQTT connected
2023-01-23 19:22:58.017850+01:00: MQTT connected
2023-01-23 19:31:03.502848+01:00: MQTT connected
2023-01-23 19:31:33.868528+01:00: 1 nuki/lock/r
2023-01-23 19:31:34.019050+01:00: MQTT connected
2023-01-23 19:32:04.152625+01:00: -59T connected
2023-01-23 19:32:04.332708+01:00: MQTT connected
2023-01-23 19:32:34.490780+01:00: 1 nuki/maner/
2023-01-23 19:32:34.761026+01:00: MQTT connected
2023-01-23 19:34:26.566261+01:00: MQTT connected
2023-01-23 19:35:12.852033+01:00: MQTT connected
2023-01-23 19:41:37.360543+01:00: MQTT connected
2023-01-23 19:41:39.408436+01:00: MQTT connected
2023-01-23 19:46:57.570165+01:00: MQTT connected
Is the wifi connection reliable? What Wifi RSSI values do you have on average? That log comes from the serial interface?
It's so strange I don't see these disconnections...
Should be. There were no reconnects to the AP. RSSI in that timeframe was about -50 dB. There's a gap between 19:31:01 and 19:32:36 though where no RSSI values were recorded.
Edit: The log is from MQTT logging.
Just onboarded on the EMQX wagon (thanks!) and using 7.1. I will quickly see if there's any improvement regarding the uptime. Migration was quite easy. The only thing I missed was setting the "superuser" role for the user connecting using MQTT Explorer but reading the logs helped. :)
Mhm, I just noticed lots of lines like these in the broker's log:
2023-01-23T21:09:24.351408+01:00 [warning] clientid: <nukihub>, line: 446, mfa: emqx_channel:handle_in/2, msg: puback_packetId_not_found, packetId: 2, peername: <ip>:<port>
Which version do you use currently? No message like this in my EMQX logs with 7.1.
puback
is related to QoS=1, doesn't it?
Yes, that's the new QoS-1 version published in https://github.com/technyon/nuki_hub/issues/51#issuecomment-1400664596
Ok. I'll wait a little bit to see how things are going with 7.1 + EMQX and I'll report back with my dedicated Nuki Debug view. 😁 Settings aligned with @alexdelprete and using 60000 ms as RSSI update interval.
Mhm, I just noticed lots of lines like these in the broker's log:
2023-01-23T21:09:24.351408+01:00 [warning] clientid: <nukihub>, line: 446, mfa: emqx_channel:handle_in/2, msg: puback_packetId_not_found, packetId: 2, peername: <ip>:<port>
seems like it's not sending ACKs with QoS=1. Better let @technyon know about this. I hoped the library managed this internally with the different levels of QoS.
Settings aligned with @alexdelprete and using 60000 ms as RSSI update interval.
I thought you said 2s is not too fast, and now you switched to 60s? :)
Settings aligned with @alexdelprete and using 60000 ms as RSSI update interval.
I thought you said 2s is not too fast, and now you switched to 60s? :)
I'm so desperate about finding what's causing reboots I need to test anything. I promise I'll send you a coffee if that was the root cause. :D
In the meantime, I hope the QoS issues are not related to https://github.com/arduino-libraries/ArduinoMqttClient/issues/39. It looks like quite convoluted to implement.
I promise I'll send you a coffee if that was the root cause.
If you don't mind, I prefer to offer you my italian coffee instead. :)
But I didn't say that's the root cause, I was saying that I don't believe in coincidences, and the RSSI topic, at least in my case, was typically the malformed one.
I know the root cause: not very good mqtt libraries. ;)
Maybe we should start looking at another library...:(
Ma dai! Already one reboot after 27 minutes.
Ma dai! Already one reboot after 27 minutes.
This is crazy, look at mine:
And I updated the fw, that's because it rebooted...
So we can exclude:
Lucky you. I am the only one to have this kind of issue so the only thing we don't have in common is the Wi-Fi AP.
Nothing fancy here. I try to fully disable the RSSI updates, just to be 100% sure. xD
What is it? Netgear AP?
Is there a device specific page on the AP where you can see some detail about the ESP32 device? Do you see disconnections on Wifi?
I'm only starting to look at this but I can already see strange things.
I'm starting troubleshooting with disabling of "Band Steering / 802.11 k/v". It's a good candidate for this kind of trouble.
always remember: "it's always DNS" - corollaries: "it's always DHCP" / "it's always WiFi" :)
Now check the timestamps to find the correlation.
Interestingly, I've seen the device drop out from the network during the night, wenn it's DHCP lease time was up. Apparently it did not try to reconnect/reboot, despite WiFi being stable for other devices. Network disconnect timeout was set to 60 seconds, but that apparently didn't help (so the controller probably crashed?)
That would mean the IP stack is bugged. I don't think that's the case. I'm starting to think about this paragraph written in Nuki Hub readme:
It has something to do with some APs. If you can, on the AP, create a WLAN with conservative (legacy) parameters, just to see if it improves the connection. Might be some tricky configuration that doesn't work with ESP32.
Personally I use a mesh based on the Fritz!Box and 4 repeaters. I have 30 Tasmota devices (ESP826x and ESP32) and never suffered from this.
I really don't think this is it. The WiFi already is my dedicated IoT VLAN with pretty conservative WiFi settings (but not totally, so no 802.11b 😅). The freezing thing happen's very sporadically (like every few weeks), so I think some sort of power fluctuation from the dodgy electrical-plug-with-USB is more likely.
Mhm, maybe I should try a different power supply for a while. (Current uptime is 9 hours and counting, so it's not happening all time - there are however lot's of MQTT reconnects.)
Edit: OK, not counting anymore. While I was typing this, the ESP rebooted.
I disabled everything related to restart in the options but I still have reboots from time to time. This time, I have the logs from the AP and the router. I can see that the device is "disassociated". I think the ESP crashes first, not because of a connection drop. It does not reboot because of connection drop since "Restart on disconnect" is not checked and "Network Timeout until restart" is set to "-1". It may be related to the memory corruption issues I saw when logging on serial.
AP (from bottom to top)
Jan 24 20:21:06 nddmp[4268]: alarm : seqNo-[1789], level-[OTHER], info-[{"ip":"10.0.0.186","mac":"4C-75-25-xx-xx-xx","hname":"4C-75-25-xx-xx-xx","txR":"43.30 Mbps","rxR":"72.20 Mbps","ssid":"xxx,"dOs":"Debian-based Linux","dType":"Laptop/PC","mode":"11NG","status":"Authenticated","bssid":"C8-9E-43-xx-xx-xx","chnl":"1","chWidth":"20 MHz","rssi":"36","state":"QOS/ERP/HT","type":"wpa2","idle":"0","time":"00:00:00","txB":"990","rxB":"744","pmf":"none","vlanID":"1","Username":"Not Applicable","radio":"0"}]
Jan 24 20:21:05 nddmp[4268]: alarm : seqNo-[1788], level-[INFO], info-[esp32-xxxxxx (4C-75-25-xx-xx-xx) wireless client connected]
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx WPA: received EAPOL-Key frame (4/4 Pairwise)
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx WPA: sending 3/4 msg of 4-Way Handshake
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx WPA: received EAPOL-Key frame (2/4 Pairwise)
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: associated (aid 1)
Jan 24 20:21:05 configd[4167]: EEM: Received Event: NEW_STA_ASSOC_EEM
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: authenticated
Jan 24 20:21:05 nddmp[4268]: alarm : seqNo-[1787], level-[INFO], info-[esp32-xxxxxx (4C-75-25-xx-xx-xx) wireless client disconnected]
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: disconnected
Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: disassociated
Router (from top to bottom)
Tue Jan 24 20:21:05 2023 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(eth2) 4c:75:25:xx:xx:xx
Tue Jan 24 20:21:05 2023 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(eth2) 10.0.0.186 4c:75:25:xx:xx:xx
Tue Jan 24 20:21:05 2023 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth2) 10.0.0.186 4c:75:25:xx:xx:xx
Tue Jan 24 20:21:05 2023 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth2) 10.0.0.186 4c:75:25:xx:xx:xx esp32-xxxxxx
After the reboot, it looks it took about 2 minutes before getting Presence updates again. (not in the logs above but I track the last_changed
of the presence/devices
topic.
Actually, the less that happens on the ESP, the more stable it seems.
I just disabled presence/devices
update by setting "-1" on "Presence detection timeout". So there's only the maintenance/uptime
that is published when the lock is idle. Since nothing is happening, it may also help to reduce random memory leaks / corruptions.
Are you both using VLANs for these devices? Are there traffic restrictions between the IoT VLAN and Nuki Hub?
@Mincka, If you device rebooted, then you need to look earlier then the Jan 24 20:21:05 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: disassociated
line. I am pretty sure that by this point, the device was already starting up, because the next log line is the authentication for the new WiFi connection (at the same timestamp of 20:21:05
. So whatever happened on the ESP side, it must have come earlier.
There's nothing in the logs before. No VLANs. Mixed 2.4 Ghz/5 Ghz without other issues.
One of the ESP had an unrecoverable crash this night (did not manage to restart).
Jan 25 04:28:23 nddmp[4268]: alarm : seqNo-[1824], level-[INFO], info-[esp32-xxxxxx (4C-75-25-xx:xx:xx) wireless client disconnected]
Jan 25 04:28:22 kernel: [323338.424112] ieee80211_send_deauth: [wifi0vap0]: Sending deauth to 4c:75:25:xx:xx:xx with reason 34
Jan 25 04:28:22 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: disconnected
Jan 25 04:28:22 hostapd: wifi0vap0: STA 4c:75:25:xx:xx:xx IEEE 802.11: disassociated
34 - Association denied due to excessive frame loss rates and/or poor conditions on currentoperating channel
The uptime stopped updating 2 minutes before the disconnection:
The presence/devices
topic also stopped updated at that time, so before disconnection of the Wi-Fi:
I thing this one is a legitime use case for restart on BT loss so I am going to set this setting again, with 600 seconds of delay.
The other ESP crashed this morning and managed to reboot. Exact same logs as in the previous post.
What are your usual uptimes @mundschenk-at ?
34 - Association denied due to excessive frame loss rates and/or poor conditions on currentoperating channel
Like I said in previous posts, it looks like a wifi compatibility issue...
Do you have other ESP32 devices on wifi, with different fw (tasmota, etc.)?
A dozen of Shelly which are ESP-based, no issue with them. This error 34 was the only one in the logs. My latest idea is to connect two different ESPs on the same lock, at the same time.... Of course it's messed up in HA because of the discovery but I still have my manually configured MQTT sensors so I can track uptime and logs. That's enough to see if they fail together or not. It could be considered like a failover system. 😂
I have 20 shellies, but they're all ESP82xx based. Only the new ones are ESP32 based. Which models do you have?
My latest idea is to connect two different ESPs on the same lock, at the same time....
You're complicating things...:)
Do you have only one kind of device? What device are you using? It would be good to check 2 different devices to see if the behaviour is exactly the same...
One is the M5 Atom Lite and the other one is a ESP32 DevKitV1.
I'm using the Atom U. What about the power? How are you powering it?
I'm trying to exclude the basics, but I keep thinking it's WiFi.
Independent Samsung chargers with at least 1A output. On the Wi-Fi side, I try to disable "WMM Powersave", however, from what I see, it should be supported by ESP32 chips, but that's just another test...
Julien, another test: if you have another M5 Atom, can you flash tasmota and see if it disconnects often? So we can exclude the fw.
Hi,
I upgraded my nuki hub with the last firmware. Now I have many :
Client nukihub already connected, closing old connection New client connected from 192.168.0.227:58397 as nukihub"
With the previous version (v6.11) I had some reconnections because of malformed mqtt packet. It seems I have not anymore these malformed packet, only these new messages.