Closed alexdelprete closed 1 year ago
Could indeed be a library problem. As far as I understand, if an incoming message is received while publishing, it can corrupt memory.
So we old dinosaurs of IT still have a good instinct when troubleshooting. ;)
I'll try the updated firmware and keep you guys posted. Thanks a lot for all the support from everybody.
Jan, I see that https://github.com/knolleary/pubsubclient/pull/835 and also https://github.com/knolleary/pubsubclient/pull/843 were never merged.
So from my perspective, it seems the problem is still in the library. Question: is there another library that could be used to manage MQTT communication?
UPDATE:
The first one (arduino-libraries) should be maintained by the official Arduino Team:
This organization hosts the official libraries maintained or supervised by the Arduino team
I'd stick with the official...if it does everything you need in Nuki Hub.
Here you go, NUKI Hub with ArduinoMqttClient. I had to work out a few quirks of the new lib, but all in all it wasn't too hard to replace. Please go ahead and test.
Thanks again for the effort. I hope this helps. Both ESPs are now running 7.0-arduino-mqtt-1. I removed one from the door and put it next to the ESP just to see if it also makes a difference for my connection issue.
Quick question: why is the nuki_hub.bin in the zip above not the same as the nuki_hub.bin in the ZIP artefact for this build (1488 KB vs 1388 KB)? I was thinking we could use the bin from the build artefacts directly.
Here you go, NUKI Hub with ArduinoMqttClient. I had to work out a few quirks of the new lib, but all in all it wasn't too hard to replace. Please go ahead and test.
Wow Jan, I didn't expect it so quickly. Hope it didn't take too much of your time.
I just upgraded and made sure the MQTT topics were "clean". Now let's wait some days to see what happens.
Thank you so much for this. :)
@Mincka I'd need to check ... I think the CI is still running with Arduino core for ESP32 2.0.5, while I'm using 2.0.6 ... or maybe it's some cmake options, or because I'm using ninja instead of make. The difference in size is somewhat significant though, so I'll investigate it when I have time.
Jan, I just checked and I don't see malformed topics under the main nukihub topic:
The bad news is that all Nuki sensors were unavailable in HA, so I checked the autodiscovery topics and found a problem: they are getting truncated.
Looks like the maximum payload is 256 chars:
{"dev":{"ids":["nuki_1bf77d65"],"mf":"Nuki","mdl":"SmartLock","name":"Portoncino"},"~":"nuki_hub","name":"Portoncino battery voltage","unique_id":"1bf77d65_battery_voltage","dev_cla":"voltage","ent_cat":"diagnostic","stat_t":"~/battery/voltage","state_cla"
{"dev":{"ids":["nuki_1bf77d65"],"mf":"Nuki","mdl":"SmartLock","name":"Portoncino"},"~":"nuki_hub","name":"Portoncino bluetooth signal strength","unique_id":"1bf77d65_bluetooth_signal_strength","dev_cla":"signal_strength","ent_cat":"diagnostic","stat_t":"~/
{"dev":{"ids":["nuki_1bf77d65"],"mf":"Nuki","mdl":"SmartLock","name":"Portoncino"},"~":"nuki_hub","name":"Portoncino battery voltage","unique_id":"1bf77d65_battery_voltage","dev_cla":"voltage","ent_cat":"diagnostic","stat_t":"~/battery/voltage","state_cla"
The default tx buffer size is only 256 bytes. Upped it to 6144, please give it a try.
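For anyone wondering why a 256-byte buffer breaks autodiscovery: a rough Python sketch (not Nuki Hub code, which is C++) of what happens when a JSON payload is cut at a fixed size. The helper names and the padding field `note` are made up for illustration; the other keys mirror the visible part of the truncated messages above. It also ignores that the MQTT header and topic share the buffer, so the real limit on the payload would be even lower.

```python
import json


def truncate_to_buffer(payload: str, buffer_size: int) -> bytes:
    """Mimic a fixed-size TX buffer that drops every byte past its capacity."""
    return payload.encode("utf-8")[:buffer_size]


def is_valid_json(raw: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 and parse as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


# Hypothetical discovery payload. The "note" field is invented padding that
# pushes the message past 256 bytes; it is not part of the real payload.
discovery = json.dumps(
    {
        "dev": {"ids": ["nuki_1bf77d65"], "mf": "Nuki", "mdl": "SmartLock", "name": "Portoncino"},
        "~": "nuki_hub",
        "name": "Portoncino battery voltage",
        "unique_id": "1bf77d65_battery_voltage",
        "dev_cla": "voltage",
        "ent_cat": "diagnostic",
        "stat_t": "~/battery/voltage",
        "note": "x" * 64,
    },
    separators=(",", ":"),
)

for size in (256, 6144):
    ok = is_valid_json(truncate_to_buffer(discovery, size))
    print(f"buffer={size}: payload is {'valid' if ok else 'broken'} JSON")
```

A payload cut mid-string is no longer valid JSON, which would explain HA dropping the entities until the buffer was enlarged.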
All good now. And no malformed topics after almost 24h. :)
Please remember to fix the unit of measurement, you closed #72 with 6.11, but I still see db instead of dBm for BT/Wifi RSSI.
I'm confused. I thought it was about being lowercase, so I changed db to dB. Is this not a valid unit?
@rodriguezst Could you check if encrypted MQTT still works with the new library?
You are right, I was still reading it as lower-case...need to wear my eyeglasses more often. :)
Hello @technyon. I have not been following the project lately but I just updated from 6.0 to 7.0-arduino-mqtt-2 and everything seems to be working with encrypted MQTT with the new lib:
Ignore the opener undefined status... the batteries ran out a few days ago and I haven't replaced them :) Thank you!
@rodriguezst Changed from PubSubClient to Arduino MQTT because the former has some memory corruption issues. Thanks for the update
Let's test it a bit longer. If everything works, I'll merge it and prepare a release.
No malformed topic on 7.0-arduino-mqtt-1 either. Using 7.0-arduino-mqtt-2 now on both locks.
I thought we had solved...:(
Well damn. I'm still in favor of using the new library. PubSubClient seems to be unsupported, with no commits for two years.
I agree. It's much better. Since my last report (18h ago), no malformed topics.
And this library is officially supported by the Arduino team.
Let's test it one more day, and I'll do a new release tomorrow.
Agreed. Maybe that malformed topic was an old one I forgot to delete...;)
Still not seeing this issue on my side, but it was always very random and often showed up only after a few days.
With the previous library, I saw malformed topics at least every 2 days, sometimes even more often.
I would wait another day (3 days) just to be on the safe side.
Only two minor issues left:
I went ahead and released it as 7.0 ... the new library is better anyway. Fingers crossed the garbled topics are gone.
Makes perfect sense. Let's see how it goes...:)
Guys, do we consider this malformed (drain)?
Can anyone observe messages like in issue #87 or #88 ?
Yes, both. The RSSI update frequency is up to every other second. In addition, I do still see frequent Mosquitto error messages and there seems to be binary data in the RSSI topics:
Can't decode payload b'1\xce\x01' on nuki/lock/rssi with encoding utf-8 (for <Job HassJobType.Callback <function MqttSensor._prepare_subscribe_topics.<locals>.message_received at 0x7f8358f640>>)
Can't decode payload b'1\xf3\x01' on nukiopener/lock/rssi with encoding utf-8 (for <Job HassJobType.Callback <function MqttSensor._prepare_subscribe_topics.<locals>.message_received at 0x7f835e9900>>)
Can't decode payload b'1\xf4\x01' on nuki/lock/rssi with encoding utf-8 (for <Job HassJobType.Callback <function MqttSensor._prepare_subscribe_topics.<locals>.message_received at 0x7f8358f640>>)
Can't decode payload b'1\xf3\x01' on nuki/lock/rssi with encoding utf-8 (for <Job HassJobType.Callback <function MqttSensor._prepare_subscribe_topics.<locals>.message_received at 0x7f8358f640>>)
Can't decode payload b'\xe0\x001' on nuki/maintenance/wifiRssi with encoding utf-8 (for <Job HassJobType.Callback <function MqttSensor._prepare_subscribe_topics.<locals>.message_received at 0x7f83f453f0>>)
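To illustrate why HA rejects these payloads, here is a small Python sketch. The byte strings are copied from the log lines above; `parse_rssi` and the sample value `b"-61"` are made up for the example and are not HA or Nuki Hub code. An RSSI reading should be plain ASCII digits, but the logged payloads contain lead bytes (0xCE, 0xE0, 0xF3) followed by bytes that are not valid UTF-8 continuation bytes.

```python
from typing import Optional


def parse_rssi(payload: bytes) -> Optional[int]:
    """Decode an RSSI payload as UTF-8 text and parse it; None if it is garbage."""
    try:
        return int(payload.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None


# Payloads taken from the log above, plus one hypothetical well-formed value.
samples = [b"1\xce\x01", b"1\xf3\x01", b"\xe0\x001", b"-61"]

for raw in samples:
    print(raw, "->", parse_rssi(raw))
```

A subscriber that guards its parsing like this just skips the corrupted update instead of logging a decode error, but it does not fix the corruption on the sending side.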
Edit: I indeed see both issues and have commented there.
Can anyone observe messages like in issue #87 or #88 ?
Can you check if the new library has an option for persistent connections? It seems to open a new connection every time it needs to communicate with the broker. That is a common problem found also in other projects that communicate with an MQTT broker.
There's not that much to configure:
Actually not so fast! Persistent connections in MQTT are configured via the clean session flag (what a misleading name). Setting it to false makes connections persistent; of course, it defaults to true.
Try this binary with clean sessions set to false:
Finally, I also got a buggy one:
Now running 7.0-cs-false.
Persistent connections in MQTT are configured via the clean sessions flag
Do we use a dynamic clientID? That could be a problem. It would also be good to have an option in the MQTT and Network Config section to define a specific clientID (with a default maybe), and also whether the Clean Session flag should be on or off (default off).
ref. https://www.emqx.com/en/blog/mqtt-session
I also hope the library takes care of this:
Finally, I also got a buggy one:
You had the malformed one with the clean session firmware or with the previous one?
I don't think it's a coincidence RSSI topic is often the malformed one, it's the one that is updated more frequently.
@technyon RSSI is updating every 1-2s, I think this is a problem if the value comes from the Nuki BLE API, because it will drain the battery. If it's the RSSI on the ESP32 side, no battery issue, but anyway it's too chatty, it should be updated like the other values.
I didn't have any malformed topics, I checked before upgrading to the 7.0 cleansession fw.
I can confirm from my EMQX console that the Clean Session flag is false and the Session Expiry Interval is 2h. The device ID is static and it's the hostname.
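For reference, this is roughly what the Clean Session flag and the client ID look like on the wire. It's a minimal Python sketch of an MQTT 3.1.1 CONNECT packet without username, password or will message, not what the Arduino library actually does internally. The function names are hypothetical, and `nukihub` is only used as an example client ID.

```python
import struct

CLEAN_SESSION_BIT = 0x02  # bit 1 of the CONNECT flags byte in MQTT 3.1.1


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT variable-length 'remaining length' field."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_connect(client_id: str, clean_session: bool, keep_alive: int = 60) -> bytes:
    """Build a bare CONNECT packet (no credentials, no will)."""
    flags = CLEAN_SESSION_BIT if clean_session else 0
    cid = client_id.encode("utf-8")
    # Variable header: protocol name "MQTT", protocol level 4, flags, keep-alive.
    variable_header = struct.pack("!H4sBBH", 4, b"MQTT", 4, flags, keep_alive)
    payload = struct.pack("!H", len(cid)) + cid
    body = variable_header + payload
    return b"\x10" + encode_remaining_length(len(body)) + body


persistent = build_connect("nukihub", clean_session=False)
clean = build_connect("nukihub", clean_session=True)
print("flags byte, persistent:", hex(persistent[9]))
print("flags byte, clean:     ", hex(clean[9]))
```

The broker keys the stored session on the client ID, which is why a static ID matters when the flag is false: a client that connects with a different ID each time never gets its old session back.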
Finally, I also got a buggy one:
You had the malformed one with the clean session firmware or with the previous one?
With the previous one.
There's no battery drain from the bluetooth rssi value. The lock (or opener) sends beacons all the time anyway, the ESP just picks them up. This is actually what is changed by the "energy-saving" setting in the app. Depending on whether you set it to slow or fast, more or fewer beacons are sent, hence the battery drains faster on fast.
the ESP just picks them up
ok, that's what I needed to clear, thanks.
One issue remains: the 1s update is also being picked up by HA through MQTT, and it's usually not recommended to record an entity that updates so frequently. Is there any chance we can have that throttled in some way? Perhaps an option to align the update frequency with the other sensors?
I think the fact that the RSSI topic is frequently the one being malformed is due to this frequency. It stresses things a little bit too much. :)
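To make the throttling idea concrete, here is a minimal sketch in Python (the firmware itself is C++, so this is only an illustration of the logic, not a patch). `ThrottledPublisher`, `offer` and `min_interval_s` are hypothetical names, and the 60 s interval is an arbitrary example value.

```python
import time
from typing import Callable, Dict


class ThrottledPublisher:
    """Forward at most one message per topic every `min_interval_s` seconds."""

    def __init__(
        self,
        publish: Callable[[str, str], None],
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._publish = publish
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def offer(self, topic: str, value: str) -> bool:
        """Publish if the topic's interval has elapsed; return True if sent."""
        now = self._clock()
        last = self._last_sent.get(topic)
        if last is not None and now - last < self._min_interval_s:
            return False  # too soon, drop this reading
        self._publish(topic, value)
        self._last_sent[topic] = now
        return True


# Demo with a fake clock so the behaviour is visible without waiting.
fake_now = [0.0]
publisher = ThrottledPublisher(
    publish=lambda topic, value: print("publish", topic, value),
    min_interval_s=60.0,
    clock=lambda: fake_now[0],
)
for second, rssi in [(0.0, "-61"), (1.0, "-62"), (61.0, "-63")]:
    fake_now[0] = second
    publisher.offer("nuki/lock/rssi", rssi)
```

With beacons arriving every 1-2s, only the first reading in each interval would reach the broker, and therefore the HA recorder.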
I don't see an issue with this personally. The broker is getting lots of updates per second and can handle this without issues. Since it has no impact on the Nuki battery, a throttling mechanism would just add more complexity to the code without any certainty that it will fix anything; it's just hypothetical.
In the end, what's the impact of having a malformed maintenance or wifi topic once in a while? You just miss an update of a non-critical sensor. We can live with that. We have no proof that's related to reboots. I have reboots without malformed requests.
Actually not so fast! Persistent connections in MQTT are configured via the clean session flag (what a misleading name). Setting it to false makes connections persistent; of course, it defaults to true.
Try this binary with clean sessions set to false:
Does not seem to have helped much (if at all), the Mosquitto log still looks the same (lots of connects, including error messages about Bad client <nukihub> sending multiple CONNECT messages.).
@technyon, while I understand the wish to change to a maintained library, the results with the new one are not good so far. I have created #90 for the new issue of binary data in the payload. It looks like the new library has a buffer corruption problem as well (albeit a different one). For me, the most stable version was https://github.com/technyon/nuki_hub/issues/51#issuecomment-1383137614.
Does not seem to have helped much (if at all), the Mosquitto log still looks the same (lots of connects, including error messages about Bad client <nukihub> sending multiple CONNECT messages.).
That is really strange, I don't have these multiple connect messages warnings/errors. And if the client is using a persistent connection, there shouldn't be. Can you check with your broker that nuki_hub is actually using a persistent connection? I verified with mine and it's using it.
the results with the new one are not good so far
In my setup, this version is the best one so far: no malformed topics, just one binary payload one time, and that's it. No problems apart from that one.
I don't see an issue with this personally. The broker is getting lots of updates per second and can handle this without issues.
Like I wrote above, it's not about the broker; MQTT brokers are designed for heavy loads. The problem is the HA recorder. Some users on an RPi might have issues, since very frequent updates don't suit HA on those kinds of setups. I have no issues myself, HA is running on a pretty good server with SSDs etc., but I'm thinking about other users. Eventually, they could exclude the RSSI entity from the recorder, but honestly, I see no value in updating that sensor every second. It should update like the others.
In the end, what's the impact of having a malformed maintenance or wifi topic once in a while? You just miss an update of a non-critical sensor.
The RSSI is not a critical sensor either, so why update it every second?
We have no proof that's related to reboots
I didn't say it causes reboots, I said it's contributing to malformed topics. In my case the RSSI topic is usually the malformed one. And I don't believe in coincidences when they are so frequent.
Can you check with your broker that nuki_hub is actually using a persistent connection? I verified with mine and it's using it.
How would I do that?
I don't know how with Mosquitto, that's one of the reasons I chose EMQX over Mosquitto.
With EMQX I can check the Clean Session flag via the admin UI:
Maybe with mosquitto there's some cli command or maybe in the logs...
Looks like it's not possible without storing the client IDs manually on connect. Maybe I should switch broker as well.
Nuki Hub uses the hostname as client ID when it connects, it's not dynamic. What do they mean by "manually"?
That you have to code the logic yourself. See this Stack Overflow post for details: https://stackoverflow.com/questions/9767040/get-a-list-of-connected-client-ids-from-mqtt-client
That you have to code the logic yourself.
Luckily, when I chose what broker to use for my homelab, I spent quite some time doing my research. I will never regret that decision. EMQX v5 is highly recommended if you want to properly manage the broker.
Today I noticed that the lock sensor in HA was reporting unlocked when the Nuki was locked. I checked MQTT and noticed this:
You will notice that lock/binaryState is unlocked but lock/state is locked. I restarted Nuki Hub to force it to reset the states from the lock and everything was good again.
BTW: is it possible to add a restart button in the UI? (Right now I go into settings and hit save to force it to restart.)