Closed rousveiga closed 3 years ago
I have a similar problem: a few times my ESP32 has disconnected from the broker and I had to reboot it to re-establish the connection. I wasn't sure whether this was a library bug or a bug of mine.
a few times my ESP32 has disconnected from the broker and I had to reboot it to re-establish the connection
@GioTB That's exactly the same behavior I get. Do your logs look like mine, i.e. do they show the client getting stuck in the reconnecting stage?
I see something similar: if the broker is down for a while and then comes back up (say >5 min during a server upgrade), AsyncMqttClient fails to reconnect. I'll see if I can create some small test code to reproduce it.
If I remember @rousveiga's other post correctly, you are already working with the develop branch? So the current recommendation, "please try with develop and see if it works", doesn't help here...
Yes, I'm using develop indeed.
@rousveiga My implementation is different from yours, but it does basically the same thing (I implemented the reconnection with a FreeRTOS timer): each time the "onDisconnect" callback is called, it attempts a reconnect and leaves the timer running until it connects (and yes, it does get stuck trying to reconnect). Since the device is far away from me, I had to implement a reboot of the ESP32 in case too much time passes and it still can't connect. The error is not frequent at all; for me it has happened 3-4 times over a period of about 4 months. One thing to note is that my broker (I use flespi.io) says that the device "connects and disconnects" several times, which is odd.
@luebbe I just realized I was using the master branch. I'm using PlatformIO with the "ottowinter" repo; now I'm going to try the develop branch of this repo. By the way, what is the difference between ottowinter's repo and this one?
@GioTB This is the original where @bertmelis has made a lot of fixes to the develop branch in the past months. I haven't looked at ottowinters fork but I guess he has also tried to fix some of the bugs in his fork.
@luebbe I just realized I was using the master branch
Any update? I have multiple nodes and most of the time I end up restarting the router to fix it :/
@cyber-junkie9 I can't tell you yet whether it works; so far I haven't lost the connection. But there is a quick patch you can implement in your code so you don't have to manually reboot your nodes: add a counter, so that if it exceeds a certain number of reconnection attempts the node restarts. I had to implement that in my code so I wouldn't lose the device forever (my main device is about 2 hours away from me).
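The counter workaround described above could look roughly like the sketch below. It is a library-independent illustration, not code from this thread: `ReconnectWatchdog`, `recordFailure` and `recordSuccess` are hypothetical names, and the threshold is whatever suits your deployment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of async-mqtt-client): counts consecutive
// failed reconnection attempts and reports when a restart is due.
class ReconnectWatchdog {
 public:
  explicit ReconnectWatchdog(uint32_t maxAttempts) : maxAttempts_(maxAttempts) {
    assert(maxAttempts > 0);  // a zero threshold would restart immediately
  }

  // Call once per failed attempt (e.g. from the onDisconnect handler).
  // Returns true once the number of consecutive failures reaches the limit.
  bool recordFailure() {
    if (failures_ < maxAttempts_) {
      ++failures_;
    }
    return failures_ >= maxAttempts_;
  }

  // Call from the onConnect handler to clear the counter.
  void recordSuccess() { failures_ = 0; }

  uint32_t failures() const { return failures_; }

 private:
  uint32_t maxAttempts_;
  uint32_t failures_ = 0;
};
```

In an ESP32 Arduino sketch, one would presumably call `ESP.restart()` when `recordFailure()` returns true; that wiring is left out here so the logic can be read and tested on its own.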
add a counter, so that if it exceeds a certain number of reconnection attempts the node restarts. I had to implement that in my code so I wouldn't lose the device forever (my main device is about 2 hours away from me).
@GioTB Might do this myself, my devices are away from me as well.
Is everybody who experiences this using a ESP32 or does it also occur on ESP8266?
I wrote a mini app to try and recreate the problem, but couldn't anymore!
I'm using only ESP8266 and I have one device that sometimes has MQTT reconnection issues. Admittedly I was too lazy to dig deeper, since power cycling it usually solves the issue.
@bertmelis I'm only using ESP32
@GioTB I just realized something. From your comments, I infer that you get your onDisconnect handler called several times; as in, the device attempts to reconnect, it fails, and this loop repeats forever unless you reboot. And the fix you mention can be implemented in the sketch side of things, rather than patching the library. Am I correct?
For me, the onDisconnect handler only gets called once, I call connect again, and that single reconnection is what happens forever unless reboot.
I'd love to see a wireshark replay of this.
It is also not clear to me which device actually initiates the continuous disconnections: the client or the broker.
I'd love to see a wireshark replay of this.
I'll try to provide it.
It is also not clear to me which device actually initiates the continuous disconnections: the client or the broker.
It's not clear to me either. I'm experiencing this behavior with the Mosquitto add-on for Home Assistant; the logs I have access to get cut off after a certain point, and I haven't yet found a way to get the full logs. I will try running my MWE with a different broker.
Another thing I want to do is to get some logs of the AsyncTCP side of things, to see if it helps the investigation.
@rousveiga I remember reading something (probably on heise online, a German computer magazine, but I can't find the article) that there was a problem with a recent update to the mosquitto add-on for HA. People had to revert to a previous version.
I'm using HA, but my mosquitto runs on a separate raspberry.
@luebbe That's interesting. I'll look it up. Thanks!
@bertmelis I just remembered, even though I haven't confirmed it 100%, that my other MQTT devices (running Espurna) seem to work fine, so at first I thought it would be a client issue.
Does Espurna use pubsubclient or also this lib? I had the impression it uses this.
@GioTB I just realized something. From your comments, I infer that you get your onDisconnect handler called several times; as in, the device attempts to reconnect, it fails, and this loop repeats forever unless you reboot. And the fix you mention can be implemented in the sketch side of things, rather than patching the library. Am I correct?
For me, the onDisconnect handler only gets called once, I call connect again, and that single reconnection is what happens forever unless reboot.
@rousveiga Yes, precisely that! So our reconnection loops are different. Another thing I do is activate a timer which attempts to reconnect every 5 seconds, so "connect" is called several times. On the device I have in the field, it sometimes takes up to 14 tries until it reconnects, but this could be due to the internet connection or some other problem; I can't say it's caused by the library. The most important thing is that after a while it manages to connect back to the broker. Worth noting that since I changed to the develop branch I haven't had the issue anymore; I even had an ESP32 disconnect every 20 seconds so it would attempt the reconnect on its own, and it always reconnected with no problem.
For reference: only try to reconnect if the previous attempt has failed. The broker disconnects the oldest client when a new one with the same ID tries to connect.
This reconnect loop could be a timing issue.
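The advice above (only start a new attempt once the previous one has failed) can be modelled as a small state gate. This is a minimal sketch under assumptions: `ConnectGate` and its methods are hypothetical names, not part of the library, and the real handlers would be the sketch's onConnect and onDisconnect callbacks.

```cpp
#include <cassert>

// Hypothetical connection states tracked by the sketch, not by the library.
enum class LinkState { Disconnected, Connecting, Connected };

// Hypothetical gate that lets a periodic timer call connect() only when no
// attempt is in flight, so a second CONNECT with the same client ID is never
// sent while the first one is still pending.
class ConnectGate {
 public:
  // Returns true if the caller should call connect() now.
  bool tryBeginConnect() {
    if (state_ != LinkState::Disconnected) {
      return false;  // already connected, or an attempt is still pending
    }
    state_ = LinkState::Connecting;
    return true;
  }

  // Call from the onConnect handler.
  void onConnected() {
    assert(state_ == LinkState::Connecting);  // an attempt must be pending
    state_ = LinkState::Connected;
  }

  // Call from the onDisconnect handler (failed attempt or dropped link).
  void onDisconnected() { state_ = LinkState::Disconnected; }

  LinkState state() const { return state_; }

 private:
  LinkState state_ = LinkState::Disconnected;
};
```

A 5-second timer like the one mentioned earlier in the thread could then call `connect()` only when `tryBeginConnect()` returns true, which keeps the timer as a safety net without stacking attempts.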
@bertmelis Thanks! One question: the "onDisconnect" callback is called each time the client fails to connect (and obviously also after a successful connection drops), right? The point of my question is that I guess I could call the "connect" method only when "onDisconnect" is triggered. Currently I'm calling connect every 5 seconds, independently of the "onDisconnect" callback (I do this with a FreeRTOS timer that is only stopped once the "onConnect" callback is called).
@GioTB Yes, onDisconnect is called every time. That includes, for example, when the client can't connect and the connection attempt times out (the AsyncTCP lib does this).
Does Espurna use pubsubclient or also this lib? I had the impression it uses this.
@bertmelis I checked it out and, while you can configure the use of other libraries, the default is this one, yes.
I looked into the issues and found people experiencing the same problem: https://github.com/xoseperez/espurna/issues/2365, https://github.com/xoseperez/espurna/issues/2112.
Looks like Espurna maintainers have their own fork and fix: https://github.com/mcspr/async-mqtt-client/commit/c1fcfd1. I'm going to try to apply this patch and see if it works.
The point of my question is that I guess I could call the "connect" method only when "onDisconnect" is triggered.
@GioTB Yes, that's exactly what I do.
I'm going to try to apply this patch and see if it works.
Did it work?
@cyber-junkie9 So far, it looks like it - I have three devices working since Thursday - but I'm going to keep monitoring them for a few more days just to be sure.
There is indeed a flaw: when the TCP connection is made, the MQTT ping system is not working yet. Since the TCP connection is made, there is nothing to time out.
This fix will indeed solve that I think. Not sure though why a broker would stop communicating between accepting the TCP connection and the CONNECT packet.
EDIT: there might be 2 issues: one with a single reconnect that gets stuck and one with a continuous connect/reconnect loop. I'm talking about the single event here.
EDIT2: I might be mistaken, working from a smartphone screen...
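To illustrate the gap described above (TCP is up, but nothing times out until MQTT keep-alive starts), here is a library-independent sketch of a handshake deadline. It is not the patch from the Espurna fork; `HandshakeTimer` and its methods are hypothetical names, and the millisecond timestamps are assumed to come from something like `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks the window between "TCP connected" and
// "CONNACK received" and reports when that window has lasted too long.
class HandshakeTimer {
 public:
  explicit HandshakeTimer(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
    assert(timeoutMs > 0);
  }

  // Call when the TCP connection is established.
  void onTcpConnected(uint32_t nowMs) {
    startMs_ = nowMs;
    waiting_ = true;
  }

  // Call when the broker's CONNACK arrives; keep-alive takes over from here.
  void onConnAck() { waiting_ = false; }

  // Call when the socket is closed for any reason.
  void onClosed() { waiting_ = false; }

  // True if we are still waiting for CONNACK and the deadline has passed.
  // Unsigned subtraction keeps this correct across a 32-bit counter wrap.
  bool expired(uint32_t nowMs) const {
    const uint32_t elapsed = static_cast<uint32_t>(nowMs - startMs_);
    return waiting_ && elapsed >= timeoutMs_;
  }

 private:
  uint32_t timeoutMs_;
  uint32_t startMs_ = 0;
  bool waiting_ = false;
};
```

When `expired()` returns true, the sketch or library would close the socket so that the usual onDisconnect path runs and a fresh attempt can start. Setting a receive timeout on the TCP client, as the patch discussed in this thread does with setRxTimeout, is another way to get the same effect.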
It's been a week without trace of this issue, so I'd say the fix did indeed work.
Here's my patched file. I tried to add the setRxTimeout invocations at the same points, but the library seems to have been refactored since it was forked, so it might be a bit off.
It's been a week without trace of this issue, so I'd say the fix did indeed work.
Here's my patched file. I tried to add the setRxTimeout invocations at the same points, but the library seems to have been refactored since it was forked, so it might be a bit off.
You're welcome to create a PR (preferably develop branch). I can merge from my poolside lounger on my phone.
You're welcome to create a PR (preferably develop branch).
Okay, I will!
I can merge from my poolside lounger on my phone.
Enjoy!
@rousveiga @bertmelis great work! thanks!
Issue closed by merging the PR.
Hello! Sometimes, my device disconnects from the MQTT broker and never reconnects again while the sketch is running. If I reset the device, it can connect again without any problems.
I have a MWE and extensive logs. The logfile is pretty big, so I separated the part from when it failed: https://pastebin.com/SPdQK3z5
The #include "M5StickPlus.h" and M5.begin(false, true, true); in the MWE are there because I'm using a M5 Stick-C Plus for testing, which has a ESP32-PICO-D4 inside.
I can provide more info if necessary. Thanks in advance!