knolleary / pubsubclient

A client library for the Arduino Ethernet Shield that provides support for MQTT.
http://pubsubclient.knolleary.net/
MIT License

mqtt reconnect hangs on the ESP32 #624

Open tmsd2001 opened 5 years ago

tmsd2001 commented 5 years ago

I use version 2.7.0 with a NodeMCU-32S. Always after about 3.5 hours, the sketch jumps into the reconnect function, writes "Attempting MQTT connection..." and hangs. I can ping the node, but nothing else works. My only change to the sketch is that I must use WiFi.h instead of ESP8266WiFi.h. My hotfix is to put a hardware reset into the reconnect function.

SandnerSoft commented 5 years ago

I have exactly the same problem.

What I noticed: sending works. Only receiving on my ESP32 stops working after 3.5 hours.

tmsd2001 commented 5 years ago

I'm not sure what this is all about. I made some changes to my sketch and uploaded it again. Since 4 June I have had no more periodic reconnects, and connection times are over 24 hours. Was there maybe an update here? @SandnerSoft: can you try updating your IDE and libraries and uploading your sketch again?

niffumau commented 5 years ago

I have the exact same problem. While connected to WiFi, it seems to lose the MQTT session, and the moment it runs the connect command it hangs indefinitely. I will update the libraries and the IDE and see how that goes.

eos1d3 commented 5 years ago

I switch off my router's WiFi for a while and then turn it back on. Sometimes it connects back to the broker, and sometimes it doesn't and hangs at "Attempting MQTT connection..." forever.

Tested with ESP32.

pgardiner commented 5 years ago

Did anyone figure out the root of this issue? I'm having the same problem on esp32. Works fine on esp8266. There is no time pattern, but it's generally over an hour, and it hangs when I run PubSubClient.connect.

niffumau commented 5 years ago

Dammit, I'm not really doing my part for the community; I should have posted how I got around the problem. The first connect was always fine and connected straight away. It was after MQTT disconnected, when I had it reconnect straight away, that it would hang indefinitely. I wrote a lot of error-checking code around WiFi, with routines that detect and recheck the WiFi connection to make sure it wasn't WiFi that had disconnected. When it hung, it was never because WiFi had disconnected.

The first workaround was a watchdog timer, so that it would automatically reboot if it hung.

The second workaround: the first time I connect to MQTT there is no delay, but after a disconnect, and on every subsequent reconnection, I added a delay of a few seconds before it tries to reconnect to MQTT. After I did that I stopped getting watchdog timer reboots, and it seemed to stay up indefinitely without hanging.

TLDR; ultimately a delay after being disconnected from MQTT before it tries to reconnect to MQTT seemed to fix the problem
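
The delay workaround described above can also be done without blocking `loop()`. The following is only a sketch under assumptions: `ReconnectGate`, `startHoldoff`, `mayConnect` and `noteConnected` are hypothetical names and not part of PubSubClient. The idea is to feed it `millis()` and to call `client.connect()` only when `mayConnect()` returns true.

```cpp
#include <cstdint>

// Hypothetical helper (not part of PubSubClient): decides when the next
// MQTT connect attempt is allowed. The very first attempt is immediate;
// after a disconnect or a failed attempt, further attempts are held back
// until holdoffMs has passed.
class ReconnectGate {
public:
  explicit ReconnectGate(uint32_t holdoffMs) : holdoffMs_(holdoffMs) {}

  // Call when a disconnect is detected and after every failed attempt.
  void startHoldoff(uint32_t nowMs) {
    waiting_ = true;
    since_ = nowMs;
  }

  // True if a connect attempt may be made at time nowMs.
  // Unsigned subtraction keeps this correct across a millis() rollover.
  bool mayConnect(uint32_t nowMs) const {
    if (!waiting_) return true;
    return static_cast<uint32_t>(nowMs - since_) >= holdoffMs_;
  }

  // Call after a successful connect to clear the hold-off.
  void noteConnected() { waiting_ = false; }

private:
  uint32_t holdoffMs_;
  uint32_t since_ = 0;
  bool waiting_ = false;
};
```

In a sketch, one would create `ReconnectGate gate(5000);` (the 5 seconds mentioned in this thread), call `gate.startHoldoff(millis())` when `client.connected()` first turns false, and attempt `client.connect(...)` only while `gate.mayConnect(millis())` is true. This does not explain why the immediate reconnect hangs; it only reproduces the delay that helped here.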

ccfiel commented 5 years ago

@niffumau what value did you put in the delay?

niffumau commented 5 years ago

@ccfiel I just set it to 5 seconds, probably overkill. I should point out that i have the same codebase for the ESP8266 and that didn't need the delay.

pgardiner commented 5 years ago

@niffumau thanks for the info. I also set it to five seconds after your response a few days ago, and it has been working fine since then. I'm not sure why it gets disconnected in the first place, since WiFi never goes down and nothing happened on the Mosquitto server, but for now the workaround is working well.

niffumau commented 5 years ago

@pgardiner mine wasn't as periodic as some others; it was semi-random. My suspicion is that the connection gets disconnected normally (maybe my WiFi isn't as great as I think it is), and then during the reconnection maybe it hasn't fully processed the disconnection, or maybe the connection is still hanging in TCP WAIT or something. I am only thinking along these lines because maybe the WiFi/MQTT is actually handled by a real-time process or something. When I get around to it I might see how small I can make the delay. The interesting thing is that without the delay it totally freezes the MCU.

tmsd2001 commented 5 years ago

What I have noticed is that the periodic reconnecting comes from channel switching in the access point; my new access point does not change the channel as often. Maybe the problem occurs when the access point changes to another channel and the ESP keeps searching on the old channel?

SkyRalf commented 5 years ago

Hello "niffumau", first of all thank you very much for providing such an interesting library. I am new to the world of coding, so please excuse me if I ask "unnecessary" questions. Unfortunately, I am facing the same issue: my ESP32 hangs after about 3 hours. My MQTT broker runs on an RbPi-2 (Raspberry Pi) and alternatively on an RbPi-3. To connect to the RbPi at all I had to use MQTT 3.1 (3.1.1 did not work). The broker is still alive; I can verify that with a smartphone app. When the problem occurs, the main loop hangs. It freezes at "MQTT connecting ...". The ESP still responds to ping. Right now I am stuck and do not know how to proceed. Q: Could you please post a code snippet for a watchdog or for extended debugging? Thank you.

tmsd2001 commented 5 years ago

Here is a snippet:

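// Snippet to merge into an existing sketch: ssid, password and the
// PubSubClient instance "client" are assumed to be declared there.
#include <esp_int_wdt.h>
#include <esp_task_wdt.h>

// Assumed declaration (missing from the original snippet): delay between
// WiFi steps in ms, grows by 250 ms on every retry
unsigned long wifidelay = 0;

// Forward declaration; hard_restart() is defined further down
void hard_restart();

// 1 only during the first pass through loop(), so that the first MQTT
// connect does not trigger a hard reset
bool newstart = 0;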
void WIFI_Connect()
{
  for (int k = 0; k<25; k++) 
    {
      if ( WiFi.status() != WL_CONNECTED )
        {
        wifidelay = wifidelay + 250;
        WiFi.disconnect();
        delay(wifidelay);
        Serial.println("XXXXXXXXXXXXXXXX---------Connecting to WiFi...-------------XXXXXXXXXXXXX");
        WiFi.mode(WIFI_STA);
        delay(wifidelay);
        WiFi.begin(ssid, password);
        delay(wifidelay);
        delay(wifidelay);
        }
    }
  if ( WiFi.status() != WL_CONNECTED )
    {
    Serial.println("Connecting to WiFi fail.");
    delay (10000);
    hard_restart();
// I think that is no longer necessary, but it does not hurt either.
    delay (2000);
    }
  if ( WiFi.status() == WL_CONNECTED )
    {
    Serial.println("");
    Serial.println("WiFi Connected");
    Serial.println("IP address: ");
    Serial.println(WiFi.localIP());
    }
}
void reconnect() {
  Serial.println("mqtt reconnect start");
// check whether MQTT is connected
  if (!client.connected()) {
// check whether WiFi is connected
    if ( WiFi.status() != WL_CONNECTED )
        {
          WIFI_Connect();
        }
    if ( newstart == 0 )
        {
          hard_restart();
        }  
    Serial.println("Attempting MQTT connection...");
// insert a client ID that is unique on your broker
    if (client.connect("ESP32-client-abcd")) {
      Serial.println("connected");
      client.publish("outTopic", "hello world");
      client.subscribe("inTopic");
    } else {
      Serial.print("failed, rc=");
      Serial.print(client.state());
      Serial.println(" wait 0.5 seconds");

      delay(500);
    }
  }
  Serial.println("mqtt reconnect ende");
}
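// Force a reset: arm the task watchdog with a 1 s timeout (panic enabled),
// register the current task, then spin until the watchdog fires.
void hard_restart() {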
  esp_task_wdt_init(1,true);
  esp_task_wdt_add(NULL);
  while(true);
}
void setup()
{
// add to setup
  newstart = 1;
}
void loop()
{
// at the end
newstart = 0;
}

SkyRalf commented 5 years ago

Thank you very much for your fast response. So far I am not able to compile the code because of an undeclared function: "hard_restart()". I was not able to find similar code on the net or in either of the given header files. Any idea how to proceed?


tmsd2001 commented 5 years ago

My code was badly formatted. But I also forgot something else.

void hard_restart() {
  esp_task_wdt_init(1,true);
  esp_task_wdt_add(NULL);
  while(true);
}

tmsd2001 commented 5 years ago

Everyone, try setting your WiFi access point to a fixed WiFi channel. That does not solve the problem with the ESP, but it could show whether that is where the problem lies.

uahrendt commented 5 years ago

I had the same problem: after ~3 hours the ESP32 could not reconnect; PubSubClient::connect failed in its call to _client->connect(). I already checked and re-established the WiFi connection in loop(), including several delays. But even though the ESP32 was connected to WiFi, the WiFi client could not reconnect. With "Core Debug Level: Debug" I got an error message from WiFiClient during the reconnect attempt, something like "aborted by software". I was using arduino-esp32 installed via the Arduino IDE Boards Manager; the newest version was 1.0.2. I removed it and now use the development repository, because of a change made 17 days ago: "WiFiClient.cpp - Fix connect() behavior". My device has now been running for more than 24 hours without any problems! The next step will be to remove, one by one, the additional delay() calls around (re-)connecting WiFi.

=> https://github.com/espressif/arduino-esp32

pgardiner commented 5 years ago

Thanks @uahrendt, your solution worked for me. Previously, I put in a five second delay, which helped, but I was still having intermittent issues, especially with multiple devices. Yesterday I switched over to the dev version of arduino-esp32, and took out the delay, and it has been perfect now for about 30 hours, which is longer than I have ever made it.

weinrank commented 5 years ago

@uahrendt Hi, I'm the author of the "WiFiClient.cpp - Fix connect() behavior" commit you mentioned. We're using a bunch of ESP32 devices, all with this MQTT library, and all of them (still) show the same behaviour: after some time, the MQTT client detects that the connection is gone and tries to reconnect. The reconnect attempt gets stuck.

This is the relevant code:

static bool mqtt_connect(void) {
  // Nothing to do if the MQTT session is still up
  if (mqtt_client.connected()) {
    return true;
  }

  Serial.print(F("espClient.connected() = "));
  Serial.println(espClient.connected());

  Serial.print(F("Current MQTT state : "));
  Serial.println(mqtt_client.state());

  Serial.print(F("Attempting MQTT connection to "));
  Serial.print(CONFIG_MQTT_BROKER_ADDRESS);
  Serial.print(" ...");

  // Attempt to connect
  if (mqtt_client.connect(mqtt_client_id.c_str(), mqtt_beacon_topic.c_str(), 0, false, mqtt_last_will_buffer)) {
    Serial.println(F("connected"));
    //mqtt_client.subscribe("inTopic");
    return true;
  } else {
    Serial.print("failed, rc=");
    Serial.println(mqtt_client.state());
    return false;
  }

  return false;
}

Output

espClient.connected() = 0
Current MQTT state : -3
Attempting MQTT connection to 10.42.10.86 ...

The connect() call seems to hang. I would expect the function to return, either successfully or with a timeout, but not to block forever.

Can you confirm this behaviour for your devices? If yes, we'll have a deeper look at the underlying TCP/IP stack.

tmsd2001 commented 5 years ago

Before the MQTT connect, I would check whether a network (WiFi) connection exists at all. The original code has a while loop in the MQTT connect, and it never leaves that loop if there is no network connection. I changed my code a bit, see further up.
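
The advice above (check WiFi first, and avoid a blocking while loop) could look roughly like this. It is a sketch under assumptions, not code from the library: `ReconnectResult` and `reconnectOnce` are hypothetical names, and `MqttClient` stands for any type with `connected()` and `connect(const char*)`, which PubSubClient provides.

```cpp
// Hypothetical result type for one pass through the reconnect logic.
enum class ReconnectResult { AlreadyConnected, NoWifi, Connected, Failed };

// Makes at most one MQTT connect attempt and never loops, so loop()
// keeps running and the caller decides when to try again.
// wifiUp would be (WiFi.status() == WL_CONNECTED) in a real sketch.
template <typename MqttClient>
ReconnectResult reconnectOnce(bool wifiUp, MqttClient& client,
                              const char* clientId) {
  if (client.connected()) {
    return ReconnectResult::AlreadyConnected;
  }
  if (!wifiUp) {
    // No point in trying MQTT without a network; fix WiFi first.
    return ReconnectResult::NoWifi;
  }
  return client.connect(clientId) ? ReconnectResult::Connected
                                  : ReconnectResult::Failed;
}
```

Note that this only removes the sketch-level while loop. If `connect()` itself never returns, as reported in this thread, a single-attempt structure does not help, and a watchdog or a newer ESP32 core is still needed.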

uahrendt commented 5 years ago

@weinrank: Hi, I had two different results. In both cases I used arduino-esp32 1.0.2 via the IDE Boards Manager.

Case 1, pubsubclient V2.7: After ~3 hours PubSubClient::connect could not reconnect, even though WiFi was (re-)connected. The debug message was something like "aborted by software". In this case the method returned with a valid return value, and loop() kept running.

Case 2, pubsubclient V1.9: In this older version the program structure was like this:

...
Serial.println("MQTT (1)");
// Attempt to connect
if (mqttClient.connect(MQTT::Connect(deviceId)
                       .set_auth(mqtt_user, mqttpassword))) {
  Serial.println("MQTT (2)");
  ...
}
Serial.println("MQTT (3)");
...

After ~3 hours I only got "MQTT (1)", with no "MQTT (2)" or "MQTT (3)"! In this case mqttClient.connect() did not return. The only debug messages I got came from a ticker interrupt firing each second.

Btw.: I use the development repository for about one week... no problems so far!!!

weinrank commented 5 years ago

Thanks for the information. In my case I'm observing the same behaviour you mentioned in 'Case 2', but it takes much longer (10+ Days) and I'm using the most recent version.

Since I committed the fix, I've seen far fewer issues. I'm happy it was useful for you. :)

I'll report back as soon as I've found a solution.

SkyRalf commented 5 years ago

Thanks for the snippets on how to use the watchdog code. I have managed to get it running with ESP32 1.0.2 and with the development version. In both cases I see software resets: I use the "Hello World" count-up, and it starts anew after a few hours. For now I can live with that setup, because I am facing an issue with MQTT communication between two RbPis. Communication from an ESP or PC to one RbPi works fine. But I guess that is not related to this topic.

marrold commented 5 years ago

I had a similar issue, but upgrading to espressif32 1.10.0 seems to have prevented the issue with client.connect() - it now exits if there is an error rather than hanging

OmarAlkassab commented 4 years ago

Dear @knolleary, dear @marrold, I have faced the same problem, and it still exists even with core version 1.11.1 (the latest version on PlatformIO in VSCode) and pubsubclient version 2.7.

Please note that the same code runs properly on the ESP8266! The problem happens after a disconnection followed by a reconnection to the broker. I tried enabling the loop watchdog timer after connecting to the server, and after a few seconds the watchdog caused the ESP32 to restart.

Any help please?

Tymek commented 4 years ago

Can this be related https://github.com/espressif/esp-idf/issues/4433 ?

OmarAlkassab commented 4 years ago

Any solution for this issue?

gitariana commented 4 years ago

Hi OmarAlkassab,

I also have the same problem and can't get any further:

My setup consists of an ESP32 Pico Kit, Arduino IDE 1.8.12 with the current board package 1.0.4, and today's PubSubClient 2.7.0, all installed directly through the Arduino IDE. FHEM and Mosquitto run as brokers on my Raspi. Several other ESP8266 nodes are still running and sending their status publishes. I only have one ESP32 board in operation, and the MQTT client name E32P_11 is unique.

My problem: after resetting the ESP32 there is a start publish, I can manually trigger further publishes, and every hour I send a status publish. Everything runs correctly for weeks. But every 201 min (more precisely 200 min and approx. 33 s) the MQTT connection is lost; it is connected again and I send a reconnect publish. WiFi stays connected. MQTT can be reconnected without errors. The cycle repeats every 201 min. The hourly publishes arrive on time. I can track everything via MQTT.fx, but I cannot find the mistake.

The ESP32 does not hang or crash, so a watchdog does not seem necessary here. I cannot find any WiFi reconnect; WiFi always stays connected. Before each MQTT function call, I query Wifi.Connected. It doesn't matter whether I use the PubSubClient library or the MQTT library (currently arduino-mqtt 2.4.7). It seems to me that the problem lies in the board package.

One attempt, a disconnect every 190 min followed by an immediate connect, is possible but does not change the 201 min disconnect.

If I run "the same" .INO on a Wemos D1 module with ESP8266, no MQTT reconnects occur.

I installed the current board package 1.0.4 with the Boards Manager. The same error occurs with package 1.0.3. If I use board package 1.0.0 (via the Boards Manager, everything else unchanged), there are no reconnects, but there are other inconveniences, so this is not a solution.

My search doesn't get me any further. No new board package is currently being built. The regularity of 201 min gives the impression that a time counter is overflowing or going negative or something similar, but unfortunately I can't find a good explanation.

I can find the same problem and questions several times on the web, but unfortunately I still haven't found a working solution that would eliminate or prevent the reconnect. Under certain circumstances, publishes scheduled at an unfavorable time may get lost.

If I import and build my project in the PlatformIO IDE, the reconnect also shows up after 201 min. The same board package as in the Arduino IDE is probably used there.

I haven't been able to debug it yet. It also seems difficult to debug something that only occurs every 201 min. (translated with Google)

Klaus
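
One low-effort way to confirm a fixed period like the one reported above is to log the gap between successive disconnects. The helper below is a sketch with hypothetical names (`DisconnectTimer`, `onDisconnect`); it takes a `millis()` timestamp and returns the gap in seconds, which can then be printed over serial or published after the reconnect.

```cpp
#include <cstdint>

// Hypothetical helper: measures the time between successive MQTT
// disconnects so that a fixed period becomes visible in the log.
class DisconnectTimer {
public:
  // Call when a disconnect is detected. Returns the number of whole
  // seconds since the previous disconnect, or 0 for the first one seen.
  uint32_t onDisconnect(uint32_t nowMs) {
    uint32_t gapS = 0;
    if (seen_) {
      // Unsigned subtraction stays correct across one millis() rollover.
      gapS = static_cast<uint32_t>(nowMs - lastMs_) / 1000u;
    }
    lastMs_ = nowMs;
    seen_ = true;
    return gapS;
  }

private:
  uint32_t lastMs_ = 0;
  bool seen_ = false;
};
```

With the figures from this post, 200 min and 33 s corresponds to 200 * 60 + 33 = 12033 seconds, so a run of identical gaps near that value would support the periodic-disconnect observation. It would not say anything about the cause.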

msuryateja37 commented 3 years ago

I am facing the same issue too. My ESP32 is coded to subscribe to a topic on Adafruit MQTT, but after running for a long period the ESP32 hangs somewhere in the middle (i.e. it stays connected to MQTT but no longer receives messages).

OmarAlkassab commented 3 years ago

@msuryateja37 Actually, I don't have any solution other than changing the code to use AsyncMqttClient on the ESP32. Try it and your problem should be fixed.

gitariana commented 3 years ago

Where can I find it? Does it work with the Arduino IDE? Many thanks.

OmarAlkassab commented 3 years ago

https://github.com/marvinroger/async-mqtt-client

It works on Arduino IDE

gitariana commented 3 years ago

Hi Omar, thanks for the information. But AsyncMqttClient is very incompatible with pubsubclient.h and mqtt.h. Are you sure that the MQTT connection loss does not occur with AsyncMqttClient? The loss of the MQTT connection every 201 min happens with ESP32 board packages newer than version 1.0.0. With 1.0.0 there is no 201 min error, but there are other problems. No hanging. That has been my experience. PubSubClient and MQTT work correctly over LAN, without WiFi. Klaus

gitariana commented 3 years ago

The "same" code on an ESP8266 works correctly under Arduino IDE 1.8.12.

OmarAlkassab commented 3 years ago

Dear @gitariana I know that the same code works correctly in ESP8266. for this reason I changed my code from pubsubclient library to AsyncMqttClient library. You should rewrite your code completely.

gitariana commented 3 years ago

Hi Omar, I have a few projects that run with pubsubclient, so the situation is very unfortunate for me. Why did the procedure have to be changed? Does that mean that the fault is in the board package but cannot be corrected? Instead, another procedure for MQTT has to be carried out in order to bypass the MQTT unconnect every 201 minutes? Do I understand that correctly? Klaus

OmarAlkassab commented 3 years ago

Dear @gitariana, I've been working on 10 projects that use the ESP8266, but when I started another project using the ESP32, the code hung after the mqtt.connect() function. I tried a lot of proposed solutions without success. So I decided to use AsyncMqttClient, and it's easy to change from pubsubclient.h to AsyncMqttClient (with few changes). Sorry, I didn't understand what you meant by:

"Instead, another procedure for MQTT has to be carried out in order to bypass the MQTT unconnect every 201 minutes?"

gitariana commented 3 years ago

Dear @OmarAlkassab, MQTT doesn't hang for me. It runs correctly for many days. But every 201 min the MQTT connection is lost. I detect it, reconnect, and it goes on. No hang, no reset. But the MQTT loss seems to be a bug: if it occurs during a publish, the publish is lost. My findings are that WiFi drops for a very short time, too short to be detected, but MQTT loses its connection. I see the same behavior when my internet router changes WiFi channels; then the 201 min start over. The error appears to be in the WiFi part of the board package.

  1. Is the error in the WiFi BoardPackage?
  2. Does AsyncMqttClient use its own WiFi library? Arduino reports multiple WiFi.h.
  3. AsyncMqttClient requires AsyncTCP. Is https://github.com/me-no-dev/AsyncTCP the right version?

chakjer commented 3 years ago

@msuryateja37 Try subscribing to your topics again after every reconnect to MQTT.
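
A minimal sketch of that advice, assuming the session is not persistent (so the broker forgets subscriptions when the connection drops): subscribe again inside the same code path that performs the connect. `connectAndResubscribe` is a hypothetical helper, and `MqttClient` stands for any type with `connect(const char*)` and `subscribe(const char*)`, both of which PubSubClient offers.

```cpp
#include <cstddef>

// Hypothetical helper: connects and, on success, subscribes to every
// topic again. Returns true only if the connect and all subscribe calls
// succeeded.
template <typename MqttClient>
bool connectAndResubscribe(MqttClient& client, const char* clientId,
                           const char* const* topics,
                           std::size_t topicCount) {
  if (!client.connect(clientId)) {
    return false;  // not connected; the caller decides when to retry
  }
  bool allOk = true;
  for (std::size_t i = 0; i < topicCount; ++i) {
    // Subscriptions made before the disconnect are not assumed to survive.
    if (!client.subscribe(topics[i])) {
      allOk = false;
    }
  }
  return allOk;
}
```

Keeping the topic list in one array and routing every connect through one function makes it harder to forget a subscription after a reconnect. Whether this addresses the "connected but not receiving" symptom reported here is not confirmed in the thread.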

gitariana commented 3 years ago

Dear @OmarAlkassab, unfortunately I haven't received an answer from you yet.

I have interesting news: in my further search on the web, I found an indication that MQTT / WiFi no longer disconnect every 201 min if a fixed IP is used instead of a DHCP-assigned IP. My tests with 3 ESP32 modules with a fixed IP have shown no disconnect every 201 min for several days. Is the error in the WiFi library?

OmarAlkassab commented 3 years ago

Dear @gitariana, sorry for my late reply. First, I have tested the AsyncMqtt library and had no hanging problems, and yes, the AsyncTCP library is the one you linked. But I have another problem now: the ESP32 reboots after more than one week of continuous operation. I haven't recorded logs yet to find the reason for the reboot, but I think it is low free heap space (the reboot happened between a disconnection and a reconnection). I will look into it and get back to you.

Regarding the static IP solution you mentioned, it can't work for me. I'm designing a commercial product, so I don't know the clients' router IP addresses. For this reason DHCP is mandatory for me.

WABez commented 3 years ago

@OmarAlkassab I am using AsyncMqttClient on the ESP32 (Arduino IDE), and my ESP32 disconnects every 5 hours, 47 minutes and 45 seconds, i.e. every 347 minutes and 45 seconds, or simply every 20865 seconds.

OmarAlkassab commented 3 years ago

Dear @WABez, unfortunately I can't observe this problem, because my router disconnects from my ISP due to the number of connected devices, and the speed is too low in my region.

helmiItsavirus commented 3 years ago

Has anyone tried changing the wireless mode to "n only" on the router? It seems to be a wireless issue: the mixed b/g/n mode caused problems for me, and "n only" mode made things more stable. One thing you should know: the NodeMCU has a limited range to the router. In my experience the maximum distance is about 10 meters when something is in the way, or 15 meters with a clear line of sight; just make sure there is no solid obstacle.

MQTT reconnects successfully with:

  1. NodeMCU with cover (optional)
  2. Distance of only 10 meters without obstacles; if there are obstacles, make sure the distance is less than 10 meters.
  3. Set the router to "n only".
  4. Use a stable power supply, such as a good-brand cable and charger.
  5. Use a good brand of router, such as Ubiquiti, or Netis for a cheap one. I had a lot of problems with TP-LINK when the distance was around 10 meters with obstacles.

I'm just sharing my experience, even though it is off topic.