knolleary / pubsubclient

A client library for the Arduino Ethernet Shield that provides support for MQTT.
http://pubsubclient.knolleary.net/
MIT License

Connection issues with ESP8266 and mosquitto #320

Open sglvladi opened 7 years ago

sglvladi commented 7 years ago

Hi there,

I am having some issues connecting to mosquitto lately and I am quite confused by what I am seeing.

This is the code I have to perform my initial connection to the broker within the setup() method:

void MQTT_connect() {

  int mqtt_chip_key = ESP.getChipId();
  // Start clean by dropping any previous connection 
  if(MQTT_client.connected()){
    Serial.println("Disconnecting from previous MQTT connection!!!!");
    MQTT_client.disconnect();
    MQTT_client.setServer(mqtt_server, 1883);
    MQTT_client.setCallback(callback);
  }

  // Loop until we're reconnected
  while (!MQTT_client.connected()) {
    Serial.print("Attempting MQTT connection...");
    // Attempt to connect
    randomSeed(analogRead(0));
    String mqtt_client_id = String(MODEL) + String(mqtt_chip_key);
    Serial.println("New client ID: " + mqtt_client_id);
    if (MQTT_client.connect(mqtt_client_id.c_str(), _mqtt_username.c_str(), _mqtt_password.c_str(), ("/" + _db_user_id + "/devices//" + getMac() + "/disconnect").c_str(), mqtt_will_qos, mqtt_will_retain, mqtt_will_payload)) {
      Serial.println("connected");
      // Once connected, publish an announcement...
      MQTT_client.publish(String("/" + _db_user_id + "/devices//" + getMac() + "/disconnect").c_str(), String("0").c_str(), true);
      // ... and resubscribe
      unsigned char mac[6];
      WiFi.macAddress(mac);
      String wildcard = "/" + _db_user_id + "/devices//" + _macAddress + "/+";
      MQTT_client.subscribe(wildcard.c_str());
      MQTT_loop();
      wildcard = "/"+_db_user_id+"/devices/+/"+ _macAddress +"/sensors/+/input/#";
      MQTT_client.subscribe(wildcard.c_str(),1);
    }
    else {
      Serial.print("failed, rc=");
      Serial.print(MQTT_client.state());
      if(WiFi.status()!=WL_CONNECTED){
        Serial.println("Connection to WiFi has been lost. Attempting to reconnect....");
        wifiManager.autoConnect(apName.c_str(), apPass.c_str());
      }
      else{
        Serial.println(" try again in 5 seconds");
        // Wait 5 seconds before retrying
        delay(5000);
      }
    }
  }
}

What happens is that, after MQTT_client.disconnect(); has been executed, the client keeps failing to connect and an rc code of -1 (MQTT_DISCONNECTED - the client is disconnected cleanly) is returned. Below is an example printout:

Disconnecting from previous MQTT connection!!!!
Attempting MQTT connection...New client ID: HubZero8429602
failed, rc=-1 try again in 5 seconds
Attempting MQTT connection...New client ID: HubZero8429602
failed, rc=-1 try again in 5 seconds
Attempting MQTT connection...New client ID: HubZero8429602
failed, rc=-1 try again in 5 seconds
Attempting MQTT connection...New client ID: HubZero8429602
failed, rc=-1 try again in 5 seconds
Attempting MQTT connection...New client ID: HubZero8429602
failed, rc=-1 try again in 5 seconds
Attempting MQTT connection...New client ID: HubZero8429602
....

Once the above happens, the client will never connect.

The odd thing is that if I connect to the broker using the Eclipse Paho client on my desktop, on the next reconnection attempt, the PubSub client will succeed in connecting, while the Paho client connection is dropped. I should note here that the two clients use completely different id's to connect, so the instant connection swap between the two is not caused by an id conflict.

So the two questions that stem from the above are as follows:

1) Why does the client fail to connect with rc=-1 after MQTT_client.disconnect(); is called?
2) Why is it only able to connect by "stealing" the connection created by the Paho client?

Any help with the above will be much appreciated.

Thanks much in advance.

stefanbode commented 7 years ago

Please insert in the PubSubClient.cpp at line 212 a yield(); and recompile from scratch. Known issue.

// reads a byte into result
boolean PubSubClient::readByte(uint8_t * result) {
   uint32_t previousMillis = millis();
   while(!_client->available()) {
     yield();
     uint32_t currentMillis = millis();
     if(currentMillis - previousMillis >= ((int32_t) MQTT_SOCKET_TIMEOUT * 1000)){
       return false;
     }
   }
   *result = _client->read();
   return true;
}

knolleary commented 7 years ago

@stefanbode please can you open a pull request with that fix? Will help get it added and published at some point.

traindriverrev commented 6 years ago

It still won't stay connected for me on the mosquitto client...

GuruLarsson commented 6 years ago

Hi there, I have experienced a similar problem: if the client cannot connect for some reason, it seems "impossible" to connect again from within the connect loop. My workaround is to try just 3 times with a one-second delay and then NOT try to connect for some time (I use a random wait of 3 to 5 minutes). After that it usually manages to connect again. I don't know whether the client or the broker causes the problem; for me it happens both with "online" brokers and with my local ones (Raspberry Pi with Mosquitto). Gunnar
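The retry scheme described above could be sketched roughly as follows. This is only an illustration of the idea (three quick tries one second apart, then a random pause of 3 to 5 minutes); the function name, the constants, and the choice to pass the random number in as a parameter are all assumptions, not code from this thread.

```cpp
#include <cstdint>

// All names and values are illustrative assumptions based on the description above.
constexpr int      kQuickAttempts = 3;                // quick tries before pausing
constexpr uint32_t kQuickDelayMs  = 1000;             // one second between quick tries
constexpr uint32_t kMinCoolDownMs = 3UL * 60 * 1000;  // 3 minutes
constexpr uint32_t kMaxCoolDownMs = 5UL * 60 * 1000;  // 5 minutes

// Returns how long to wait after the Nth consecutive failed connect (N >= 1).
// `entropy` is any random number, for example the result of Arduino's random().
uint32_t nextRetryDelayMs(int consecutiveFailures, uint32_t entropy) {
    if (consecutiveFailures % kQuickAttempts != 0) {
        return kQuickDelayMs;
    }
    // Every third failure: back off for a random time between 3 and 5 minutes.
    return kMinCoolDownMs + entropy % (kMaxCoolDownMs - kMinCoolDownMs + 1);
}
```

The caller would count failed connect() calls, reset the counter on success, and wait for the returned time before the next attempt.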

2delarosa commented 6 years ago

I'm having a similar problem where it reconnects up to 20 times before it can publish a message. I'm programming a NodeMCU (ESP-12E Module) on the Arduino IDE 1.8.5 platform and I'm using the pubsubclient 2.6.0 library. The sketch is "Basic ESP8266 MQTT example" from this site. The MQTT broker is Mosquitto on a Raspberry Pi and I'm using authentication.

f117zz5 commented 6 years ago

I had the same issue with a NodeMCU ESP-12E programmed with Arduino IDE 1.8.5. I had a lot of MQTT disconnects even though the WiFi connection was stable. Then I noticed that I had flashed more than one NodeMCU unit with the same sketch without changing the clientID, so all units had the same clientID. I then used the MAC address to generate a unique clientID, and since then I have not had any reconnect issues.
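A minimal sketch of deriving the client ID from the MAC address, as described above. The helper name makeClientId and the "ESP8266-" prefix are assumptions for illustration; on the board, the six bytes would come from WiFi.macAddress(mac), as in the first post.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: appends the 6 MAC bytes as upper-case hex to a prefix,
// so every board flashed with the same sketch ends up with a different client ID.
std::string makeClientId(const std::string& prefix, const uint8_t mac[6]) {
    char hex[13];  // 12 hex digits plus the terminating NUL
    std::snprintf(hex, sizeof(hex), "%02X%02X%02X%02X%02X%02X",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return prefix + hex;
}
```

The result can be passed to connect() via .c_str(). Because a broker only keeps one connection per client ID, two boards sharing an ID will keep kicking each other off, which matches the disconnects described above.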

abqmichael commented 5 years ago

It's 2019 and I am having this issue a lot. In my system, I am using a Raspberry Pi running Mosquitto as the broker (server). I only have one ESP8266 in the system but I am also running Python-based MQTT clients from my Macintosh: one to send commands, the other as a monitor to see that the commands have been received.

Typically, I will see the issue after freshly updating the code on my ESP8266 and sometimes from simply rebooting it (using the ESP8266 reset button). Sometimes the ESP8266 has a hard time getting onto my wireless network and tries several times. Sometimes it gets a timeout connecting to the broker. Then a bad connection will often get made. By bad, I mean that the ESP8266 client will say it's connected but won't actually receive messages. After a minute or so, the broker will then see that it's lost its connection and then try to re-establish one. If I let it go for 15 minutes to an hour and come back, I'll often see that it's finally made a connection. At that point, it rocks on for hours or days.

I can see that the yield() change described by stefanbode has been addressed. I have verified that none of my MQTT clients are using the same name. Looking at the server log, I see the following pattern:

1555116749: Client ESP8266-lights has exceeded timeout, disconnecting.
1555116749: Socket error on client ESP8266-lights, disconnecting.
1555116755: New connection from 10.0.1.16 on port 1883.
1555116755: New client connected from 10.0.1.16 as ESP8266-lights (c1, k10, u'').
1555116769: Client ESP8266-lights has exceeded timeout, disconnecting.
1555116769: Socket error on client ESP8266-lights, disconnecting.
1555116775: New connection from 10.0.1.16 on port 1883.
1555116775: New client connected from 10.0.1.16 as ESP8266-lights (c1, k10, u'').
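In Mosquitto's log format, k10 is the keep-alive the client asked for (10 seconds) and c1 indicates a clean session. MQTT 3.1.1 lets the broker drop a client it has not heard from within one and a half times the keep-alive, which fits the roughly 14 seconds between each "New client connected" line and the following "exceeded timeout" line. The sketch below only shows that arithmetic; the function names are invented here.

```cpp
#include <cstdint>

// With a non-zero keep-alive, an MQTT 3.1.1 broker may drop a client it has not
// heard from within 1.5 times the keep-alive period.
constexpr uint32_t brokerDeadlineSeconds(uint32_t keepAliveSeconds) {
    return keepAliveSeconds + keepAliveSeconds / 2;
}

// Seconds between two log timestamps (Unix epoch seconds, as Mosquitto prints them).
constexpr uint32_t elapsedSeconds(uint32_t connectedAt, uint32_t droppedAt) {
    return droppedAt - connectedAt;
}
```

For the log above, brokerDeadlineSeconds(10) gives 15 and elapsedSeconds(1555116755, 1555116769) gives 14. That pattern suggests the broker saw the CONNECT but nothing afterwards. PubSubClient sends its keep-alive pings from loop(), so one thing worth checking is that loop() is called often and that the packets actually leave the board.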

I've also noticed other times where the ESP8266 reports that it's lost connection and then re-connects to the broker before the broker notices it. At these times, the broker establishes a second connection.

Other theories:

I have heard/read that the IP stack on the ESP8266 is a bit touchy. So perhaps this problem is higher in the stack than this library.

It could also be that my ESP8266 is messed up, that my wiring is flaky (I'm using a breadboard) or that my power supply is causing havoc (though I'm plugged into USB + I have a 1A power supply).

It could be that my Raspberry Pi is a bad choice as a Mosquitto server. Maybe it's slow to accept connections or is dropping connect requests (though I have not noticed this from the Mac running Python scripts).

Has anybody found MQTT on ESP8266 with Raspberry Pi as broker to be reliable? If so, what issues did you address?

GuruLarsson commented 5 years ago

Hi, I have tried a lot of things and I have a few ESP8266 boards that are working without any issues, so YES, it can work. What I have found (or at least believe) is that the problem is NOT MQTT but rather some issue with the WiFi connection. One "easy" way to force the issue is to publish inside the callback routine; maybe even easier is to subscribe to the same topic you publish to, which seems to cause the disconnects. Another way of causing these disconnects is to send "too much" debug text to serial. I have NOT been able to get the disconnects on ESP32 boards, so something seems to have been fixed in the hardware or software on those boards (both Adafruit and no-name ones). Best regards Gunnar

abqmichael commented 5 years ago

Thanks @GuruLarsson. I will look into ESP32 boards. I only recently became familiar with them. I had set aside that because it seemed like overkill on performance. However, reliability would totally make them worthwhile.

My app does not do any publishing right now. I was planning on publishing status messages after getting the command handling reliable. For clarity, are you saying that two easy ways to LOSE the WiFi connection are to publish within a callback or to subscribe to commands you're publishing to? I will be careful on the first and wasn't planning on doing the second. (I was planning on publishing status on a parallel set of topics and planned to do that after the app finished doing its work.)

GuruLarsson commented 5 years ago

Hi Michael, what I have found is (I suspect) some sort of timeout (timing) issue with the ESP8266 boards.

I wanted some redundancy in case one of the MQTT brokers went offline, so I wrote code that published my data to three brokers. I then got disconnects, and when I investigated I found that if I was disconnected from ONE broker, ALL the other brokers were disconnected as well, along with the WiFi.

I then tried to "debug" what happened and could see that if I published in the callback I got more problems; the same happened if I added a lot of debug text to serial.

It might be that the ESP8266 is just too weak to handle the amount of I/O (it didn't matter whether it ran at 80 MHz or 160 MHz).

The ESP32 has gone down in price, so I think it is a valid replacement now. Best Regards Gunnar

TheGuruOfNothing commented 5 years ago

I have multiple ESP32s running this library and connecting to Mosquitto. One of them will connect and operate fine for a random amount of time and then just not connect again. I put "beacon" code on all of them, where they publish a "0" to a beacon topic (beacon/node1, beacon/vault, beacon/gate), and all but one will continue to do that broadcast. The "vault" ESP does not, and it stops receiving unlock messages as well. So it goes dead stick and I have to power cycle the thing to get it to reset and respond. I have not dug too deep into it yet, but it is running carbon-copy code of the other 2 ESPs and it is failing to stay connected. As a matter of fact, it is only 5 feet farther away than the node1 board, and that is 6 feet away from the Ubiquiti router. I doubt it is a network failure issue. Still looking at it though.

abqmichael commented 5 years ago

Hi Gunnar; I took your advice and switched my application over to using an ESP32 board and have not had any lost connection problems using it. At this point my code is running solidly.

I did run into a perplexing problem with version 1.0.1 of the Esp32 board manager by Espressif. That code had a bug in millis() that made time reset itself every 72 minutes. Sadly, I did a lot of sleuthing to zero in on the problem before discovering that they had already corrected that bug in version 1.0.2. (Bad on me.) This is just to say that sometimes the problem is in the lower-level code.
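A reset roughly every 72 minutes is what one would expect from a 32-bit counter of microseconds wrapping around, since 2^32 microseconds is about 71.6 minutes. Whether that was the actual cause in version 1.0.1 is an assumption here, not something confirmed in this thread. The sketch below only shows the arithmetic, with a made-up function name.

```cpp
// Minutes until a 32-bit tick counter wraps around, for a given tick rate.
constexpr double wrapMinutes(double ticksPerSecond) {
    return 4294967296.0 / ticksPerSecond / 60.0;  // 2^32 ticks in total
}
```

With one million ticks per second (microseconds) this gives about 71.6 minutes. With one thousand ticks per second (milliseconds) it gives about 49.7 days, which is the usual wrap time of a 32-bit millis().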

I am curious whether the boards you're using are all the same vendor, builds and etc. I was using MakerFocus ESP8266 NodeMCU Development board. I have three boards that may be from two different builds. Your note makes me wonder whether I should try out my latest "works great on ESP32" code on one of those "possibly other build" boards.

GuruLarsson commented 5 years ago

Hi Ivan, I have used a few different makes of the ESP8266 and a few different ones of the ESP32 as well, and they all seem to work well. I have used 8266s from LoLin, Espressif, Adafruit and some no-name vendors. The ESP32s are from Switchscience, Adafruit and some no-name ones I bought in Akihabara, Tokyo. They all work well; the only thing that is a bit irritating is the pinouts and mappings of different "standard" pins, e.g. MOSI, RST and so on.

I wish you all the luck with your project and I hope that I have been of some help for you and others. Best Regards Gunnar

saadmajid95 commented 5 years ago

@TheGuruOfNothing have you had any success? I have multiple ESP32 boards. I am running ESP-IDF's simple TCP-based MQTT example, but there are issues with reception of data. I have subscribed my laptop to the same topic and it gets the message really fast, but the ESP32 does not. Also, it frequently gives the error "MQTT_CLIENT: mqtt_message_receive: transport_read() error: errno=113". Any help or suggestions?

sth519 commented 4 years ago

I just spent an evening debugging weird connection issues:

Given that this is my first project using PubSubClient, I obviously expected my application to be at fault, but then I tried the official samples and eventually ran into the same problems.

Since I was debugging on my desk, I often reached for the reset button and one thing I noticed was that a few times, the connection re-established right when I was about to hit the reset button (just as my fingers were touching the button - and inevitably some of the pins next to it). The first time this happened I didn't really think much about it, but as it happened a couple more times, I started to suspect it might not be a software issue after all.

And indeed it wasn't. I was thinking that running the ESP8266 from the computer's USB port would be stable enough in terms of power, but apparently it's not. I simply added a little buffer capacitor to stabilize the power supply and the issues instantly disappeared.

I hope this is helpful to someone.

vks007 commented 4 years ago

@sth519, thank you, you made my day! I have been struggling for a week now with MQTT not being able to reconnect after a keepalive timeout. I do have a callback in my code and it was the culprit. I moved the publish logic to the main loop and it doesn't have the issue anymore. In the last week I have seen so many issues with reconnecting to MQTT that I believe this isn't such a good use case for the ESP8266. My issues were around connecting to MQTT after a light sleep and also reconnecting after a keepalive timeout. For the former, I still believe that I can't rely on it always working. At times, it just gets stuck for no reason.
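Moving the publish out of the callback and into the main loop could look roughly like the sketch below. It is an illustration only: the PendingReply type and both function names are invented here, and the 64-byte limit is arbitrary. The PubSubClient documentation also notes that inbound and outbound messages share one internal buffer, so copying the payload before doing anything else is a good idea in any case.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical single-slot mailbox between the MQTT callback and loop().
struct PendingReply {
    bool ready = false;
    char payload[64] = {0};
};

// Call from the MQTT callback: copy the data and return, do not publish here.
inline void rememberReply(PendingReply& slot, const uint8_t* data, size_t length) {
    const size_t maxLen = sizeof(slot.payload) - 1;
    const size_t n = length < maxLen ? length : maxLen;  // truncate if too long
    std::memcpy(slot.payload, data, n);
    slot.payload[n] = '\0';
    slot.ready = true;
}

// Call from loop(): publish through the supplied function, keep the message
// for a later retry if publishing fails.
template <typename PublishFn>
bool flushReply(PendingReply& slot, PublishFn publish) {
    if (!slot.ready) return false;
    const bool ok = publish(slot.payload);
    if (ok) slot.ready = false;
    return ok;
}
```

In a sketch, the callback would call rememberReply(slot, payload, length), and loop() would call flushReply with a lambda that wraps client.publish() for whatever status topic is used.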

abqmichael commented 4 years ago

I owe this thread an update. As initially reported, I had a lot of problems getting any reliability out of MQTT when running ESP8266 boards. Moving up to ESP32 boards dramatically improved my reliability. A reason for this is that the ESP32 boards have two cores. WiFi on both uses software defined radios. My sense is that giving one core the dual job of managing the low-level WiFi work plus any application is simply more than the ESP8266 is up to in a practical sense.

My problems were not over after upgrading to ESP32. Besides the perplexing problem with version 1.0.1 of the Esp32 board manager reported above, I also ran into problems getting my board to reconnect after disconnections. In my case, the problem wasn't my ESP32 board but rather the combination of a flaky WiFi hub and bad connection management logic. In my case, I was (and still am) using https://github.com/plapointe6/EspMQTTClient as a wrapper around PubSubClient. I think this code used logic out of an unfortunate PubSubClient demo that has a serious bug in it involving that connection logic.

Shortly before the pandemic, I figured out how to get it all to work. I explained the issue, my algorithm, and some helpful information I found elsewhere on the discussion board at https://github.com/plapointe6/EspMQTTClient/issues/33. I then wrote a patch for EspMQTTClient and provided it to the author. I have been using that patch since and it works great.

The EspMQTTClient author incorporated my patch in a branch version of his code. However his work got stalled when the pandemic hit and he was not able to put the code into a release.

If you wish to make your code more reliable, my first recommendation is to use EspMQTTClient from the branch that includes my code. It's a great library. An alternative is to study the algorithm I provided and implement it in your code.

amuqeet352 commented 3 years ago

Try making a simple function for mqtt connection.

void connectMQTT() {
  Serial.println("Connect to Gateway..");
  Serial.println(ssid);
  WiFi.begin(ssid, pass);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("WIFI Connected");
  Serial.print("IP Address: ");
  Serial.println(WiFi.localIP());
  if (client.connect(clientID, username, password)) {
    Serial.println("Connected to MQTT");
  } else {
    Serial.println("Connection to MQTT Failed!");
  }
}