hirotakaster / MQTT

MQTT for Photon, Spark Core
Other
216 stars 118 forks source link

Reconnect reuses the same underlying TCPClient object? #54

Closed micahwedemeyer closed 7 years ago

micahwedemeyer commented 7 years ago

I've got a Photon that keeps losing and regaining wifi. It wouldn't be a problem except that it seems like it cannot re-connect to the MQTT broker after that. It's difficult to troubleshoot because I'm having trouble logging the information necessary.

But, in looking at the MQTT.cpp code, I notice that the same TCPClient object is used and re-used when you connect and re-connect. So, if you connect and then disconnect and then connect again, it is still using the same object (_client) Is that okay? Is that safe? I honestly don't know and I'm just trying to figure out what could be going wrong in my code.

micahwedemeyer commented 7 years ago

Unfortunately, it's very difficult for me to troubleshoot this, as it only occurs when my device loses wifi, which happens ~ 1/hour. So, I make a change...then wait...

I've been fighting with the reconnect for a while, and still not much to report. I tried creating a new TCPClient object in the connect method, but that didn't seem to help.

I'm going to try delaying the reconnect by a few seconds (15 for now) and see if that helps. I've read a few threads that say the Photon has a built-in timeout on the sockets and it could be that they're not completely closing or cleaning up. Perhaps with a delay the underlying socket will get cleaned up and can then be re-used.

hirotakaster commented 7 years ago

Hi @micahwedemeyer, @KennPunt , Yes, _client is TCPClient object. And I check this source code.

#include "MQTT/MQTT.h"
void callback(char* topic, byte* payload, unsigned int length);
MQTT client("some mqttserver", 1883, callback);

void callback(char* topic, byte* payload, unsigned int length) {
    char p[length + 1];
    memcpy(p, payload, length);
    p[length] = NULL;
    String message(p);
    Serial.println(message);
    delay(1000);
}
void setup() {
    RGB.control(true);
    Serial.begin();
}
void loop() {
    if (client.isConnected()) {
        client.loop();
    } else {
        Serial.println("Connect to the MQTT server!!");
        client.connect("sparkclient");
        if (client.isConnected()) {
          client.publish("outTopic/message","hello world");
          client.subscribe("inTopic/message");
        } else {
          // simple wait
          delay(5000);
        }
    }
}

then following step.

  1. startup the Photon and connecting to the WiFi/MQTT server(check your debug log).
  2. shutdown wifi router.
  3. MQTT client find disconnect from MQTT server, because of MQTT client have a KEEPALIVE(MQTT ping request) timeout default 15sec in loop(). If keepalive check is fail, TCPClient is closed.
  4. start up wifi router.
  5. re-connect to mqtt server is success.

screenshot

micahwedemeyer commented 7 years ago

@hirotakaster Thanks for the debugging. I changed a few things around in my code and also got it to work (so far).

I think the key is the delay(5000) statement. I had delay(50) between each reconnect attempt, and from what I've read it sounds like the underlying sockets on the Photon have some hard-wired timeouts that need to occur before they can be re-used.

So, as far as I can tell, the key is to delay long enough before attempting to reconnect. I set my delay to 15000ms, but perhaps 5000ms is enough. In contrast, 50ms is not enough.

hirotakaster commented 7 years ago

@micahwedemeyer

and from what I've read it sounds like the underlying sockets on the Photon have some hard-wired timeouts that need to occur before they can be re-used.

I think so. When I try a TCPSocket re-use tight loop for testing simple code TCPClient client; while(1) { client.connect("some server", 80); client.stop(); delay(5); }, Photon was locked and needed a re-start. I think maybe the basis of this problem is Photon firmware's TCPClient.

micahwedemeyer commented 7 years ago

Another note: I think my server IP address was falling out of scope and being memory collected, which broke the reconnect. Like so:

void setup() {
  byte ipAddress[]{ip1, ip2, ip3, ip4};
  mqtt = new MQTT(ipAddress, ...etc);
  mqtt->connect("someid");
}

It would connect correctly to begin with, but if the connection was lost, it would never reconnect. My theory (finally...) is that ipAddress was being deleted once it was out of scope for setup() and therefore not available the next time mqtt tried to connect. Ugh...I really hate memory management.

I recently changed it to:

byte ipAddress[]{ip1, ip2, ip3, ip4};
void setup() {
  mqtt = new MQTT(ipAddress, ...etc);
  mqtt->connect("someid");
}

This seems to be working better and I have verified that it will reconnect properly.