odelot / aws-mqtt-websockets

Implementation of a middleware to use AWS MQTT service through websockets, aiming the ESP8266 plataform
GNU Lesser General Public License v3.0
231 stars 67 forks source link

MQTT fails on long messages #2

Closed svdgraaf closed 8 years ago

svdgraaf commented 8 years ago

I think I've found an issue. When I publish a message with a quote it in ", the message is never received on the iot end. When I publish a message without json, it does work:

The topic I subscribed to is $aws/things/xyz/shadow/update.

With regular text:

connecting to wifi
Connected
socket connected
MQTT connecting
MQTT connected
MQTT subscribed
Message 1 arrived: qos 1, retained 0, dup 0, packetid 2
Payload 
Message 2 arrived: qos 1, retained 0, dup 0, packetid 4
Payload foobar
Message 3 arrived: qos 1, retained 0, dup 0, packetid 5
Payload {}
Message 4 arrived: qos 1, retained 0, dup 0, packetid 6
Payload foo

With quotes:

connecting to wifi
Connected
socket connected
MQTT connecting
MQTT connected
MQTT subscribed

No response ;)

svdgraaf commented 8 years ago

Hmmm, after some more testing, it seems to be that is something weird going on with the length of the string. If publish {"barrr"}, it doesn't seem to be receiving anything, but after a while, the esp will reboot (probably buffer overflow?). If I publish a slightly shorter string {"barr"}, it just works.

I'll see if I can find why.

svdgraaf commented 8 years ago

I can confirm that it works both ways (receiving and publishing), longer messages are not processed...

svdgraaf commented 8 years ago

Uuuuuugh, got it! Took me a while! :D

Topic needs to be a const char*, it's probably used a lot, so not passing it by reference will eat up quite some memory probably ;)

Awesome, it works! :D woohoo!

svdgraaf commented 8 years ago

Hmmm, there still is an issue with either long messages, or a long topic name. If a have a long Topic name, eg: $aws/things/your-device/shadow/update, and a long message: {"reported":{"foo": "bar"}, desired:{}}. Nothing will be delivered. If I have a short message and a long topic, it works. A short topic and a long message wil work too. I haven't found where the bug is yet, or what the maximum amount of characters is.

svdgraaf commented 8 years ago

Ha! Got it :)

It was a bit of a search, but the Paho client is by default limited to max packages of X size. So ofcourse, this would result message failures when the topic + message gets a bit longer.

Changing: MQTT::Client<IPStack, Countdown, 50, 1> client = MQTT::Client<IPStack, Countdown, 50, 1>(ipstack); into: MQTT::Client<IPStack, Countdown, 255, 1> client = MQTT::Client<IPStack, Countdown, 255, 1>(ipstack);

Fixed the issue, the 255 is the max length of the MQTT packages (the longer, the memory it will use as well, ofcourse). This is also valid for the incoming package length (so for receiving messages from the outside). Pretty important :)

odelot commented 8 years ago

@svdgraaf I've notice this testing today. I've implemented the auto-reconnection feature and fix the need of that wait for connection workaround. Of course, it would be very nice to stress test these features ;-)

next target.... remove the need for bytebuffer

svdgraaf commented 8 years ago

Awesome, I'll run some more testing tomorrow :)

Cheers! Sander

On 28 mrt. 2016 20:58 +0200, odelotnotifications@github.com, wrote:

@svdgraaf(https://github.com/svdgraaf)I've notice this testing today. I've implemented the auto-reconnection feature and fix the need of that wait for connection workaround. Of course, it would be very nice to stress test these features ;-)

next target.... remove the need for bytebuffer

— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub(https://github.com/odelot/aws-mqtt-websockets/issues/2#issuecomment-202529468)

odelot commented 8 years ago

hi @svdgraaf, let me ask you something?

are you using the stage or stable version of arduino esp8266 code? 2.1.0 or 2.1.0-rc2?

I am investigating the memory leak after many reconnections. Tried the stable version and nothing worked, neither the first connection. O.o

svdgraaf commented 8 years ago

I'm running a git checkout, it's revision 633e48f3aec5f1c3c11d4498fc90d378d49e6e9f. I haven't had these issues yet (on a Adafruit huzzah board).

odelot commented 8 years ago

I've got the latest version from git and I'm still having problems

this simple code shows that heapspace get smaller after each reconnection

        const char* host = "google.com";
    WiFiClient c;
    for (int i =0; i< 100; i+=1) {
        Serial.print (i);
        Serial.print ("-");
        Serial.println (ESP.getFreeHeap());
        if (c.connect (host,80)) {
            delay (100);
            c.flush ();
            c.stop ();
            delay (100);
        } else {
            Serial.println ("connection refused");
        }
    } 

this code is using esp8266 core :-( my board is a generic one but It seems to be a software issue.

daniele-salvagni commented 8 years ago

I tested that code and I had the same outcome, the heap keeps shrinking after every loop. I did a quick search and it looks like that you are not the first one to notice this, here is a blog post about that (see "The Unexpected Discovery" paragraph): http://internetofhomethings.com/homethings/?p=1232

And a couple issues I found showing more details and possible solutions: https://github.com/esp8266/Arduino/issues/230, https://github.com/esp8266/Arduino/issues/1070

Edit: I did some more testing, here is the output of 200 iterations with a 3 minutes delay in the middle. The heap does recover and this confirms that the issue is caused only because "closed connections go into TIME_WAIT state where they must be kept by the TCP stack for 2xMSL interval".

0-46128
1-45904
2-45712
[...]
97-27472
98-27280
99-27088
Sleep for 3 minutes
100-46128
101-45904
102-45712
[...]
197-27472
198-27280
199-27088
odelot commented 8 years ago

thank you @daniele-salvagni, this mod fixed the issue https://github.com/esp8266/Arduino/issues/230#issuecomment-116816950

strange that it is not in the master branch yet.

svdgraaf commented 8 years ago

I can concur that I'm having the same issue, I just did a recompile with the tcp_abort(_pcb); fix, let's see how long it will run.

btw: I had issues with connecting to the mqtt from the latest checkout on the master esp8266 branch, I'm back to my above mentioned hash now, and everything works like a charm so far.

odelot commented 8 years ago

I'm still investigating before close issue https://github.com/odelot/aws-mqtt-websockets/issues/5

tcp_abort(_pcb) improved, but I am still losing memory. I am looking at the moment in the getCurrentTime function. Maybe it is a good idea to reuse the WifiClientobject. It seems that we lose some memory there. The problem is find free time to work on this :-( I will keep you informed about my findings

svdgraaf commented 8 years ago

I'm now posting the heap size in my latest build to AWS as well, and plotting it into Kibana: screenshot 2016-04-12 08 03 26

The line is pretty lineair, so that's good :wink: I will see if I can find the time to debug the getCurrentTime() method. I saw some other library which used a ntp server for that. Might be a bit overkill though.

odelot commented 8 years ago

@svdgraaf it is a great way to keep track of memory

Here is my findings:

Ive cleaned up the code a little bit (commited it already*) and changed the example (see below).

Now it seems that memory usage is almost ok, althought we need to test and investigate a little more. getCurrentTime is ok as well

Between the first and second connection, we lose 316 bytes (maybe the first subscribe? need to investigate it). After this, the memory seems to float but coming back eventually to the same level that was recorded in the second connection. When we got an error trying to connect to mqtt server, we lose 168 bytes as well.

program execution... free heap space between (parentheses)

connected
51191 - conn: 1 - (27744)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
362290 - conn: 2 - (27408)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
672864 - conn: 3 - (27408)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
984979 - conn: 4 - (27408)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
1298342 - conn: 5 - (13176)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
1611359 - conn: 6 - (13176)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
1922638 - conn: 7 - (27408)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
2233730 - conn: 8 - (27408)
websocket layer connected / MQTT connecting / 
error connection to MQTT server2
2241845 - conn: 9 - (13176)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
2551782 - conn: 10 - (27240)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
2862456 - conn: 11 - (27240)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed

and here is my new example code (still didnt commit it*, I need to test more):

* I probably should make a branch to avoid been commiting all the time. Sorry, I'am used to CVS, SVN, but git is still new to me. I will try to learn all git stuff in the near future ;-)

#include <Arduino.h>
#include <Stream.h>

#include <ESP8266WiFi.h>
#include <ESP8266WiFiMulti.h>

//AWS
#include "sha256.h"
#include "Utils.h"
#include "AWSClient2.h"

//WEBSockets
#include <Hash.h>
#include <WebSocketsClient.h>

//MQTT PAHO
#include <SPI.h>
#include <IPStack.h>
#include <Countdown.h>
#include <MQTTClient.h>

//AWS MQTT Websocket
#include "Client.h"
#include "AWSWebSocketClient.h"
#include "CircularByteBuffer.h"

//#define DEBUG_ESP_CORE
#define MQTT_DEBUG

//AWS IOT config, change these:
char wifi_ssid[]       = "ssid";
char wifi_password[]   = "pass";
char aws_endpoint[]    = "endpoit.iot.us-west-2.amazonaws.com";
char aws_key[]         = "AWSKEY";
char aws_secret[]      = "AWSSECRET";
char aws_region[]      = "us-west-2";
const char* aws_topic  = "$aws/things/owner-group-type-id/shadow/update";
int port = 443;

//MQTT config
const int maxMQTTpackageSize = 2056;
const int maxMQTTMessageHandlers = 1;

ESP8266WiFiMulti WiFiMulti;

AWSWebSocketClient awsWSclient(6000);

IPStack ipstack(awsWSclient);
MQTT::Client<IPStack, Countdown, maxMQTTpackageSize, maxMQTTMessageHandlers> *client = NULL;

//# of connections
long connection = 0;

//generate random mqtt clientID
char* generateClientID () {
  char* cID = new char[23]();
  for (int i=0; i<22; i+=1)
    cID[i]=(char)random(1, 256);
  return cID;
}

//count messages arrived
int arrivedcount = 0;

//callback to handle mqtt messages
void messageArrived(MQTT::MessageData& md)
{
  MQTT::Message &message = md.message;

  Serial.print("Message ");
  Serial.print(++arrivedcount);
  Serial.print(" arrived: qos ");
  Serial.print(message.qos);
  Serial.print(", retained ");
  Serial.print(message.retained);
  Serial.print(", dup ");
  Serial.print(message.dup);
  Serial.print(", packetid ");
  Serial.println(message.id);
  Serial.print("Payload ");
  char* msg = new char[message.payloadlen+1]();
  memcpy (msg,message.payload,message.payloadlen);
  Serial.println(msg);
  delete msg;
}

//connects to websocket layer and mqtt layer
bool connect () {

    if (client == NULL) {
      client = new MQTT::Client<IPStack, Countdown, maxMQTTpackageSize, maxMQTTMessageHandlers>(ipstack);
    } else {

      if (client->isConnected ()) {    
        client->disconnect ();
      }  
      delete client;
      client = new MQTT::Client<IPStack, Countdown, maxMQTTpackageSize, maxMQTTMessageHandlers>(ipstack);
    }

    //delay is not necessary... it just help us to get a "trustful" heap space value
    delay (1000);
    Serial.print (millis ());
    Serial.print (" - conn: ");
    Serial.print (++connection);
    Serial.print (" - (");
    Serial.print (ESP.getFreeHeap ());
    Serial.println (")");

   int rc = ipstack.connect(aws_endpoint, port);
    if (rc != 1)
    {
      Serial.println("error connection to the websocket server");
      return false;
    } else {
      Serial.println("websocket layer connected");
    }

    Serial.println("MQTT connecting");
    MQTTPacket_connectData data = MQTTPacket_connectData_initializer;
    data.MQTTVersion = 3;
    char* clientID = generateClientID ();
    data.clientID.cstring = clientID;
    rc = client->connect(data);
    delete[] clientID;
    if (rc != 0)
    {
      Serial.print("error connection to MQTT server");
      Serial.println(rc);
      return false;
    }
    Serial.println("MQTT connected");
    return true;
}

//subscribe to a mqtt topic
void subscribe () {
   //subscript to a topic
    int rc = client->subscribe(aws_topic, MQTT::QOS0, messageArrived);
    if (rc != 0) {
      Serial.print("rc from MQTT subscribe is ");
      Serial.println(rc);
      return;
    }
    Serial.println("MQTT subscribed");
}

//send a message to a mqtt topic
void sendmessage () {
    //send a message
    MQTT::Message message;
    char buf[100];
    strcpy(buf, "{\"state\":{\"reported\":{\"on\": false}, \"desired\":{\"on\": false}}}");
    message.qos = MQTT::QOS0;
    message.retained = false;
    message.dup = false;
    message.payload = (void*)buf;
    message.payloadlen = strlen(buf)+1;
    int rc = client->publish(aws_topic, message); 
}

void setup() {
    Serial.begin (115200);
    delay (2000);
    Serial.setDebugOutput(1);

    //fill with ssid and wifi password
    WiFiMulti.addAP(wifi_ssid, wifi_password);
    Serial.println ("connecting to wifi");
    while(WiFiMulti.run() != WL_CONNECTED) {
        delay(100);
        Serial.print (".");
    }
    Serial.println ("\nconnected");

    //fill AWS parameters    
    awsWSclient.setAWSRegion(aws_region);
    awsWSclient.setAWSDomain(aws_endpoint);
    awsWSclient.setAWSKeyID(aws_key);
    awsWSclient.setAWSSecretKey(aws_secret);
    awsWSclient.setUseSSL(true);

    if (connect ()){
      subscribe ();
      //sendmessage ();
    }

}

void loop() {
  //keep the mqtt up and running
  if (awsWSclient.connected ()) {    
      client->yield();
  } else {
    //handle reconnection
    if (connect ()){
      subscribe ();      
    }
  }

}
odelot commented 8 years ago

after 28 hours, the esp8266 crashed again... but it seems not be caused by memory leak

I will try to investigate this exception (29) but the memory issue seems to be under control

have any of you guys already had problems with exception 29?

100187627 - conn: 372 - (27240)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
100498617 - conn: 373 - (27240)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed
100809295 - conn: 374 - (27240)
websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed

Exception (29):
epc1=0x4000e1b2 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

*note that memory usage (27240) from connection 374 is the same from the connection from last post

svdgraaf commented 8 years ago

Nice! I haven't been able to look at your new code yet, but sounds very promising! :)

On 14 apr. 2016 14:16 +0200, odelotnotifications@github.com, wrote:

after 28 hours, the esp8266 crashed again... but it seems not be caused by memory leak

I will try to investigate this exception (29) but the memory issue seems to be under control

have any of you guys had already problems with exception 29?

100187627 - conn: 372 - (27240) websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed 100498617 - conn: 373 - (27240) websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed 100809295 - conn: 374 - (27240) websocket layer connected / MQTT connecting / MQTT connected / MQTT subscribed Exception (29): epc1=0x4000e1b2 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

*note that memory usage (27240) from connection 374 is the same from the connection from last post

— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub(https://github.com/odelot/aws-mqtt-websockets/issues/2#issuecomment-209910970)

svdgraaf commented 8 years ago

After some looking around, the 29 exception does point to an invalid reference somewhere (trying to overwrite a value in virtual memory which is prohibited). Can you perhaps create push a branch with your current code? I can hook up a debugger, and see where the issue is.

svdgraaf commented 8 years ago

I figured my esp must have crashed by now, but hadn't checked the actual output. I just did, and it's still running (~29 hours now), the heap size is stable for the last few hours around 10k. It's posting to a mqtt channel every 10 seconds. I'm running 633e48f3aec5f1c3c11d4498fc90d378d49e6e9f of the esp8266 repo, with the manual 'fix' for the tcp_abort(). Let's see how long it will last...

screenshot 2016-04-16 12 45 47

svdgraaf commented 8 years ago

Still going strong after 49 hours... Heap size still the same

odelot commented 8 years ago

@svdgraaf the code at github is my most up to date code... I just didn't upload the example code yet, but I've posted it above. it seems that the memory issue was fixed.

aaroneiche commented 8 years ago

I found the sample code in the above comment to be tremendously helpful. The Sample I'd been working off previously was failing frequently on connection. This sample worked great :) I am connected and successfully receiving messages.

aaroneiche commented 8 years ago

It seems I spoke too soon - I'm still running into issues getting the websocket connection to work:


connected
11904 - conn: 1 - (26504)
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
pm open,type:2 0
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
error connection to the websocket server
19491 - conn: 2 - (11360)
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
error connection to the websocket server
26987 - conn: 3 - (11360)
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
[AWSc] Disconnected!
please start sntp first !
error connection to the websocket server
34480 - conn: 4 - (11360)

It just keeps on rolling like that. Any suggestions on what to do to make the websocket connect more reliable? I got it working once, but now I have no idea what make that work.

jatindera commented 7 years ago

Hi, Till yesterday, everything was working fine for me. I connected my ESP8266 development board to AWS IoT. Today morning, I connected it again without making any change and I started getting following error:

please start sntp first ! error connection to the websocket server

Appreciate your help on this.

sh4nnongoh commented 7 years ago

Regarding the SNTP issue:

Add the following line to the Setup section,

*configTime(3 3600, 0, "pool.ntp.org", "time.nist.gov");**