knolleary / pubsubclient

A client library for the Arduino Ethernet Shield that provides support for MQTT.
http://pubsubclient.knolleary.net/
MIT License

Problem with odd timeout on publish from esp8266 #151

Open NickWaterton opened 8 years ago

NickWaterton commented 8 years ago

Hi,

I was using the MQTT library from http://imroy.github.io/pubsubclient with my esp8266 (issue free), but couldn't get it to work with my new M0 ARM board. I switched to your library, which should support both (and it does work on my M0, although so far only with a test sketch).

Having rewritten my esp8266 sketch to use your library, I find it works quite well. What I am seeing is a strange timeout when publishing messages from the esp8266 (which I did not have before).

I publish a lot of messages, mostly text for logging events in debug mode. Sometimes, the message will fail to publish - I'm not sure why (memory is fine). It could be to do with the packet delay on the esp8266, but my server is running on Ubuntu (14.04), and I have tried limiting the publish rate, with no effect.

When a message fails to publish, there is a 5-second timeout. This hangs the whole sketch for 5 seconds (which on an esp8266 is bad).

I can't find a 5-second timeout anywhere in your code, only the 15-second socket timeout. I don't get a server disconnection; publish(message) normally returns immediately, but sometimes takes 5 seconds to return (and fails).

This can happen several times in a row, but then it seems to sort itself out and all is well again. If it triggers an esp8266 reset, though, it can stay stuck for a while (quite a lot of messages are published during start-up), but it does eventually recover.

I subscribe to 4 topics after initial connection, and it seems to mostly happen after the subscriptions (but it can occur at other times).

Has anyone else seen this? Is it my broker? I suspect the 5-second timeout is on the server side, but I'm still looking for it.

I have increased MQTT_MAX_PACKET_SIZE to 256 because 128 was too small for one of my messages. I'm not sure this has any effect on the 5 s timeout issue, though.
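To check in advance whether a message fits, the buffer space a QoS 0 publish needs can be estimated from the MQTT packet layout. This is a minimal sketch, assuming the library reserves 5 bytes for the fixed header (1 control byte plus up to 4 remaining-length bytes) and then stores the 2-byte topic length, the topic and the payload in a single MQTT_MAX_PACKET_SIZE buffer, as PubSubClient 2.x appears to do; check PubSubClient.cpp in your version. The function names are hypothetical, not library API.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: mirrors the size check PubSubClient 2.x appears to perform
// before building a QoS 0 PUBLISH packet in its fixed-size buffer.
constexpr std::size_t kHeaderReserve = 5;     // 1 control byte + up to 4 remaining-length bytes
constexpr std::size_t kTopicLengthField = 2;  // big-endian topic length prefix

// Bytes the buffer must hold for a QoS 0 publish (no packet identifier).
std::size_t publishBytesNeeded(const char* topic, std::size_t payloadLen) {
    return kHeaderReserve + kTopicLengthField + std::strlen(topic) + payloadLen;
}

// True if the message fits in a buffer of maxPacketSize bytes.
bool fitsInPacket(const char* topic, std::size_t payloadLen, std::size_t maxPacketSize) {
    return publishBytesNeeded(topic, payloadLen) <= maxPacketSize;
}
```

For example, a 120-byte payload on the hypothetical topic `esp/log` needs 5 + 2 + 7 + 120 = 134 bytes, so it would be rejected with the default of 128 but fits with 256. Because the library's .cpp file is compiled separately from the sketch, a `#define MQTT_MAX_PACKET_SIZE` placed in the sketch may not take effect; editing PubSubClient.h or passing the value as a compiler flag is more reliable, and later releases (2.8) add `setBufferSize()` for this.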

Thanks,

webguy16 commented 8 years ago

"However, the connection to the MQTT broker is blocking during ~5 seconds in case the server is unreachable. This is an Arduino for ESP8266 limitation, and we can't do anything on our side to solve this issue, not even a timeout." https://github.com/marvinroger/homie-esp8266/blob/master/docs/8.-Limitations-and-known-issues.md

Not 100% sure this is what either of you are talking about; I just did a really quick search. Hope it's helpful.

"WiFiClientSecure still has 5 seconds timeout hardcoded, some refactoring is necessary to expose _timeout member at the point when write function is called by axTLS." https://github.com/esp8266/Arduino/pull/1570

namirda commented 8 years ago

@NickWaterton

I had exactly the same problem and it was driving me crazy - calls to publish() were failing with a subsequent 5 second timeout and then the whole lot sorted itself out a minute or two later.

The solution however was trivial in my case - simply make sure that you make a call to pubsub.loop() after each subscribe and each publish. I had just assumed that a single call to pubsub.loop() somewhere in the Arduino loop would be enough but in cases where you subscribe or publish several times in rapid succession it goes wrong.

Perhaps the library could be updated to include calls to loop() after each subscribe and publish call?
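Until the library does this itself, the workaround can be applied in user code with a pair of wrappers, so that every publish and subscribe is followed by one `loop()` pass. This is a minimal sketch, assuming the client type exposes `publish(const char*, const char*)`, `subscribe(const char*)` and `loop()` as PubSubClient does; `publishAndService` and `subscribeAndService` are hypothetical names, not library functions, and this thread does not establish why the extra `loop()` helps (other commenters report it made no difference for them).

```cpp
// Hypothetical helpers (not part of PubSubClient). ClientT is any type with
// bool publish(const char*, const char*), bool subscribe(const char*) and
// bool loop(), e.g. a connected PubSubClient on the device.

// Publish, then immediately run one loop() pass so the client reads any
// pending incoming packets and handles keepalive before the next call.
template <typename ClientT>
bool publishAndService(ClientT& client, const char* topic, const char* payload) {
    bool ok = client.publish(topic, payload);
    client.loop();  // result ignored: only the publish status is reported
    return ok;
}

// Same idea for subscriptions, e.g. when subscribing to several topics in a row.
template <typename ClientT>
bool subscribeAndService(ClientT& client, const char* topic) {
    bool ok = client.subscribe(topic);
    client.loop();
    return ok;
}
```

On the device this would be called as `publishAndService(mqttclient, "dom/uid", "ping");` with `mqttclient` a connected PubSubClient. Keeping the client as a template parameter also lets the wrappers be exercised on a PC with a fake client.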

Suxsem commented 8 years ago

This is exactly what I did in my fork: https://github.com/Suxsem/pubsubclient

I don't have issues anymore; maybe it can help you too.

aitanadev commented 7 years ago

I tested the issue with:

  // each second...
  String msg = "ping";
  unsigned long ptime = millis();
  mqttclient.publish("dom/uid", msg.c_str()); // same result with mqttclient.loop(); called before and after
  Serial.print("publishtime:");
  Serial.println(millis()-ptime);

Output:

publishtime:18
publishtime:20
publishtime:19
publishtime:18
publishtime:11
publishtime:17
publishtime:5000
publishtime:180
publishtime:9
publishtime:17
publishtime:676
publishtime:703
publishtime:920
publishtime:20
publishtime:17
publishtime:15
publishtime:694
publishtime:18
publishtime:2268
publishtime:19
publishtime:18
publishtime:27
publishtime:2796
publishtime:19
publishtime:18
publishtime:17
publishtime:21
publishtime:772
publishtime:811
publishtime:17
publishtime:24
publishtime:790
publishtime:2267
publishtime:19
publishtime:5000
publishtime:10001
publishtime:5001
publishtime:441
Attempting MQTT connection...connected as Valvula001
publishtime:5000
publishtime:216
publishtime:2551
publishtime:913
publishtime:24
publishtime:72
publishtime:26
publishtime:16
publishtime:668
publishtime:19
publishtime:5000

Setup: ESP8266, Mosquitto as the broker, and everything reset before the test.
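The durations in this log fall into two groups: ordinary delays of up to a few seconds, and stalls sitting at whole multiples of 5 s (5000, 5001, 10001 ms). The second group is consistent with a fixed 5 s timeout below the MQTT layer, such as the one quoted earlier for the ESP8266 core. The following is a hypothetical helper for counting that second group in a captured log; the 5000 ms period and 50 ms tolerance are assumptions, not values taken from the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts publish durations that land close to a whole
// multiple of a suspected timeout (e.g. 5000, 5001, 10001 ms for 5000 ms).
struct StallSummary {
    std::size_t total = 0;      // number of samples examined
    std::size_t stalls = 0;     // samples within tolerance of k * timeoutMs, k >= 1
    unsigned long worstMs = 0;  // longest single publish
};

StallSummary summarizePublishTimes(const std::vector<unsigned long>& durationsMs,
                                   unsigned long timeoutMs = 5000,
                                   unsigned long toleranceMs = 50) {
    StallSummary s;
    for (unsigned long d : durationsMs) {
        ++s.total;
        if (d > s.worstMs) s.worstMs = d;
        if (d + toleranceMs < timeoutMs) continue;  // clearly shorter than one timeout
        unsigned long rem = d % timeoutMs;
        // Near a multiple if just above it (small remainder) or just below it.
        if (rem <= toleranceMs || timeoutMs - rem <= toleranceMs) ++s.stalls;
    }
    return s;
}
```

Fed the 49 `publishtime` values above, it counts 6 stalls with the longest at 10001 ms, while the 2268 and 2796 ms cases fall outside the 5 s pattern.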

aaron-neal commented 7 years ago

I seem to be facing this problem after upgrading from the stable 2.3.0 ESP8266 Arduino Core to the latest Git version.

Was there any solution?

In my case I am polling 4 sensors at 100Hz, buffering them and sending them out after every 5 to 15 readings depending on network throughput. I also have the occasional blocking code for up to 600ms, where sensor data is still collected (using ticker) and then sending is resumed.

When using 2.3.0, this works perfectly and indefinitely.

When using the latest core, as soon as the 600 ms block occurs, the publish call does not return for 5 seconds, and the esp8266 eventually becomes unstable and crashes because of my poor heap management.
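With this design, the buffer has to absorb every reading that arrives while `publish()` is blocked. At 100 Hz, a 600 ms pause adds 60 readings per sensor, but a 5 s stall adds 500 per sensor (2000 across four sensors), so a buffer sized for the short pause overflows, and one that keeps growing on the heap during a stall could contribute to the crashes described. This is a minimal sketch, assuming a fixed-capacity buffer that overwrites the oldest reading when full; `RingBuffer` and `readingsDuringPause` are hypothetical names, and no concurrency protection is included, so pushes from a Ticker callback would need guarding on the device.

```cpp
#include <array>
#include <cstddef>

// Readings that arrive during a blocking pause: rateHz * pauseMs / 1000, rounded up.
constexpr std::size_t readingsDuringPause(std::size_t rateHz, std::size_t pauseMs) {
    return (rateHz * pauseMs + 999) / 1000;
}
static_assert(readingsDuringPause(100, 600) == 60, "600 ms at 100 Hz");
static_assert(readingsDuringPause(100, 5000) == 500, "5 s at 100 Hz");

// Fixed-capacity ring buffer: filled by the sampling code, drained by the
// publishing code. When full, the oldest reading is overwritten, so a long
// stall loses old data instead of allocating more memory.
// Not interrupt-safe as written.
template <typename T, std::size_t N>
class RingBuffer {
public:
    void push(const T& v) {
        buf_[head_] = v;
        head_ = (head_ + 1) % N;
        if (count_ < N) {
            ++count_;
        } else {
            ++dropped_;  // the slot just written held the oldest reading
        }
    }
    // Removes the oldest reading into out; returns false if empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        std::size_t tail = (head_ + N - count_) % N;
        out = buf_[tail];
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }
    std::size_t dropped() const { return dropped_; }

private:
    std::array<T, N> buf_{};
    std::size_t head_ = 0;     // next slot to write
    std::size_t count_ = 0;    // readings currently stored
    std::size_t dropped_ = 0;  // readings overwritten before being sent
};
```

Sizing `N` from `readingsDuringPause(100, 5000)` rather than the 600 ms figure keeps a single 5 s stall from discarding data, at the cost of reserving that memory up front.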

The previous comment about utilising client.loop() after publishes etc... does not alleviate my problem in any way.

Would greatly welcome any pointers.

gjt211 commented 7 years ago

FWIW, I also have this problem, and it is random. Memory usage is fine (always over 20 KB free), and there doesn't seem to be any way to trigger the disconnection. I have now spent a few weeks looking at alternatives to this code base (Mongoose OS, MicroPython, and even Lua) because the reliability is not good enough for my needs. I have run test code in both MicroPython and Mongoose OS over the last couple of weeks and don't have this problem, so it's not the hardware.