hsaturn / TinyMqtt

ESP 8266 / 32 / WROOM Small footprint Mqtt Broker and Client
GNU General Public License v3.0
183 stars 40 forks source link

client ceases to respond to topics after a short number of events #47

Closed Bazmundi closed 1 year ago

Bazmundi commented 1 year ago

I have two ESP32 widgets, one set as an AP and one as a STA. I can get the STA to connect to the AP and I appear to be able to publish a topic on the AP and the STA does receive topics (upwards of 5 but often less), then I get no further responses.

Code for AP

The AP is a WEMOS BAT with a foot pedal

Code for STA

The STA is a WEMOS D1 R32 that will have a UNO relay shield. It will drive an air pump and pneumatic "switch", the code aims to turn on pump then the pneumatic switch, when the pedal is pressed. The air stays on again until the pedal is pressed again, in which case the pneumatic switch is turned off then the pump is turned off.

Since the STA connects to the AP, and then I get through a few responses, I am uncertain whether this is, as yet, a bug in the TinyMQTT code. I will do a little more poking around, but if anyone can see something in the source code (for either AP or the STA) that jumps out at you, thanks ahead of time.

I have snipped out the relay code (in the STA) to leave:

void loop()
{
  pump_client.loop();
}

That seems to allow more topic responses to succeed, but I only get closer to 10 before it stops.

If I connect a terminal to the AP, it happily keeps reporting the foot press code (but that does not guarantee the messages are still going out to the broker). So, without understanding how the TinyMQTT code operates, whether it's the broker or client end amiss, it's a little hard to work out where to start debugging.

Basic design then is AP has a broker and then two clients, one to act as the pedal (to send peddle presses) and one as the proxy for the pump (to catch the pump connection).

I added a callback to the AP and subscribed to the pedal press topic. It does appear the broker stops sending out topics after a few have been sent out. That is, whether the client is on the AP, or on the STA, the messages seem to stop eventually.

The other behaviour seems to be, when using the STA to connect to the AP, if I reset the STA, the STA will sometimes respond to topics for a short time, then stop again.

Any suggestions please?

hsaturn commented 1 year ago

Hello

I may have reproduced a similar issue on my ESP8266. No more information yet.

Thanks for the report

Bazmundi commented 1 year ago

I'm making a video of the behaviour and will post soon.

Bazmundi commented 1 year ago

Okay an unlisted video, for your consumption, on youtube.

One thing I noticed after the video was made. When I reset the STA, it connects and mostly will send the connection topic. There are a few odd occasions when the connection topic was not sent to AP. Regardless, the pedal pressed topic would still go from AP to the STA, up to the point the pedal pressed topic stops.

hsaturn commented 1 year ago

Hi, I can keep my mind saying "It's an heartbeat problem". Even if at the last try, the client was disconnected about 15 seconds later. You may try with

define TINY_MQTT_DEBUG 1 -> in TinyMqtt.h !!

That's may give you a lot more information.

I'm re-writing part of the console example wich is great to explore problems.

Thanks for your video and your report.

hsaturn commented 1 year ago

In MqttClient::processMessage, I can see this at the end clientAlive(5) meaning that the client is considered alive for the next keep_alive value (10 in your case) + 5 seconds. This gives 15 seconds and I measured (mentally) 15s with your video). This may really not be a hazard !

Still working on your issue...

Bazmundi commented 1 year ago

That was something I thought might be the issue, the client is alive for the set time (not infinite). So, that might account for the connected message from the STA to AP not turning up subsequently, if the publishing client (on the STA)? As well as the pedal push from AP to STA stopping?

Is there anything else I can try to help?

Bazmundi commented 1 year ago

You also need a "Buy me a beer" button :)

hsaturn commented 1 year ago

Hmmmm.... The "Buy me a beer" .... I don't have this one !!! For sure it will help !

Bazmundi commented 1 year ago

I see I could call clientAlive? To add more life? ... DOH! Never mind, it's hidden in the “private” section.

hsaturn commented 1 year ago

I have modified the code so the debug is much more expressive. I'll commit this soon. The thing you could do is (once I've pushed my modifications)

I couldn't see any problem with clientAlive yet, but am pretty sure that the problem is in my side (not your client side)

Bazmundi commented 1 year ago

So ahead of you giving me the go ahead, I have tried compiling tinymqtt-test on my WEMOS D1 R32 and get an error?:

src/main.cpp: In static member function 'static void automatic::command(MqttClient*, std::__cxx11::string)':
src/main.cpp:315:8: error: 'compare' was not declared in this scope
    if (compare(s, "create"))
        ^~~~~~~

I am assuming "ESPmDNS" is pulled in from Arduino libraries, and that is not the source of the problem?

Bazmundi commented 1 year ago

So, I added

#define TINY_MQTT_DEBUG 1

b4 I called tinyMQTT.h in main.cpp

New vid, ... , not much to show :confused:

hsaturn commented 1 year ago

Sorry, I've spent my last day to create a new library (TinyConsole) that handles terms such as Mobaxterm etc. This because the Arduino IDE's serial monitor is too simple, no cursor, no prompt, no color, no way to address row/col etc.. Lib is finished and will be very soon available (not yet gitted). --> All this because I want to enhance a lot tinymqtttest which is very usefull to test all kind of things. Once done, I'll return back to your problem, do probably before tomorrow, the lib is finished, all I have left to do is to package it, and to integrate it in tinymqtt-test.ino

In particular I made tinymqtt-test talking one night long to MqttExplorer, with no problem at all.

hsaturn commented 1 year ago

The #define TINY_MQTT_DEBUG must be added in TinyMqtt.h for some reasons (arduino IDE's) But I've already have a lot of modifications in this file in order to have a more readable debug information.

.. to be continued soon ...

Bazmundi commented 1 year ago

Actually using platformIO and pulled library in via that. So, if I re-pull in via platformio again, am I expecting to see version greater than hsaturn/TinyMqtt@^0.8.0?

hsaturn commented 1 year ago

Current release is 0.9.5 !!! https://github.com/hsaturn/TinyMqtt/releases

Bazmundi commented 1 year ago

ACK

hsaturn commented 1 year ago

I've published TinyConsole, just waiting for it to be available now in the IDE. Once that done, I'll be able to replace junk code in tinymqtt-test.ino and go further.

Bazmundi commented 1 year ago

Okay, so I downloaded 0.9.5 onto my laptop, and into VSCODE under platformIO, turned on debugging and voila!

voila=debug video attached, errant behaviour the same (client life ends 15 odd seconds in)

hsaturn commented 1 year ago

Hi again A loooot of work has been done on debugging.

You could try this 1 - Download the latest release 0.9.6 (not yet released, but already tagged) 2 - #define TINY_MQTT_DEBUG 1 in TinyMqtt.h 3 - open the console example (https://github.com/hsaturn/TinyMqtt/tree/main/examples/tinymqtt-test) 4 - You'll see a lot of changes in log in mobaxterm.

The tinymqtt-test is a console like application that launches a broker immediately so you can see what happens if you connect to it.

There is a prompt so you can run this command

> every 133 view      -> allow to view interresting things, such as connected clients and the keep-alive values !

Run your client and see what happens.

You can spawn a internal client with this command:

> client c broker         -> an internal client with name 'c' connects to the internal broker
> every 10000 c.pub topic rnd(1000)     -> change a topic every 10 seconds
> help     -> other commands if you want to explore

Or you can connect with an external client (TinyMqtt or MQTT Explorer for example). -> I"ve never saw MQTT Explorer disconnection (60s keep alive).

I could see that internal client are not tested against their keep_alive value. Maybe this is the bug. Here is a screenshot of an internal client never checked: image MQTT Broker is the external client. As you can see the next alive check will occur at 1170906 and it is now 1127352, ka=60 (60 seconds). connected in green shows that this is an external client with a real connection.

Best regards.

Bazmundi commented 1 year ago

So, I am feeling a little dim 😖.

It occurred to me to use a third parameter for the client connect, to overwrite the 10 seconds. Now, maximum seconds (given the type) is 65535, which is better than 18 hours. So, since I am using this as a wireless pump control for a manual PnP, I am not expecting to spend 18 hours on the job, so I can happily get by.

EXCEPT, there still seems a timeout?

I got to 236 button presses, then the client stopped updating again.

I'll time that and get back to you.

Bazmundi commented 1 year ago

I'll do a build of STA with debug ON, and with TinyConsole compiled in.

Bazmundi commented 1 year ago

Build of STA with debug ON, and TinyConsole compiled in, finds STA in a reboot loop ???? So that is my app code with TinyConsole added.

As I had not compiled TinyMQTT-test onto the AP, I tried doing that.

First problem was eval(x) not in scope, which I fixed by simply moving onCommand(x) after eval(x).

Next problem, not so fixable.

src/main.cpp: In member function 'void Every::dump()':
src/main.cpp:424:3: error: expected primary-expression before 'auto'
   auto mill=millis();
   ^~~~
src/main.cpp:426:7: error: 'mill' was not declared in this scope
   if (mill > next)
       ^~~~

So, auto seems to be a problem, at least for the build environment under platformIO. I'll try building TinyMQTT-test using the Arduino IDE.

Bazmundi commented 1 year ago

So, I pulled everything for TinyMQTT-test into the Arduino IDE (v2.0.2). Baulks at auto, yellow et al ???

Am I running too far ahead of your updates?

C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino: In function 'void setup()':
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:72:14: error: 'yellow' was not declared in this scope
   Console << yellow
              ^~~~~~
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:72:14: note: suggested alternative: 'ftello'
   Console << yellow
              ^~~~~~
              ftello
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino: In member function 'void Every::dump()':
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:424:3: error: expected primary-expression before 'auto'
   auto mill=millis();
   ^~~~
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:426:7: error: 'mill' was not declared in this scope
   if (mill > next)
       ^~~~
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:426:7: note: suggested alternative: 'kill'
   if (mill > next)
       ^~~~
       kill
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino: In function 'void eval(std::__cxx11::string&)':
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:495:66: error: expected primary-expression before '<<' token
      Console << red << "Unable to find (" << s.c_str() << ")" << << white << endl;
                                                                  ^~
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino: In function 'void loop()':
C:\Users\pierc\OneDrive\Documents\Arduino\sketches\TinyConsole\TinyConsole.ino:915:2: error: 'Console' is not a class, namespace, or enumeration
  Console::loop();
  ^~~~~~~

exit status 1

Compilation error: 'yellow' was not declared in this scope

I assume that is the yellow defined in TinyMQTT.cpp?

Bazmundi commented 1 year ago

So, as I worked out the reboot loop was with debug turned on and using Console. I turned debug OFF and got no reboot loop. I turned debug ON, but recoded to go back to using Serial (and not Console), and I had no reboot loop.

Bazmundi commented 1 year ago

So, of course the keep_alive is set to 4.25 minutes, if I set the third parameter of the remote client connect to 65535, because:

  msg.add(0x00);      // keep_alive
  msg.add((char)mqtt->keep_alive);

The bottom bits are 255, and 255 seconds is 4.25 minutes.

I changed the code to:

  // msg.add(0x00);      // keep_alive
  msg.add((char)(mqtt->keep_alive >> 8));
  msg.add((char)mqtt->keep_alive);

That now seems to run for quite a while. But this then suggests that the PINGREQ/PINGRES may not be working?

Bazmundi commented 1 year ago

So, with a little debugging hack in the PINGREQ/PINGRES sections of processMessage , with keep_live set to 60 seconds:

  1. I boot broker on AP.
  2. I boot client on STA.
  3. STA connects to AP and then I get a connect topic on AP from STA.
  4. I do nothing with pedal presses on AP for a 10 minutes, which means no pedal press responses on client STA for that time.
  5. I get PINGREQ/PINGRES every minute while not pressing pedal.
  6. After that 10 minutes of no activity with pedal, if I then press the pedal once, the next PINGREQ/PINGRES pair is the last I will get, and that seems associated with client no longer reacting to pedal press topics. That is, some reaction between user topics and housekeeping?
  7. On a separate run, if I wait 3 or for PINGREQ/PINGRESET, then start pressing the pedal every second, the system will eventually stop.

So it currently "seems" like it happily runs and keeps client alive, if we are not running through user defined topics at least.

Bazmundi commented 1 year ago

I would LUV an example of how to set the client ID on the remote client.

On the remote client I am using:

const char* BROKER = "192.168.4.1";
const uint16_t BROKER_PORT = 1883;

static MqttClient pump_client;

...

pump_client.connect(BROKER, BROKER_PORT,60);

What's the trick please? The std::string & is driving me batty :bat:

Bazmundi commented 1 year ago

So, I also suspect the two proxy clients on the AP running the broker, being a pedal client and a pump client, may be timing out. I have the pump client catching the heralding connection topic raised by the remote client. That works soon after boot of broker. But if I reset the remote client at a later point, I can get the pedal press topics from broker device to remote client, but I never again get a connected topic getting from remote client to broker device. That is, resetting the client works only to allow client to subscribe and respond to topics raised on the broker device. The topics the client publishes, however, do not get a reaction on the broker device, unless we reset the broker device.

MqttBroker broker(1883);
MqttClient peddle_client(&broker,"peddel client");
MqttClient pump_client(&broker,"pump client");

However, I am getting a raft of anonymous client alive messages (the clientID likely null):

ESP32 IP as soft AP: 192.168.4.1
keep alive from message processed on  set to: 60
clientAlive on  set to: 69425
clientAlive on  set to: 71926
--> pump client received peddle/pump/connected, 1
clientAlive on pump client set to: 41931
clientAlive on  set to: 71931
clientAlive on  set to: 71934

So, still trying to unweave that. Without clientID all around its tricky. The "keep alive" message is coming through from the remote client.

hsaturn commented 1 year ago

Thanks

I'm fixing that. Lot of other bugs with unstable version :-) I'm keeping dealing with reboot bug...

Francois Biot

Le sam. 3 déc. 2022 à 07:09, Asterion Daedalus @.***> a écrit :

So, of course the keep_alive is set to 4.25 minutes, because:

msg.add(0x00); // keep_alive msg.add((char)mqtt->keep_alive);

That is 255 seconds is 4.25 minutes.

I changed the code to:

// msg.add(0x00); // keep_alive msg.add((char)(mqtt->keep_alive >> 8)); msg.add((char)mqtt->keep_alive);

That now seems to run for quite a while. But suggests that the PINGREQ/PINGRES may not be working?

— Reply to this email directly, view it on GitHub https://github.com/hsaturn/TinyMqtt/issues/47#issuecomment-1336091236, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADRCSG5KJYYP7O2ZX6I5JHTWLLPXZANCNFSM6AAAAAASMLXJ7Y . You are receiving this because you commented.Message ID: @.***>

hsaturn commented 1 year ago

Some bugs have been fixed, not released yet. You can clone the latest git version to give a try.

Let me know if that's working now, and many many thanks for the time you've spend. Also thanks for pointing a bug in the encoding of large keep alive values.

Bazmundi commented 1 year ago

Hey, don't thank me. You've done the lion's share. Happy to be dabbling, and it's helping me understand the innards of MQTT as well.

Bazmundi commented 1 year ago

Sorry, I am still getting reboot loop with Console. Oddly, I can no longer convert to Serial as the << operator now raises an error.

With debug OFF, and with keep alive at 60 seconds, still stops soon after first user message goes through.

Reboot of client is still not seeing the connect message on the broker device?

I'll double check.

hsaturn commented 1 year ago

The reboot is due to TinyConsole. The problem should be fixed with the latest version. As you answered 1 hour after, this is probably due to the fact that the release was not yet propagated. The latest is the 0.4.2 You were too fast :-)

Bazmundi commented 1 year ago

Mwahahahaha!

Too enthusiastic, you mean.

So, I did pick up that in tinyMQTT.cpp, in the call to MqttClient::connect in the #ifdef TINY_MQTT_ASYNC section, but not the else section, you have used the ka parameter?

#ifdef TINY_MQTT_ASYNC
  client->onData(onData, this);
  client->onConnect(onConnect, this);
  client->connect(broker.c_str(), port,ka);
#else
  if (client->connect(broker.c_str(), port))
  {
    debug("link established");
    onConnect(this, client);
  }
  else
  {
    debug("unable to connect.");
  }
#endif

I am then curious 1) why you wouldn't do the same from the second use of connect, and 2) might just be me, but is connect not void on return, so should it be used in a test?

RE difference in the two calls to connect, would that be because, for some reason, this is a problem for the client in one but not the other circumstance?

Although, I have to note that if I add that pesky third parameter into the second call to connect, my remote client will not send nor receive topics ?? As I find TINY_MQTT_ASYNC is not defined, then does it actually work if TINY_MQTT_ASYNC is defined?

Bazmundi commented 1 year ago

NoooooooooooooOOOOOOOOOO000000000OOOoooo!

So, funny thing.

I added tinyConsole to STA pump device (WEMOS D1 R32), the one with the reboot loop (originally), and the reboot loop is now gone (thank you very much) when debug is ON etc.

I had never bothered with tinyConsole on the AP pedal (WEMOSBAT aka WeMos WiFi and Bluetooth Battery), because it was not working on the STA pump. Since tinyConsole was now working with pump, I added tinyConsole to pedal. BUT, ... , pedal is in reboot loop - but pump is not.

So, stay tuned. BUT, I have essentially copied all your files into both projects and so damn sure it's the same code in both.

Bazmundi commented 1 year ago

Bazingah!

So, I thought to myself, why not?

I thought I would move the main.cpp from the AP (broker) to the STA (client), to see what's what.

The reboot loop comes over with the main.cpp. That is, it isn't that the pedal code is sitting on the wemosbat and the pump code is sitting on a wemos D1 R32.

I am using the Bounce2 library to, you guessed it, debounce the actual pedal input. I did rip that out of the pedal code at one stage, to take that out of the equation. I'll rip it out again, on the code running on the D1 R32 to see, ... , what's what.

Bazmundi commented 1 year ago

Blah! So, ripped out Debounce, no effect, still getting the reboot loop on pump device running pedal code.

On my second vodka so, ...

hsaturn commented 1 year ago

Indeed, Vodka usually is a great help to totally avoid reboots. Maybe you use thé wrong kind.

I was very busy on klive ans espdrbug projets this week. Back on your issue this afternoon (paris).

Maybe you could tell me the brand of your vodka so i could save some Time using the same 😁

Le mar. 6 déc. 2022, 12:44, Asterion Daedalus @.***> a écrit :

Blah! So, ripped out Debounce, no effect, still getting the reboot loop on pump device running pedal code.

On my second vodka so, ...

— Reply to this email directly, view it on GitHub https://github.com/hsaturn/TinyMqtt/issues/47#issuecomment-1339196910, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADRCSG22O3C7EDIW2AK3AZDWL4RIJANCNFSM6AAAAAASMLXJ7Y . You are receiving this because you commented.Message ID: @.***>

Bazmundi commented 1 year ago

Whatya mean "brand"? Isn't there only Русский Cтандарт?

I'm going to make a vid of the behaviours, stay tuned. (the boards, not mine)

hsaturn commented 1 year ago

Hello

compare is defined there https://github.com/hsaturn/TinyMqtt/blob/main/examples/tinymqtt-test/tinymqtt-test.ino#L299 And there is no call to compare at line 315. So I guess you use an old version of tinymqtt-test. If you are on a release, I guess I have to fix that quickly. In the mean time, you may clone the latest version or git pull / git checkout main

Francois Biot

Le mar. 29 nov. 2022 à 09:01, Asterion Daedalus @.***> a écrit :

So ahead of you giving me the go ahead, I have tried compiling tinymqtt-test on my WEMOS D1 R32 and get an error?:

src/main.cpp: In static member function 'static void automatic::command(MqttClient*, std::__cxx11::string)': src/main.cpp:315:8: error: 'compare' was not declared in this scope if (compare(s, "create")) ^~~

I am assuming "ESPmDNS" is pulled in from Arduino libraries, and that is not the source of the problem?

— Reply to this email directly, view it on GitHub https://github.com/hsaturn/TinyMqtt/issues/47#issuecomment-1330229809, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADRCSG5YSNB4QIRFEJRZFZLWKWZ5TANCNFSM6AAAAAASMLXJ7Y . You are receiving this because you commented.Message ID: @.***>

hsaturn commented 1 year ago

Hello, I think that the problem is fixed in 'alive' branch of TinyMqtt. I made many modifications on the code around time, and have added som unit test (with time simulation). I cannot yet release this version because

So we have to wait a little before I merge the 'alive' branch. But one still can pull it from github.

Best regards

hsaturn commented 1 year ago

I'm opening a separate issue for this

hsaturn commented 1 year ago

As the code evolved a lot since this issue that may be fixed, I'm opening a fresh issue and closing this one. (see https://github.com/hsaturn/TinyMqtt/issues/62)