knolleary / pubsubclient

A client library for the Arduino Ethernet Shield that provides support for MQTT.
http://pubsubclient.knolleary.net/
MIT License

LastWill message delays #872

Open pablofr918 opened 3 years ago

pablofr918 commented 3 years ago

Hello! I'm working on a project in which I connect my ESP32 to a broker, and I call the connect() function like this:

client.connect(clientId.c_str(),mqtt_user.c_str(),mqtt_password.c_str(),LWTtopic.c_str(),2,true,"Offline")

so that when the client disconnects, the broker publishes "Offline" to the LWT topic. When the client connects to the broker, I publish "Online" to the same topic, so I can always check that topic to see whether the device is online or offline. The problem is that when I change the WiFi network the ESP is connected to, the "Offline" message is not published immediately, so the ESP reconnects to the broker and publishes "Online" first. After a delay the "Offline" message arrives, so the last message on the topic is "Offline" even though the device is actually online. It can be "fixed" by adding a delay(10000) before the client.publish("Online"), so that the "Offline" and "Online" messages arrive in the correct order, but the delay blocks other functions that I must not block.

Any idea how to solve this?

Thanks!!

Andrew1040 commented 3 years ago

Actually, this is not an issue with PubSubClient but normal MQTT broker behavior, and it is in line with the MQTT specification. LastWill is not published by your device. It is published by the MQTT broker on behalf of your device when the broker decides that the connection is lost. The key point is that it is the broker's decision whether to publish LastWill or not. Turning WiFi off on the ESP32 or on your router is not enough by itself.

The broker detects connection status by monitoring Keep-Alive messages from your device. The Keep-Alive period is configurable, but the broker may impose limits. For example, AWS IoT Core specifies a minimum Keep-Alive period of 30 seconds.

An example: imagine you have the Keep-Alive period set to 15 seconds (this is the default value for this library). This means that every 15 seconds your ESP32 tells the broker it is still online. When you turn the WiFi off, the broker knows nothing about it. The broker keeps waiting for the next Keep-Alive, and only when it fails to arrive in time does the broker publish LastWill on behalf of the ESP32. In other words, if you set Keep-Alive to n seconds, it may take the broker roughly n seconds, and up to one and a half times n (the grace period the MQTT specification allows), to realize that the device is no longer connected.

And to answer the obvious follow-up question: no, it is not a good idea to set the Keep-Alive period to very low values like 1-2 seconds. This floods the broker with traffic and drains your device's battery very fast.
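To put a number on the delay described above, here is a minimal sketch of the worst-case wait. It assumes the broker applies the grace period from the MQTT 3.1.1 specification, which lets a server wait one and a half Keep-Alive periods before it drops a silent client. The function name `brokerTimeoutMs` is made up for this illustration and is not part of PubSubClient.

```cpp
#include <cstdint>

// Hypothetical helper (not a PubSubClient API): worst-case time in
// milliseconds before a broker gives up on a silent client, assuming the
// 1.5x Keep-Alive grace period from the MQTT 3.1.1 specification.
// A Keep-Alive of 0 disables the mechanism, so this sketch returns 0 then.
constexpr std::uint32_t brokerTimeoutMs(std::uint16_t keepAliveSeconds) {
    // 1.5 * seconds * 1000 ms, kept in integer arithmetic.
    return static_cast<std::uint32_t>(keepAliveSeconds) * 1500u;
}
```

With the library default of 15 seconds this gives 22500 ms, so under this assumption a fixed delay(10000) before publishing "Online" may not always be long enough. Brokers can also notice a dead TCP connection sooner, so treat the value as an upper bound rather than a prediction.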

The second topic is why you see LastWill after reconnection. The reason is again the MQTT specification. If your device reconnects and requests a clean session (and this library requests a clean session by default), the broker must publish the LastWill message from the previous session. To avoid this you may try to resume the session instead, but that requires additional care.

The reason you see the "Online" message first and LastWill after it is that MQTT does not guarantee the relative order of these two messages. When your device reconnects, two events happen at almost the same time: the MQTT broker needs to publish LastWill from the previous session, and your device publishes its "Online" message.

My recommendation is to introduce a session counter: a number that tells you which session a message belongs to. For example, when you connect for the first time you include "0" in the LastWill payload as the session counter. Then you cut the WiFi and the ESP32 disconnects. When the WiFi connection is restored, the ESP32 increments the session counter, reconnects to the broker, and sends "Online" with a session counter of "1". Then, even if the LastWill of the previous session is published after your "Online" message, you can clearly see that this LastWill is outdated. The output will be something like:

  1. Online, 0
  2. Online, 1
  3. Offline, 0
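
On the subscriber side, the session counter idea could be applied roughly as follows. This is only a sketch in plain C++ with hypothetical names (`StatusMessage`, `DeviceStatus`). It assumes the payload has already been parsed into a state string and a counter, and that an "Offline" message counts only if its counter matches the most recent "Online".

```cpp
#include <cstdint>
#include <string>

// Hypothetical parsed form of a status payload such as "Online, 1".
struct StatusMessage {
    std::string state;      // "Online" or "Offline"
    std::uint32_t session;  // session counter carried in the payload
};

// Hypothetical tracker that ignores LastWill messages from older sessions.
class DeviceStatus {
public:
    // Returns true if the message was accepted, false if it was discarded
    // as belonging to a session other than the current one.
    bool apply(const StatusMessage& msg) {
        if (msg.state == "Online") {
            // Every "Online" starts (or confirms) the current session.
            session_ = msg.session;
            hasSession_ = true;
            online_ = true;
            return true;
        }
        // "Offline" is only meaningful for the session we last saw online.
        if (!hasSession_ || msg.session != session_) {
            return false;
        }
        online_ = false;
        return true;
    }

    bool online() const { return online_; }

private:
    std::uint32_t session_ = 0;
    bool hasSession_ = false;
    bool online_ = false;
};
```

Fed the three messages listed above in that order, the tracker accepts both "Online" messages, discards "Offline, 0", and still reports the device as online.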

ruem2802 commented 3 years ago

@Andrew1040 I have an issue similar to this one. When WiFi is on but I unplug the network, client.connected() keeps returning true for about 15 seconds, but I can't run any other function: I can't check an input with digitalRead(BUTTON_RESET) or control an LED until I plug the network cable back in (that is, until there is a network connection again).

Alekeep commented 5 months ago

@Andrew1040, I appreciate your clear explanation of this concern. The idea of a session counter to tie a last will message to a connection inspired a thought for a similar but alternate approach to address another scenario.

When the connected device reboots, unless specifically addressed, the count would begin each time at zero. Based solely on that counter, out of sequence connect/last will messages could easily bypass detection.

For ESP32 and ESP8266, esp_random() returns a true (not pseudo-) random number once Wi-Fi or Bluetooth is enabled. Unless I'm missing something, using (part of) the returned random value rather than a counter should virtually eliminate any conflict.
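
A rough sketch of the random-token variant, written as plain C++ so it builds off-target. `std::random_device` stands in for `esp_random()` here, and the helper names (`newSessionToken`, `statusPayload`, `isStaleOffline`) and the `state,token` payload format are assumptions made for this example, not anything defined by PubSubClient or this thread.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Stand-in for esp_random(): on an ESP32 the hardware RNG would be used.
std::uint32_t newSessionToken() {
    std::random_device rd;
    return static_cast<std::uint32_t>(rd());
}

// Builds a payload such as "Online,1a2b3c4d" (assumed format).
std::string statusPayload(const std::string& state, std::uint32_t token) {
    char hex[9];  // 8 hex digits plus the terminating NUL
    std::snprintf(hex, sizeof hex, "%08x", static_cast<unsigned>(token));
    return state + "," + hex;
}

// An "Offline" payload is stale unless it carries the current session token.
bool isStaleOffline(const std::string& offlinePayload,
                    std::uint32_t currentToken) {
    return offlinePayload != statusPayload("Offline", currentToken);
}
```

The device would draw one token per connection, put `statusPayload("Offline", token)` into the will message passed to connect(), and publish `statusPayload("Online", token)` afterwards. A subscriber then ignores any "Offline" whose token differs from the latest "Online". With 32 bits, two consecutive sessions drawing the same token is very unlikely but not impossible.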