halfgaar / FlashMQ

FlashMQ is a fast light-weight MQTT broker/server, designed to take good advantage of multi-CPU environments
https://www.flashmq.org/
Open Software License 3.0

Last Will not working correctly? #84

Open Head opened 4 months ago

Head commented 4 months ago

I think LWT is not working as it should in FlashMQ. Some alerts in my uptime monitor stopped working after I switched my server to FlashMQ.

I try to compare Mosquitto and FlashMQ with two shell windows. First I connect to my Mosquitto server. On the 1st I do:

mosquitto_sub --will-topic foobar_topic --will-payload crashed --will-retain -i foobar -t foobar_topic -v -h localhost --disable-clean-session

No output yet. 👍

On the 2nd I do:

mosquitto_pub --will-topic foobar_topic --will-payload crashed --will-retain -i foobar -t foobar_topic -m online -h localhost -r --disable-clean-session

Now the 1st outputs: foobar_topic online 👍

I only get the crashed LWT once I kill -9 the mosquitto_sub process.

Now I try my FlashMQ server. On the 1st I do:

mosquitto_sub --will-topic foobar_topic --will-payload crashed --will-retain -i foobar -t foobar_topic -v -h localhost --disable-clean-session

Instant output: foobar_topic crashed 👎

On the 2nd I do:

mosquitto_pub --will-topic foobar_topic --will-payload crashed --will-retain -i foobar -t foobar_topic -m online -h localhost -r --disable-clean-session

Now the 1st outputs: foobar_topic crashed 👎

Also I watch the topic with MQTT Explorer and once I pub, I get both topics, crashed and online afterwards. This is different from Mosquitto.

halfgaar commented 4 months ago

There is a lot to unpack here.

First, I see you connect with MQTT3. I'll already note these differences from the spec:

FlashMQ currently does not properly distinguish, and uses MQTT3 logic. This was recently discovered and still needs to be fixed.

Then, you publish with 'retain' on. This can muddy the waters, because the message will stick once published. This is probably the reason you see instant output on your first FlashMQ example. Can you be sure you start with an absolutely clean slate and no saved state? And did you actually intend to set a retained message? Note, this does not mean 'retain the will'. It means 'set a general retained message on this topic once the will is published'.

Then, you publish with a conflicting client ID. So in your first example, publishing with mosquitto_pub will kick out the existing mosquitto_sub. FlashMQ will send a will in that case. The effect of this is very subtle. You will see it instantly on a 3rd client, because it's published. But your first mosquitto_sub will only see it once it reconnects and resubscribes to the topic, because subscribing to a topic that has a retained value set will get you that message.
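The takeover sequence described above can be sketched with a small in-memory model. This is only an illustration of the described behavior, not FlashMQ code: the `ToyBroker` class, its method names, the connection labels (`sub-1`, `pub`, `sub-2`, `watcher`) and the `watcher-id` client ID are all made up for this sketch. It also assumes that `mosquitto_pub` has already disconnected cleanly before `mosquitto_sub` reconnects, and it leaves out the 'online' publish so that only the takeover step is shown.

```python
class ToyBroker:
    """Toy model: retained store, one connection per client ID, wills."""

    def __init__(self):
        self.retained = {}   # topic -> last retained payload
        self.sessions = {}   # client ID -> (connection label, will or None)
        self.subs = {}       # connection label -> set of subscribed topics
        self.log = []        # (connection label, topic, payload) deliveries

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload
        for conn, topics in self.subs.items():
            if topic in topics:
                self.log.append((conn, topic, payload))

    def _drop(self, client_id):
        # Remove the connection and its subscriptions (a simplification).
        conn, will = self.sessions.pop(client_id)
        self.subs.pop(conn, None)
        return will

    def connect(self, conn, client_id, will=None):
        if client_id in self.sessions:
            # Conflicting client ID: kick out the old connection and
            # publish its will, as described for FlashMQ above.
            old_will = self._drop(client_id)
            if old_will is not None:
                self.publish(*old_will)
        self.sessions[client_id] = (conn, will)

    def disconnect(self, client_id):
        # Clean disconnect: the will is discarded, not published.
        self._drop(client_id)

    def subscribe(self, conn, topic):
        self.subs.setdefault(conn, set()).add(topic)
        if topic in self.retained:
            # A new subscription immediately receives the retained value.
            self.log.append((conn, topic, self.retained[topic]))


will = ("foobar_topic", "crashed", True)  # topic, payload, retain

broker = ToyBroker()
broker.connect("sub-1", "foobar", will)      # mosquitto_sub
broker.subscribe("sub-1", "foobar_topic")
broker.connect("watcher", "watcher-id")      # hypothetical 3rd client
broker.subscribe("watcher", "foobar_topic")

broker.connect("pub", "foobar", will)        # mosquitto_pub, same client ID
broker.disconnect("foobar")                  # assumed clean exit of mosquitto_pub

broker.connect("sub-2", "foobar", will)      # mosquitto_sub reconnects
broker.subscribe("sub-2", "foobar_topic")

for entry in broker.log:
    print(entry)
```

In this model the third client receives 'crashed' at the moment of the takeover, while the reconnected subscriber only receives it as a retained value when it subscribes again.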

Can you tell me if getting the 'crashed' message on your first client is about half of a second after you run the mosquitto_pub command, or is it instant? In my tests, it's half a second, and that means you're seeing the message as a retained value, as part of the reconnection + subscription as a result of being kicked out.

The specs differ in what to do in cases of conflicting client ID:

I think the MQTT3 behavior is actually correct, provided that they intended the will to be sent on kicking out a client with existing ID. The other confusion is probably caused by the 'retain' flag on your will messages.

Head commented 4 months ago

I don't even know what MQTT3 and 5 are. I'm absolutely not an MQTT expert. All I know is that I've been using a Mosquitto server on my Raspberry and connected two ESP8266 to it, one running "openDTU" and one with "Tasmota". Both did the LWT as expected. Now I've changed to the latest Venus OS, which recently switched to FlashMQ, and now my ESPs are connected and the LWT is "offline". It's "online" when I reboot the ESP, but once there has been a disconnect, it stays at "offline" forever. The approach at the top was just my attempt to google and debug it, and I found different responses on the two servers.

halfgaar commented 4 months ago

Ah, with that info I think I understand what your examples were trying to show, but I think that's impossible with mosquitto_sub and mosquitto_pub. You would need one client that both publishes and subscribes.

Is the fact that you set 'retain' and '--disable-clean-session' based on the real behavior of the ESP MQTT clients, like OpenDTU? Can you perhaps show its MQTT-related config?

Can you also give me the literal on-line/off-line topics of OpenDTU and Tasmota? Are they the same or not?

And when you say 'and now my ESPs are connected and the LWT is "offline"'; where do you see that; in what client? A 3rd one?

Are you also willing to show flashmq logs?

Head commented 4 months ago

I've just copy&pasted it from https://github.com/eclipse/mosquitto/issues/1273 to be able to test LWT over the shell. OpenDTU's LWT code is here: https://github.com/search?q=repo%3Atbnobody%2FOpenDTU%20lwt&type=code

I use MQTT Explorer on a 3rd machine and connect to FlashMQ. I see the topic is "offline" (which is not the case, it is connected). I restart openDTU while listening and see the topic set to offline, and immediately (0.04 s) after to online. I disconnect MQTT Explorer and reconnect it: LWT is "offline" with retained flag.
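When reading an observation like this, it helps to keep apart what an already-subscribed client sees (every message, in order) and what a freshly connected client sees (only the stored retained value). The sketch below shows one sequence that would produce the same picture. It rests on an assumption the thread does not confirm, namely that "offline" is published with retain and "online" without it. The topic name `device/status` and the helper `publish` are hypothetical, and this is not a diagnosis of the actual cause.

```python
retained = {}    # topic -> payload the broker keeps for future subscribers
live_view = []   # payloads seen by a client that is already subscribed

TOPIC = "device/status"  # hypothetical topic name


def publish(topic, payload, retain):
    # A connected subscriber sees every publish, retained or not.
    live_view.append(payload)
    # Only a publish with the retain flag replaces the stored value.
    if retain:
        retained[topic] = payload


publish(TOPIC, "offline", retain=True)   # will message, sent with retain
publish(TOPIC, "online", retain=False)   # ASSUMPTION: sent without retain

# A client that connects afterwards only gets the stored retained value.
fresh_view = [retained[TOPIC]] if TOPIC in retained else []

print("live subscriber saw:", live_view)
print("fresh subscriber sees:", fresh_view)
```

Checking in the broker log or in the client's MQTT settings whether the "online" message carries the retain flag would show whether this sequence applies here.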