victronenergy / dbus-flashmq

Plugin for FlashMQ that interfaces between DBUS and MQTT.
MIT License

libflashmq-dbus-plugin interferes with LWT (Tasmota devices) #7

Open kenzodeluxe opened 1 week ago

kenzodeluxe commented 1 week ago

When VenusOS started using FlashMQ, it seemed like a good idea to move all MQTT-related communication to VenusOS, given that FlashMQ is supposed to be a high-performance MQTT broker (and ideally you want only one broker in your local network). After making the switch, others and I (see https://github.com/halfgaar/FlashMQ/issues/84) started seeing sporadic issues with the LWT state of our Tasmota devices, which would show as "off" for no apparent reason.

I started debugging this to find the root cause and what I needed to do to make it work. Ultimately, it boils down to this: using libflashmq-dbus-plugin.so with FlashMQ breaks LWT messages. I used the following test config to validate this (with /run/flashmq empty):

thread_count 1
#plugin /usr/libexec/flashmq/libflashmq-dbus-plugin.so
max_packet_size 16777216
client_max_write_buffer_size 4194304
expire_sessions_after_seconds 86400
#include_dir /run/flashmq
zero_byte_username_is_anonymous true
log_level debug
allow_anonymous true

listen {
  protocol mqtt
  port 1884
}

When pointing my Tasmota device to the VenusOS device on port 1884 (without the plugin) and running mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v, I would consistently see this:

$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Online
$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Online
$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Online

When I uncommented the line plugin /usr/libexec/flashmq/libflashmq-dbus-plugin.so to enable the plugin, the output consistently looked like this:

$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Online
$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Offline
$ mosquitto_sub -h venus.local -p 1884 -t 'tele/#' -v
tele/my-device/LWT Offline

This may be working as intended, since FlashMQ with libflashmq-dbus-plugin.so may not have been designed to act as a general-purpose MQTT broker outside of VenusOS. However, it would be great to get this fixed, so that users who would be happy with a single MQTT broker instance in their local network (FlashMQ on VenusOS) don't have to set up bridges or other workarounds :)

wiebeytec commented 1 week ago

The plugin removes the retain flag on incoming publishes:

https://github.com/victronenergy/dbus-flashmq/blob/4d534bc6724427472d40449af61f6e4451fb967d/src/flashmq-dbus-plugin.cpp#L80

Now that I look at it, that actually only has an effect on non-LWT publishes, which is kind of an omission. In any case, the effect would be that once 'offline' is set as a retained message by an LWT, it will not go back to 'on-line' as a retained message. Only on-line clients will see the 'on-line' message coming by. Is that the behavior you see?

You could perhaps also post the output of mosquitto_pub with the -d flag, so I can see the retain flags.

As for a fix, I'd have to give it some thought. The transition from retained to non-retained messages in Venus was quite complicated, in subtle ways.

kenzodeluxe commented 1 week ago

I ran a tcpdump for the Tasmota device, and it does set the retain flag for LWT as expected:

[Screenshot: packet capture of the Tasmota LWT publish with the retain flag set]

What is the reason for removing it? I found https://github.com/victronenergy/dbus-flashmq?tab=readme-ov-file#1-no-more-retained-messages. Still, if this is an internal requirement, would it make sense to adjust the function to filter only Venus-internal topics instead of applying it globally?

Only on-line clients will see the 'on-line' message coming by. Is that the behavior you see?

Basically, yes. I'm not sure what happens in clients like Home Assistant, but perhaps the MQTT reader thread restarts and therefore loses all LWT information for the topics/instances. Since, upon reconnection, the LWT is either Offline or doesn't exist at all (I've now seen both), the client never learns the Tasmota device's status until that device restarts or reconnects and, in the process, resends its LWT Online message.

Edit: In the meantime, I have worked around this issue by having the Tasmota devices send their MQTT messages to a newly set up local mosquitto instance, and having FlashMQ bridge all local topics to that instance as well. Home Assistant has been reconfigured to point to mosquitto instead.

wiebeytec commented 5 days ago

Still, if this is an internal requirement, would it make sense to adjust the function to filter only Venus-internal topics instead of applying it globally?

We came up with the same idea:

https://github.com/victronenergy/dbus-flashmq/commit/59550e56db9f609cd0e520d7c89a96f1c0f9913f

It will probably be in Venus 3.40~ testing builds soon. Ticket will be updated.