victronenergy / venus

Victron Energy Unix/Linux OS
https://github.com/victronenergy/venus/wiki

Replace Mosquitto with FlashMQ #1098

Closed: mpvader closed this issue 5 months ago.

mpvader commented 1 year ago

We're going to replace Mosquitto + dbus-mqtt by FlashMQ + dbus-flashmq.

The differences in performance:

Further details:

Architecture:

Note that there will be a transitional period, because FlashMQ currently does not support broker-to-broker connections. During that period, we'll use FlashMQ for all local websocket communication (HTML5-App and Gui-v2), while still using dbus-mqtt plus Mosquitto for the connection to/via VRM.

Note that the cluster in the cloud currently also runs Mosquitto, and will be replaced by FlashMQ as well.

MQTT spec. details:

Changes in apps with regards to keep alive and retain:

This is how UIs (gui-v2, html5-app, VRM) should now work:

1) Subscribe to N/<portalid>/#

2) Publish once to R/<portalid>/keepalive (lower case k), with an empty payload.

3) Thereafter, every 30 seconds, publish to R/<portalid>/keepalive again, with payload { "keepalive-options" : ["suppress-republish"] }
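A minimal sketch of the three steps above, in Python and without any MQTT library: it only computes which topic to subscribe to and which keepalive topics and payloads a UI would publish, and when. The helper names and the portal ID used in the usage note are hypothetical; a real client (for example paho-mqtt, via its Client.subscribe() and Client.publish() methods) would send these messages over its broker connection.

```python
import json
from typing import Iterator, Tuple

KEEPALIVE_INTERVAL_S = 30  # "every 30 seconds", per the rules above


def subscribe_filter(portal_id: str) -> str:
    """Step 1: the topic filter to subscribe to."""
    return f"N/{portal_id}/#"


def keepalive(portal_id: str, first: bool) -> Tuple[str, bytes]:
    """Steps 2 and 3: topic and payload of one keepalive publish."""
    topic = f"R/{portal_id}/keepalive"  # lower case k
    if first:
        return topic, b""  # step 2: empty payload
    options = {"keepalive-options": ["suppress-republish"]}
    return topic, json.dumps(options).encode("utf-8")  # step 3


def keepalive_schedule(portal_id: str, duration_s: int) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (seconds since connecting, topic, payload) for the first duration_s seconds."""
    for t in range(0, duration_s, KEEPALIVE_INTERVAL_S):
        topic, payload = keepalive(portal_id, first=(t == 0))
        yield t, topic, payload
```

For a hypothetical portal ID "0123456789ab", list(keepalive_schedule("0123456789ab", 90)) gives publishes at 0, 30 and 60 seconds, with only the first one carrying an empty payload. Subscribing before that first keepalive presumably matters because the full republish it triggers is delivered on the N/ subscription.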

There are more details, such as handling disconnects, what to show to the user if the app doesn't receive any MQTT updates anymore, reconnecting, and so forth. These are explained in the MQTT chapter of the Google doc called "Victron - designing a UI".

How about retained messages?

For backward compatibility reasons, VRM and VictronConnect especially need attention. We can handle this client-side by interpreting 'retained' messages depending on a Venus capability bit, or we could perhaps let the server deal with it. Pending discussion.


You're encouraged to read the README of the new FlashMQ plugin for more details.

One extra note: the HTML5 app currently uses selective keep-alives. That was a mistake made a long time ago, and this is the moment to correct it. The reason not to use them is that they cause unnecessary CPU load on the GX device.

Todos for all local MQTT traffic:

Todos for MQTT traffic via VRM

How to test:

  1. Install Venus OS v3.10~3 or later.
  2. In /etc/mosquitto/mosquitto.conf, change the websockets listener port from 9001 to something else.
  3. In /etc/flashmq/flashmq.conf, change the websockets listener port from 9002 to 9001.
  4. Restart both, in the right sequence 🙂: Mosquitto first, then FlashMQ. Wait a bit after stopping Mosquitto, since it takes a while before the port becomes available again.
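For step 4, a small Python sketch of "wait until the port becomes available": it repeatedly tries to bind a TCP socket to the port (9001 in the steps above) and gives up after a timeout. The function names, the timeout values, and the IPv4-only check are assumptions for illustration, not part of Venus OS; the idea is to run it on the GX device after stopping Mosquitto and before starting FlashMQ.

```python
import socket
import time


def port_is_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can currently be bound to host:port (IPv4 only)."""
    # SO_REUSEADDR is deliberately not set: on Linux this makes the check
    # conservative, so lingering TIME_WAIT connections also count as "busy".
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def wait_for_port_free(port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll port_is_free() until it succeeds or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage, under those assumptions: stop Mosquitto, then only start FlashMQ once wait_for_port_free(9001) returns True.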
mpvader commented 1 year ago

@wiebeytec I've updated the Further details section above, please review.

mpvader commented 1 year ago

Venus OS v3.10~7 has just been built. It contains:

By default, everything still uses Mosquitto. For testing, it's possible to have FlashMQ handle all local MQTT + websocket traffic; see the How to test chapter above for how to do that.

Next steps for us are to change the HTML5-App implementation per the new rules, as well as gui-v2, and then test both in development. If that's all good, maybe in two weeks, we'll build a v3.10 beta that uses FlashMQ for the local traffic out of the box.

mr-manuel commented 1 year ago

Would it be possible to also add username and password protection to FlashMQ when the GUI is password protected, like Node-RED?

This would be very useful in external/shared networks, where no network protection is possible.

mpvader commented 1 year ago

Hi, yes for sure, that's on the list of things to work on before release.

mpvader commented 1 year ago

As of v3.10~9, the dbus-flashmq plugin was modified to only send all topics when explicitly asked for that, rather than sending them automatically when going from the sleeping to the kept-alive state.

This makes it more obvious to app developers that something is wrong if they don't first request a full keep-alive.

dbus-flashmq commit is this one: https://github.com/victronenergy/dbus-flashmq/commit/55fa32a5ae1275682c1cc9e9499fd085b1528c93
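To make the change concrete, a hypothetical Python model of the rule described above, not the dbus-flashmq code itself: a keepalive counts as "full" when its payload is empty or lacks the suppress-republish option, and only the pre-v3.10~9 behaviour also republished everything on the transition from sleeping to kept-alive. The function and parameter names are made up for this sketch.

```python
import json

SUPPRESS = "suppress-republish"


def is_full_keepalive(payload: bytes) -> bool:
    """Hypothetical: an empty payload, or one without suppress-republish, asks for all topics."""
    if not payload.strip():
        return True
    options = json.loads(payload).get("keepalive-options", [])
    return SUPPRESS not in options


def sends_all_topics(payload: bytes, was_sleeping: bool, old_behaviour: bool) -> bool:
    """Would this keepalive trigger a republish of all topics (hypothetical model)?"""
    if old_behaviour and was_sleeping:
        # Before v3.10~9: waking up from the sleeping state alone was enough.
        return True
    # As of v3.10~9: only an explicit full keep-alive does it.
    return is_full_keepalive(payload)
```

With the new behaviour, a client that only ever sends suppress-republish keepalives never gets a full set of topics, which is the kind of mistake the change is meant to make obvious.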

mpvader commented 1 year ago

As of Venus OS beta version v3.10~12, local websocket requests are handled by FlashMQ rather than Mosquitto.

See the recent changes in meta-victronenergy for details, i.e. the v3.10~12 and preceding commits: https://github.com/victronenergy/meta-victronenergy/commits/master

mpvader commented 1 year ago

As of v3.10~19, it's:

1883 - FlashMQ (this is the default port for non-encrypted MQTT)
1884 - Mosquitto (this is left here just for now, will be gone later)
8883 - Mosquitto (this is the default port for encrypted MQTT)
9001 - FlashMQ (this is the default websockets port, non-encrypted)

Of the above, 8883 is the last one that still needs migrating to FlashMQ.

@wiebeytec, can you tell me what options to use in the flashmq config file?

https://github.com/victronenergy/meta-victronenergy/blob/master/meta-venus/recipes-connectivity/flashmq/flashmq/flashmq.conf

and Mosquitto:

https://github.com/victronenergy/meta-victronenergy/blob/master/meta-venus/recipes-connectivity/mosquitto/files/mosquitto.conf

mpvader commented 1 year ago

Correction (also for @wiebeytec): when making FlashMQ handle MQTT traffic on port 1883, I overlooked that this would also move dbus-mqtt and mqtt-rpc over to FlashMQ, which is too early. Rather than fixing that as well, I just reverted it for now, as of v3.10~20.

We'd better continue this once FlashMQ broker-to-broker connections are ready. Meanwhile, anyone who wants to test local MQTT traffic against FlashMQ can use port 1884.

As of v3.10~20, it is:

1883 - Mosquitto (this is the default port for non-encrypted MQTT)
1884 - FlashMQ (allows testing FlashMQ + dbus-flashmq plugin)
8883 - Mosquitto (this is the default port for encrypted MQTT)
9001 - FlashMQ (this is the default websockets port, non-encrypted, used by the HTML5 app, amongst other things)

mr-manuel commented 1 year ago

Would it be possible to also make FlashMQ's encrypted MQTT available on port 8884 for testing? This way we can also make sure that this works.

wiebeytec commented 1 year ago

For FlashMQ TLS:

listen {
  protocol mqtt
  port 8884
  # The keys from /data/keys/mosquitto.key, but we are not root so can't read them.
  privkey /etc/flashmq/mosquitto.key
  fullchain /etc/flashmq/mosquitto.crt
}

Note the comment line. FlashMQ is started as the flashmq user, so it can't read /data/keys/mosquitto.key. Mosquitto is started as root and only then drops privileges, so it would still be able to read the key if it were owned by flashmq. However, currently that means making /data/keys owned by flashmq, which is weird, because the SSH host keys are there as well.

Another possibility is symlinking to /data/keys/mosquitto.* and only making those files owned by flashmq (not the directory they are in).

We do need to keep using these certificates (as opposed to making new ones), because people have likely used the mosquitto.crt file in their clients as the root CA (since it's self-signed).

mpvader commented 1 year ago

As discussed today - see link:

mpvader commented 1 year ago

We're preparing v3.10 for release within a few weeks. In that release, we'll be shipping Mosquitto, not FlashMQ. In other words, all changes so far will be taken out for that release.

FlashMQ will be included again in v3.20.

But first we'll keep FlashMQ in there for a few more days, to see whether the latest changes fix everything, and to have a good build for the HTML5 devs and Qinetic to work with.

mpvader commented 10 months ago

Update:

With the above, it's now FlashMQ end to end, including mqtt-rpc.

The remaining todos concern security of the local MQTT connection, and testing.

PS: in case of issues, we'll switch the cloud back to Mosquitto.

mr-manuel commented 10 months ago

It would be interesting if you could give us some insight into how much the CPU load has been reduced.

mpvader commented 8 months ago

What is needed for having FlashMQ in the official release of v3.20 is done. I'll move this issue to v3.30 for the remaining items.

hanzoh commented 7 months ago

I updated from 3.13 to 3.21 three days ago and everything worked fine. I just noticed that I have not been getting any values on local MQTT anymore since 18:46. The bridge connection from Home Assistant is being repeatedly closed.

Checking the logs in /data/log/flashmq, the oldest entry I see is from 20:02, because so many messages flushed out the older ones:

@4000000065de31d70f3e272c [2024-02-27 19:02:37.255] [ERROR] On signal 'PropertiesChanged' in dbus_handle_message: std::bad_alloc
@4000000065de31d7235b430c [2024-02-27 19:02:37.592] [NOTICE] Accepting connection from: address='192.168.1.12', transport='TCP/Non-SSL', fd=21
@4000000065de31d72361079c [2024-02-27 19:02:37.593] [NOTICE] Client '[Bridge ClientID='core-mosquitto.venus', username='', fd=21, keepalive=60s, transport='TCP/Non-SSL', address='192.168.1.12', prot=3.1.1, clean=0]' logged in successfully
@4000000065de31d723f56be4 [2024-02-27 19:02:37.603] [ERROR] Packet read/write error: std::bad_alloc. Removing client.
@4000000065de31d723f58b24 [2024-02-27 19:02:37.603] [NOTICE] Removing client '[Bridge ClientID='core-mosquitto.venus', username='', fd=21, keepalive=60s, transport='TCP/Non-SSL', address='192.168.1.12', prot=3.1.1, clean=0]'. Reason(s): std::bad_alloc

I disabled MQTT on LAN and re-enabled it. Afterwards, the MQTT messages started flowing again.
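The @-prefixed hex string at the start of each log line above looks like a TAI64N label as written by daemontools' multilog (an assumption about how this log is produced). A minimal Python decoder, assuming daemontools' convention that TAI = UTC + 10 s with no leap-second table; under that assumption, the first line decodes to the same 2024-02-27 19:02:37.255 that FlashMQ prints in brackets. The function name is hypothetical; where daemontools' tai64nlocal is installed, it does a similar conversion on the command line.

```python
from datetime import datetime, timezone

TAI64_BASE = 2**62
# daemontools stores unix time + 10 s in its labels (no leap-second correction).
DAEMONTOOLS_TAI_OFFSET = 10


def decode_tai64n(label: str) -> datetime:
    """Convert an '@' + 24-hex-digit TAI64N label to a UTC datetime (microsecond precision)."""
    hexpart = label.strip().lstrip("@")
    if len(hexpart) != 24:
        raise ValueError("expected '@' followed by 24 hex digits")
    seconds = int(hexpart[:16], 16) - TAI64_BASE - DAEMONTOOLS_TAI_OFFSET
    nanoseconds = int(hexpart[16:], 16)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=nanoseconds // 1000)
```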

wiebeytec commented 7 months ago

Can you try Venus 3.22, released yesterday? There was a memory issue with dbus-flashmq that was fixed in that release. It's very likely the issue you're experiencing.

hanzoh commented 7 months ago

Thanks, I did not see that there was another update. I installed it yesterday and will monitor FlashMQ's behavior.

ak68-hub commented 5 months ago

I can't log in to my Cerbo GX with a blank username/password in version 3.31-4 (with MQTT Explorer it's perfect :))

This problem should have been solved since 3.31-1!?

Do I need any additional configuration?

wiebeytec commented 5 months ago


Are you saying that MQTT Explorer does or doesn't work? Are you talking about another client that doesn't work?

Since 3.31~4, empty usernames should be supported. When I do:

mosquitto_sub -d -t '#' -h <hostname> -u '' -P '' -V 5

or

mosquitto_sub -d -t '#' -h <hostname> -u '' -P '' -V mqttv311

It works.

Do you have output from your client and from /var/log/flashmq/current on the GX device?
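What "empty username" means on the wire: with -u '' the client still sets the username flag in the MQTT CONNECT packet but sends a zero-length string, which is different from sending no username at all. That distinction is presumably what the zero_byte_username_is_anonymous option refers to. A minimal Python sketch of an MQTT 3.1.1 CONNECT packet encoder (no will message, hypothetical function names) shows the difference:

```python
import struct
from typing import Optional


def mqtt_string(s: str) -> bytes:
    """UTF-8 string with a 2-byte big-endian length prefix, as MQTT encodes strings."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        out.append(digit | 0x80 if n else digit)
        if not n:
            return bytes(out)


def connect_packet(client_id: str, username: Optional[str], password: Optional[str],
                   keepalive_s: int = 60, clean_session: bool = True) -> bytes:
    """Encode an MQTT 3.1.1 CONNECT packet without a will message."""
    if password is not None and username is None:
        raise ValueError("MQTT 3.1.1 requires a username when a password is sent")
    flags = 0x02 if clean_session else 0x00
    payload = mqtt_string(client_id)
    if username is not None:
        flags |= 0x80  # username flag: set even when the username is ''
        payload += mqtt_string(username)
    if password is not None:
        flags |= 0x40  # password flag; text password encoded like a string here
        payload += mqtt_string(password)
    variable_header = mqtt_string("MQTT") + bytes([4, flags]) + struct.pack("!H", keepalive_s)
    body = variable_header + payload
    return bytes([0x10]) + remaining_length(len(body)) + body
```

For comparison, connect_packet("t", "", "") has connect flags 0xC2 (username and password flags set, both zero-length), while connect_packet("t", None, None) has flags 0x02: no username at all, the plain anonymous case.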

ak68-hub commented 5 months ago

Thank you. Now I CAN connect from Node-RED to the MQTT host, but no new virtual device is recognized [venus-os_dbus-mqtt-battery]. I don't know why???

dirkjanfaber commented 5 months ago

Tested anonymous login via Node-RED both with the zero_byte_username_is_anonymous option set to true and false. Also tested this both locally and via a remote mqtt session from Node-RED. All 4 options function without problems (on systems running 3.31~4).

@ak68-hub : Not sure what you are trying to achieve. Lacking more context, I doubt that it is flashmq related and I suggest filing an issue on the repo for venus-os_dbus-mqtt-battery instead.

mr-manuel commented 5 months ago

He already opened a discussion: https://github.com/mr-manuel/venus-os_dbus-mqtt-battery/discussions/21. It has nothing to do with FlashMQ or the driver; it seems to be a configuration issue on the user's side. You can hide these comments as off-topic.

ak68-hub commented 5 months ago

Hi DirkJan, it works now. The mistake was the main topic in the venus-os_dbus-mqtt-battery script (the topic started with a "/" :( ). Thank you, Andreas
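Why a leading "/" matters: in MQTT it creates an extra, empty first topic level, so /N/x and N/x are different topics, and a filter such as N/# matches only the latter. A minimal Python sketch of MQTT topic filter matching illustrates this; it follows the + and # wildcard rules of the MQTT specification, and the function name is hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter (+ and # wildcards)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")  # a leading "/" yields an empty first level
    # Filters starting with a wildcard must not match topics starting with '$' (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below it.
            return True
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)
```

For example, topic_matches("N/#", "/N/abc") is False because the first levels are "N" and "", while topic_matches("+/+", "/finance") is True, since "+" also matches the empty level.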

mpvader commented 5 months ago

Good that it works now @ak68-hub! For any further issues, please stick to the Community forum. Our GitHub issue trackers are not for support.

——

I’m closing this issue, all has been done, except for the security related work, which is handled in a different issue or tracker.

ogurevich commented 4 months ago

The FlashMQ daemon on Venus OS 3.32 seems to accept any user and any password, even though anonymous access is not allowed in flashmq.conf.

Is it possible to protect the connection to MQTT with a username/password?

mosquitto_password_file /etc/flashmq/mosquitto_passwordfile
allow_anonymous false
thread_count 1
plugin /usr/libexec/flashmq/libflashmq-dbus-plugin.so
max_packet_size 16777216
client_max_write_buffer_size 4194304
expire_sessions_after_seconds 86400
include_dir /run/flashmq
zero_byte_username_is_anonymous true

listen {
  protocol mqtt
  port 1883
  allow_anonymous true
}

listen {
  protocol mqtt
  port 8883
  allow_anonymous false
  fullchain /data/keys/mosquitto.crt
  privkey /data/keys/mosquitto.key
}

listen {
  protocol websockets
  port 9001
}

mr-manuel commented 4 months ago

I think it would be better to post the issue here: https://github.com/halfgaar/FlashMQ/issues

ogurevich commented 4 months ago

I agree. Meanwhile, I’ve found a workaround solution for myself: I only enable the SSL listener on the Cerbo GX (port 8883) and protect it with an x509 client certificate.

wiebeytec commented 4 months ago

This would have been the correct place for the bug, had it been a bug. Adding allow_anonymous false doesn't do anything, because it's already the default. It's the code in /usr/libexec/flashmq/libflashmq-dbus-plugin.so that always approves the login.

But, you'll be happy to know that authentication support is coming. It will be a general password in Venus for network services, that will also be used for MQTT.

ogurevich commented 4 months ago

Thank you so much for the kind message. From my perspective, this functionality is very important and its prompt implementation is truly like a dream. :)

lumikcz commented 4 months ago


Hi, if I understand correctly, this will replace the current workaround (or at least what previously worked with the Mosquitto config), where we manually edited the authentication settings in the config file over SSH? That would be great, since manual edits to config files don't persist when installing updates.

wiebeytec commented 4 months ago


Correct.

lumikcz commented 2 months ago

Sorry to reopen this thread. Based on this closed issue: https://github.com/victronenergy/venus/issues/1138, I understood that this should already be working in Venus 3.40, released on July 17, but I haven't noticed it in the firmware. I assume the password-related feature was not released in 3.40? I just want to understand when this can be expected to work :)

wiebeytec commented 2 months ago

#1138 was closed because there are other tickets for it. And indeed, the feature has been pushed back, probably to 3.50.

lumikcz commented 2 months ago


Thanks for the update on this, understood.