rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

MQTT 5.0: support for shared subscriptions #8936

Open petersilva opened 1 year ago

petersilva commented 1 year ago

Is your feature request related to a problem? Please describe.

MQTT v5 is used in our app (https://github.com/MetPX/Sarracenia) in exactly the same way as RabbitMQ/AMQP. We have multiple processes subscribe to the same AMQP queue so that multiple consumers can cooperate on a single subscription. We have added MQTT support, and with other brokers (Mosquitto and EMQX) we use shared subscriptions with a group_id.

We currently use dedicated mqtt brokers instead of rabbitmq for mqtt use cases. That works fine, so this is a "would be nice."

Describe the solution you'd like

As per the MQTT v5 standard, have subscriptions from multiple processes to:

$share/group_id/....

work similarly to rabbitmq having multiple processes connecting to the same amqp queue.
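For readers unfamiliar with the feature, here is a minimal sketch of what is being requested. In the MQTT 5.0 specification a shared subscription's topic filter has the form `$share/{ShareName}/{filter}`, and each matching message goes to only one subscriber in the group. The `ShareGroup` class and its round-robin choice are illustrative assumptions: the specification leaves the distribution strategy to the server, and nothing here reflects RabbitMQ internals.

```python
SHARE_PREFIX = "$share/"


def parse_shared_filter(topic_filter):
    """Split '$share/<ShareName>/<filter>' into (share_name, filter).

    Returns (None, topic_filter) for an ordinary, non-shared subscription.
    """
    if not topic_filter.startswith(SHARE_PREFIX):
        return None, topic_filter
    share_name, sep, topic = topic_filter[len(SHARE_PREFIX):].partition("/")
    # The ShareName must be non-empty and must not contain '/', '+' or '#';
    # it must be followed by '/' and a non-empty topic filter.
    if not share_name or not sep or not topic or any(c in share_name for c in "+#"):
        raise ValueError(f"malformed shared subscription: {topic_filter!r}")
    return share_name, topic


class ShareGroup:
    """Hypothetical group of subscribers that take turns receiving messages."""

    def __init__(self):
        self.members = []
        self._cursor = 0

    def join(self, client_id):
        if client_id not in self.members:
            self.members.append(client_id)

    def pick(self):
        """Return the member that should receive the next message (round-robin)."""
        if not self.members:
            raise LookupError("no subscribers in group")
        member = self.members[self._cursor % len(self.members)]
        self._cursor += 1
        return member


group = ShareGroup()
group.join("worker-1")
group.join("worker-2")
deliveries = [group.pick() for _ in range(3)]
```

This is the same cooperation pattern as several consumers attached to one AMQP queue: the group name plays the role of the queue name.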

Describe alternatives you've considered

We use mosquitto instead, which works fine. We like rabbit (we have >10 years of experience with it) and have a lot of rabbitmq integration. We are starting to deploy mosquitto, gaining experience and gradually building out mosquitto integration to augment/replace rabbitmq over time.

Additional context

Opened because #2554 was closed with shared subscriptions omitted.

The implementation provided isn't usable for me.

ansd commented 1 year ago

@petersilva in https://github.com/rabbitmq/rabbitmq-server/issues/2554#issuecomment-1646601061 you wrote:

I already described a mapping earlier in the thread that should make it straight-forward, as far as I can tell... map shared subscription group names to AMQP queue names, and everything should "just work."

If implementing Shared Subscriptions in RabbitMQ is "straight-forward" for you, please contribute a Pull Request.

From what I can tell, implementing this feature is not straight-forward. "Just working" isn't good enough for RabbitMQ. We need to make sure that RabbitMQ scales with millions of clients.

As I wrote in the blog post:

Shared subscriptions will be added in a future RabbitMQ release. Although this feature maps nicely to a queue in RabbitMQ, shared subscriptions are part of the session state and some RabbitMQ database migrations are necessary to efficiently query shared subscriptions for a given MQTT client ID.

When an MQTT client reconnects and resumes its MQTT session, RabbitMQ needs to find out if that MQTT client previously declared any shared subscriptions, and if yes, resume consuming from these (shared subscription) queues. To find out if an MQTT client previously declared any shared subscriptions, a database query is necessary. This database query must not linearly (O(n)) scan millions of entries if there are millions of subscriptions (because there can be hundreds of clients connecting at the same time). Instead, that query should be more efficient (O(1) or O(log n)). See for example https://github.com/rabbitmq/rabbitmq-server/blob/08ac71798a82c640462b5da900f57dd9a5efe550/deps/rabbitmq_mqtt/src/rabbit_mqtt_processor.erl#L880-L888
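To illustrate the difference between the two query shapes, here is a small sketch. It is not RabbitMQ's schema or code: `SubscriptionStore`, its fields, and its method names are all hypothetical. It keeps a flat list of subscription rows and, alongside it, a secondary index keyed by client ID, which is the kind of structure a database migration would have to introduce.

```python
from collections import defaultdict


class SubscriptionStore:
    """Hypothetical store of shared subscriptions (illustration only)."""

    def __init__(self):
        self._rows = []                      # (client_id, share_name, topic_filter)
        self._by_client = defaultdict(set)   # secondary index: client_id -> rows

    def add(self, client_id, share_name, topic_filter):
        self._rows.append((client_id, share_name, topic_filter))
        self._by_client[client_id].add((share_name, topic_filter))

    def scan(self, client_id):
        """O(n): touches every subscription of every client."""
        return {(s, t) for c, s, t in self._rows if c == client_id}

    def lookup(self, client_id):
        """O(1) on average: a single keyed read on the index."""
        return set(self._by_client.get(client_id, ()))


store = SubscriptionStore()
store.add("client-a", "workers", "v03/post/#")
store.add("client-b", "workers", "v03/post/#")
store.add("client-a", "audit", "logs/+")
```

Both methods return the same answer; only `scan` grows with the total number of subscriptions, which is what hurts when many clients reconnect at once.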

Efficiently querying shared subscriptions in the database requires a database migration. Given that RabbitMQ's database / metadata store is about to be migrated from Mnesia to Khepri, I'd like to hold off on additional database migrations until the Mnesia to Khepri migration has completed. Khepri will also provide projections.

From my point of view, Shared Subscriptions are therefore likely a post-Khepri feature.

michaelklishin commented 1 year ago

And "post-Khepri" means "post-4.0".

petersilva commented 1 year ago

I thought you were still just mapping to AMQP, in which case the previously mentioned approach should have been doable. But I can see (from the March blog) that you have done a lot of work to essentially build an MQTT broker within rabbit, which has much higher performance potential for sure, but I'm sure was a lot more work. In that context, mapping to AMQP data structures is irrelevant. However:

You can probably store both in the same or very similar data structures... if you can look up a client-id efficiently today for 1:1 subscriptions, it should not be that much more expensive to look up a shared one.

Another way of expressing it: based on the provided code snippet, it looks like a destination queue is made for each client-id. I would think that one could make a similar queue for each share-id. So yes, the lookup would be done once for the client-id and once for each share-id associated with it: a few times slower, but not orders of magnitude.

That might be one way of doing it... I don't know if that's better or worse than waiting for khepri.
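A rough sketch of the per-share-id queue idea above, to make the cost argument concrete. Everything here is hypothetical: the `client/...` and `share/...` queue names are invented for illustration and are not RabbitMQ's naming scheme, and the sketch assumes the session state already records which share names a client belongs to and can be read by client ID, which is the efficient query discussed earlier in the thread.

```python
def queues_to_resume(client_id, share_names):
    """Return the queues a reconnecting client would consume from.

    One per-client queue plus one queue per share group, so the work is
    1 + len(share_names) lookups rather than a scan of all subscriptions.
    """
    queues = [f"client/{client_id}"]  # hypothetical queue name
    queues.extend(f"share/{name}" for name in sorted(share_names))
    return queues


def resume_session(sessions, client_id):
    """Look up the client's share groups with one keyed read, then map to queues."""
    share_names = sessions.get(client_id, set())
    return queues_to_resume(client_id, share_names)


# Hypothetical session state: client_id -> set of share names.
sessions = {"worker-1": {"feeds", "alerts"}}
resumed = resume_session(sessions, "worker-1")
```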