mqttjs / MQTT.js

The MQTT client for Node.js and the browser

Joint Mqtt/Mosca feature; 'retained-delivered' event #292

Closed cefn closed 8 years ago

cefn commented 9 years ago

Thanks for a great library. It's opened up a lot of possibilities.

Depending on the situation, this is either a documentation request, a feature request or a request for help with workarounds for joint mosca/mqtt.js usage. In the very worst case, it could be a suggestion for future MQTT protocol versions.

I'm trying to work out how to write an mqtt.js client (with Mosca on the server side) that exposes a promise which resolves once all the retained messages from a subtree of topics (a wildcard subscription) have been delivered to the client.

Knowing when all subtree data has arrived means that my client code can proceed knowing it has the full data model in place. This of course assumes they were published with the proper retention flag and qos to be guaranteed-delivery messages.

Is there a detectable event that indicates all retained messages for a given subscription have had their on("message" ...) calls? I can't see this documented. Is it implied by some other event?

If there is no such event...

cefn commented 9 years ago

Here's some discussion about the issue from irc://irc.freenode.net/mqtt. Hoping that the same is true of Mosca.

(19:14:40) The topic for #mqtt is: Anything mqtt | Read/update the wiki: https://github.com/mqtt/mqtt.github.io/wiki | Currently busiest during UK time | http://test.mosquitto.org/ | If you have a question, just ask it. | Patience is a virtue

(19:14:40) Topic for #mqtt set by ral at 19:41:16 on 19/03/15

(19:18:49) cefn: Is there any way to tell when the retained messages from a wildcard subscription have been delivered? E.g. https://github.com/mqttjs/MQTT.js/issues/292 Without this we will need lots of client-side logic to define application-level validity for subtrees, or a horrible timeout hack, whereas if there was a way to know that everything had been delivered we don't need any logic at all.

(19:27:00) cefn: A workaround I was wondering about was whether a unique (non-retained) message sent by the same client, and which falls under the subscription would always be delivered after the retained messages had been delivered. This would mean receiving the unique message could be implicitly used as a retained-messages-complete event.

(19:38:26) cefn: I think this exploits common practices in MQTT servers, though, concerning in-flight windows per topic, and isn't a protocol guarantee? E.g. after subscribing you send a zero-byte non-retained message on a topic, and when you receive it you know you will have already received any retained message on the same topic if one exists because it would have been delivered beforehand.

(20:54:55) ral: cefn: What you're describing is my understanding of how things should work.

(20:55:04) ral: All retained messages before any fresh ones.

(20:56:02) ral: It's what mosquitto_sub -R relies on.

(20:57:10) ral: I can't say for sure that it is what everything would do though.

(20:57:19) ral: Although it would be pretty bonkers to do anything else.

(21:04:31) cefn: ral: thanks! Would you expect I could subscribe to root/something/# then publish "" (empty string non-retained) to root/something and expect to receive all of the retained root/something/# wildcarded topics before the empty string came back? Or alternatively is it only reliable on a per-topic basis, e.g. it guarantees a retained message at root/something is delivered first, but a retained message at root/something/hello could show up later?

(21:08:44) ral: cefn: I would expect every retained message for that subscription to be delivered before any fresh messages are delivered.

(21:08:57) ral: That's what mosquitto does at least.

(21:39:53) cefn: ral: continuing the implicit ordering question, is there something which wouldn't be broadcast to everyone, but which would have the same ordering 'guarantees'. Ideally all the subscribers to the topic wouldn't have to be bothered with empty-string messages all the time.

(21:40:19) cefn: Though I guess they could use the retained flag as a filter, it seems a bit wasteful

(21:41:27) ral: cefn: In mosquitto_sub I just wait for the first non-retained message.

(21:41:43) ral: That's a special case though, because it only subscribes once.

(21:41:52) ral: At the beginning I mean.

(21:42:09) ral: And would possibly break if you subscribed to multiple topics.

(21:49:38) cefn: I think I need to arrive at something which a client can initiate (e.g. they introduce the event to the broker and use the round-trip and implicit ordering to derive that all retained message events for a given subscription have now happened when the event comes back). However, I'm wondering what is the range of alternative events I can initiate? I'm currently working on the assumption that the event has to be a message sent to a subtopic within the relevant subscription. Is that fair, or is there a broader set of events (mqtt.js callbacks, strictly) which I could use for this job. This would be especially valuable to know if there's an event which meets the ordering 'guarantee' but doesn't pollute the subtopic itself with spurious messages, though I can discard them everywhere, that seems inelegant.

(21:51:11) ral: cefn: That's about the sum of your choices.
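
The marker-message workaround discussed in the log above can be sketched roughly as follows. This is only an illustration, assuming a client object with mqtt.js-style `subscribe`, `publish` and `on('message')` methods; `collectRetained` and `markerTopic` are hypothetical names, and the whole approach relies on the broker delivering retained messages before fresh ones, which the discussion above treats as common behaviour rather than a protocol guarantee.

```javascript
const crypto = require('crypto');

// Collect retained messages under `filter`, then call `onComplete` once a
// self-published, non-retained marker message comes back from the broker.
// ASSUMPTION: the broker delivers all retained messages for a new
// subscription before any fresh message. The MQTT spec does not promise this.
// `markerTopic` must itself be matched by `filter`.
function collectRetained(client, filter, markerTopic, onComplete) {
  const token = crypto.randomBytes(8).toString('hex'); // unique per call
  const retained = new Map(); // topic -> payload string

  function onMessage(topic, payload, packet) {
    if (topic === markerTopic && payload.toString() === token) {
      // Our own marker has round-tripped: treat the retained set as complete.
      client.removeListener('message', onMessage);
      onComplete(null, retained);
    } else if (packet.retain) {
      retained.set(topic, payload.toString());
    }
  }

  client.on('message', onMessage);
  client.subscribe(filter, { qos: 1 }, (err) => {
    if (err) {
      client.removeListener('message', onMessage);
      onComplete(err);
      return;
    }
    // Publish the marker only after the subscription has been acknowledged.
    client.publish(markerTopic, token, { qos: 1, retain: false });
  });
}
```

Every other subscriber to the same subtree will also see the marker, so they would need to ignore non-retained messages they do not recognise, which is the "pollution" concern raised in the log.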

mcollina commented 9 years ago

  1. The MQTT protocol specifies no way to know that all retained messages have been delivered, so MQTT.js will not have one.
  2. Assuming that the retained messages will be delivered before other messages is currently implementation specific; in particular this has some performance implications, so different brokers might choose differently. Section 3.1.1 of the latest spec (3.1.1) does not address retained messages. If you want to send a patch to Mosca to implement this, I will be happy to review it and merge.
  3. I think that retained messages should live on their own topics, logically separate from the rest. This clarifies the behavior too, and makes it a lot easier to debug.
  4. Relying on non-algorithmic message ordering is almost always a bad thing in a distributed environment: it does not work (Lamport), because bad stuff can always happen.

cefn commented 9 years ago

Thanks for getting back to me. Much appreciated.

Do you feel that any ordering choices are embodied in the current Mosca implementation which might be exploited (e.g. that message dispatch to a given client is in the order of message arrival at broker, that a later retained message from the same client is delivered after an earlier retained message, or anything else?).

If there's no ordering implicit at all then I'd definitely have to try and get under the skin of the server implementation and force the issue. I'm happy to commit to a single MQTT server implementation as we're deploying both clients and server to our own specification and we already have node in the mix, so Mosca is a tremendous candidate.

For orientation, the system I've built demonstrates a broker-held MQTT topic tree which corresponds to a client-side local JSON structure of nested objects (branches) or strings (leaves). The leaf property values correspond to retained messages: a native JavaScript assignment baker.stock.currantbuns = "5" triggers the publishing of a retained MQTT message with topic baker/stock/currantbuns and, as payload, the string bytes for "5", indicating that there are 5 of the stock item "currantbuns" in the baker's shop. The inverse is also true: values received from subscriptions let you remain in synchrony with remote clients despite distributed edits.
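
To make the mapping concrete, here is a minimal sketch of both directions, not the actual system described above. It assumes a client object with an mqtt.js-style `publish(topic, payload, opts)`; `topicProxy` and `applyMessage` are hypothetical helper names.

```javascript
// Wrap a topic prefix in a Proxy so that assigning a value to a property
// publishes a retained message, and reading an unset property descends one
// topic level. `client` is any object with an mqtt.js-style publish().
function topicProxy(client, prefix) {
  return new Proxy({}, {
    get(target, key) {
      if (typeof key !== 'string') return undefined; // ignore symbol lookups
      if (!(key in target)) {
        // Unknown property: treat it as a branch and create it lazily.
        target[key] = topicProxy(client, prefix ? `${prefix}/${key}` : key);
      }
      return target[key];
    },
    set(target, key, value) {
      const topic = prefix ? `${prefix}/${key}` : key;
      client.publish(topic, String(value), { qos: 1, retain: true });
      target[key] = value;
      return true;
    },
  });
}

// The inverse direction: fold a received (topic, payload) pair into a plain
// nested object, creating intermediate branches as needed.
function applyMessage(tree, topic, payload) {
  const keys = topic.split('/');
  const leaf = keys.pop();
  let node = tree;
  for (const key of keys) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[leaf] = payload.toString();
  return tree;
}
```

With `const baker = topicProxy(client, 'baker')`, the assignment `baker.stock.currantbuns = "5"` publishes "5" as a retained message on baker/stock/currantbuns, and `applyMessage(local, topic, payload)` can be called from an `on('message')` handler to keep a local copy in step.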

For this reason when a new client wants to participate and populate a local JSON copy of the baker/stock/# subtree to investigate stock levels, it can't be known in advance what the topics are. There may be currantbuns, but also baps, barms and lardycakes (local delicacies). The danger is that there's no moment after subscription to baker/stock/# at which you know that you've been notified of all branches below baker/stock, but for application-driven reasons there has to be some point at which you consider you have a complete snapshot of a tree, even if it's a few seconds old.

Only after all the retained messages have been delivered can the UI designer have faith in the consistency of the JSON structure which has been locally constructed from the retained messages sent. Certainly bad stuff can happen. But bad stuff is pretty much guaranteed from the strategies I might need to employ as workarounds. An unresponsive UI caused by arbitrary timeouts is definitely guaranteed, and assuming that all data has arrived after some timeout (when it actually hasn't) could be worst of all. I'm not sure what my alternatives could be.

I suppose forking Mosca to offer up an explicit property served from $SYS indicating client queue status could be an option. When the number of queued messages reaches zero, you have a snapshot.

cefn commented 9 years ago

In the absence of any other ideas, I'm considering this strategy...

The Mqtt server will monitor 'publish' events to maintain an in-memory sorted list of all the currently published retained messages in the broker (is this already there, somewhere?).

When a 'subscription' event indicates a wildcard topic subscription from a client, the server will take a copy of all the retained messages which match the wildcard (should be a simple slice of the sorted array) and store it against the client and subscription.

When a 'delivered' event indicates that a retained message for a given topic has been ACKed or a retained message is 'unpublished' then that individual topic is removed from the client+subscription array. When the list is empty, a topic like /subscription/${clientId}/full/path/to/wildcarded/topic is sent a non-retained message to indicate delivery is complete.
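
The bookkeeping in this strategy might look something like the sketch below. It is deliberately independent of Mosca: the class name and the `onPublish`, `onSubscribe` and `onDelivered` methods are hypothetical, and how they would be wired to the broker's real events is left open. It uses a Set and a filter scan rather than the sorted array mentioned above, and `topicMatches` ignores the special handling of topics starting with `$`.

```javascript
// Return true if `topic` matches the MQTT subscription `filter`
// ('+' matches one level, '#' matches all remaining levels, including none).
function topicMatches(filter, topic) {
  const f = filter.split('/');
  const t = topic.split('/');
  for (let i = 0; i < f.length; i++) {
    if (f[i] === '#') return true;
    if (i >= t.length) return false;
    if (f[i] !== '+' && f[i] !== t[i]) return false;
  }
  return f.length === t.length;
}

class RetainedDeliveryTracker {
  constructor(notify) {
    this.retained = new Set(); // topics currently holding a retained message
    this.pending = new Map();  // JSON [clientId, filter] -> Set of undelivered topics
    this.notify = notify;      // called as notify(clientId, filter) when a set empties
  }

  // A retained publish with an empty payload clears ("unpublishes") the topic.
  onPublish(topic, payload, retain) {
    if (!retain) return;
    if (payload.length === 0) {
      this.retained.delete(topic);
      for (const key of [...this.pending.keys()]) this._remove(key, topic);
    } else {
      this.retained.add(topic);
    }
  }

  // Snapshot the retained topics that match the new subscription.
  onSubscribe(clientId, filter) {
    const topics = new Set([...this.retained].filter((t) => topicMatches(filter, t)));
    const key = JSON.stringify([clientId, filter]);
    this.pending.set(key, topics);
    if (topics.size === 0) this._finish(key);
  }

  // A retained message on `topic` has been acknowledged by `clientId`.
  onDelivered(clientId, topic) {
    for (const key of [...this.pending.keys()]) {
      if (JSON.parse(key)[0] === clientId) this._remove(key, topic);
    }
  }

  _remove(key, topic) {
    const topics = this.pending.get(key);
    if (topics && topics.delete(topic) && topics.size === 0) this._finish(key);
  }

  _finish(key) {
    this.pending.delete(key);
    const [clientId, filter] = JSON.parse(key);
    this.notify(clientId, filter);
  }
}
```

The `notify` callback is where the non-retained "delivery complete" message for that client and subscription would be published.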

Does that make any sense at all? Any considerations to note about how this might play with Mosca?

mcollina commented 9 years ago

@cefn my major concern there is performance. But yes, that can work. A probably simpler way is emitting an event into server from here https://github.com/mcollina/mosca/blob/master/lib/persistence/abstract.js#L82-L91, and then react to that event in your code.

cefn commented 9 years ago

That sounds like a very good place to start, and thanks for helping me to navigate to where the key events take place. I think you're suggesting I wire into the listener mechanisms already there for connect, subscribe etc., simply adding my own listener, which makes a lot of sense.

However, to address your correct performance concerns about managing arrays of the full tree of topics I'm also looking into an alternative implementation, which is perhaps more MQTT-ish, works with mechanisms already anticipated for server specialisation, and which could be implemented as a smart MQTT client without modifying the server.

A $SYS/childkeys/ topic tree could be maintained as a tree of retained messages containing JSON key arrays mirroring child subtopics. For example, subscribing to $SYS/childkeys/baker/stock would trigger a notification of a retained message ["currantbuns","lardycakes","hot cross buns"] indicating that the populated topics below the main topic baker/stock are baker/stock/currantbuns AND baker/stock/lardycakes AND baker/stock/hot cross buns. A client could then subscribe to $SYS/childkeys/baker/stock/currantbuns and expect the retained message '[]', indicating no further children: a leaf topic and hence a string-typed JavaScript property. Finally the client could subscribe to baker/stock/currantbuns to receive the message '5' indicating the value of the JSON property.
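
As a rough sketch of what the maintainer of such a tree would compute, the function below derives the childkeys messages from a list of populated topics. `buildChildKeys` is a hypothetical name, and whether a client is even allowed to publish under $SYS depends on the broker, so the prefix is a parameter.

```javascript
// Derive the childkeys retained messages for a set of populated topics.
// Returns a Map from childkeys topic to a JSON array of child names.
function buildChildKeys(topics, prefix = '$SYS/childkeys') {
  const children = new Map(); // topic path -> Set of child names
  for (const topic of topics) {
    const levels = topic.split('/');
    for (let i = 0; i < levels.length; i++) {
      const parent = levels.slice(0, i).join('/');
      if (!children.has(parent)) children.set(parent, new Set());
      children.get(parent).add(levels[i]);
    }
    // A populated topic with no children of its own gets an empty list.
    if (!children.has(topic)) children.set(topic, new Set());
  }
  const messages = new Map();
  for (const [path, names] of children) {
    const target = path ? `${prefix}/${path}` : prefix;
    messages.set(target, JSON.stringify([...names]));
  }
  return messages;
}
```

Each entry in the returned Map would be published as a retained message, and republished whenever a retained topic appears or is cleared.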

Of course a client could also subscribe to $SYS/childkeys/baker/stock/# in order to get a full snapshot of the subtree (a subtree subscription which can be confirmed for completeness, and hence made Promise-compatible, by logic performed on its own payloads).

Multi-level wildcard subscription is probably more responsive than recursively subscribing by expanding single-level wildcard topics step-by-step as all events will arrive in a big stream without round-trips (assuming you can handle the asynchrony and out-of-order delivery).

In principle such a subtree could be maintained by some MQTT client (subscribing to some wildcard, potentially the root wildcard, and responding to the notification of any retained-status message by updating the keys tree).

Once notified of the $SYS/childkeys topic matching any branch in the JSON tree, clients can identify their progress through the recursive tree traversal as notifications come in. Assuming all retained subtopics come in as expected, they are in a position to declare traversal complete (with an error condition and potential resubscribe as a backstop for topics whose anticipated keys are not delivered within a timeout).
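
The completeness check described here could be sketched as below, assuming childkeys payloads have already been JSON-parsed and that notifications may arrive in any order. `SnapshotTracker` and its methods are hypothetical names, and the timeout backstop is left out.

```javascript
// Track a snapshot of a subtree announced via childkeys-style messages.
// The snapshot is complete once every announced path has reported its child
// list and every leaf (empty child list) has also delivered its value.
class SnapshotTracker {
  constructor(root) {
    this.expected = new Set([root]); // paths we know must exist
    this.childKeys = new Map();      // path -> array of child names received
    this.values = new Map();         // leaf path -> payload string
  }

  // Handle a childkeys notification for `path`.
  onChildKeys(path, keys) {
    this.childKeys.set(path, keys);
    for (const key of keys) this.expected.add(`${path}/${key}`);
  }

  // Handle a value notification from the data topic itself.
  onValue(path, payload) {
    this.values.set(path, payload.toString());
  }

  isComplete() {
    for (const path of this.expected) {
      const keys = this.childKeys.get(path);
      if (keys === undefined) return false; // child list not yet seen
      if (keys.length === 0 && !this.values.has(path)) return false; // leaf value missing
    }
    return true;
  }
}
```

A promise wrapper would call `isComplete()` after each notification and resolve on the first `true`, rejecting (or resubscribing) if that does not happen within the chosen timeout.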

Of course all of this pushes design questions elsewhere, requiring some form of topic-locking (and possibly wildcard-locking) protocol, perhaps mounted at the $SYS/lock subtree, to be able to extend any ACID guarantees at all, but that's something for another day (and another thread). The scenarios in which we anticipate this being used have an asymmetry between those writing (who will effectively own their subtree) and those reading, but the simplicity of evented JavaScript objects as a wrapper for coding both reading and writing clients is very appealing.