mjkoster / I-D


When are messages lost? #3

Open tkellogg opened 9 years ago

tkellogg commented 9 years ago

There's always some chance for messages to be lost. For instance, in MQTT QoS level 2, if the client never logs back on with the same client ID, queued messages will be lost. When does this happen with CoAP-PubSub?

mjkoster commented 9 years ago

Good question. Message loss and QoS are interesting concepts. I think we need to first look at CoAP to inform this, since we are starting with base CoAP protocols. If we use NON messages, we get a best-effort class of service, but one that limits network congestion and sometimes enables better delivery on actual constrained networks. Using CON messages could induce retries, but will, subject to the underlying network, eventually deliver the messages within a gross connection timeout period. How do these map to QoS levels and delivery policies in MQTT? Is there some broker logic or client registration timeout we could add for QoS control?
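For concreteness, the CON retry behavior CoAP defines (RFC 7252, section 4.2) can be sketched as below. The constants are the RFC's defaults; the helper function is illustrative, not part of any implementation:

```python
# Sketch of the CON retransmission schedule from RFC 7252, section 4.2.
# A CON message is resent with exponentially growing timeouts until an
# ACK arrives or MAX_RETRANSMIT retries are exhausted.

ACK_TIMEOUT = 2.0        # seconds (RFC 7252 default)
ACK_RANDOM_FACTOR = 1.5  # initial timeout is randomized up to this factor
MAX_RETRANSMIT = 4

def retransmission_timeouts(initial: float = ACK_TIMEOUT * ACK_RANDOM_FACTOR):
    """Worst-case wait before giving up on each (re)transmission attempt."""
    timeout = initial
    for _ in range(MAX_RETRANSMIT + 1):  # initial send + 4 retries
        yield timeout
        timeout *= 2

# Worst-case time from first transmission to the last retransmission:
MAX_TRANSMIT_SPAN = ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR

print(list(retransmission_timeouts()))  # [3.0, 6.0, 12.0, 24.0, 48.0]
print(MAX_TRANSMIT_SPAN)                # 45.0
```

So "eventually deliver ... within a gross connection timeout period" means, with the defaults, roughly 45 seconds of retrying before the sender gives up.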

Also these ultimately map to resources, so duplicate messages may need to be thought of a little differently. Are PUBLISH operations idempotent? What do we need to say about the client receiving the PUBLISH?

How do the use cases inform this issue?

tkellogg commented 9 years ago

Answering your questions in reverse order:

In MQTT, publishes from the same client are not idempotent, but publishes from different clients are. In other words, MQTT respects the order of messages from the same publisher, but not globally.

It sounds like CoAP supports QoS 0 & 1. (This is good; QoS 2 was a massive oversight.) Though NON messages are actually weaker than QoS 0, since CoAP isn't built on TCP. I think this hits the ideal target.

What I'm concerned about is fast publishes. If messages arrive faster than the CoAP-PubSub subscriber can consume them, the broker will skip to the latest value, dropping intermediate messages.

This happens on the publisher side too. If the publisher publishes too fast, the broker might not acknowledge intermediate messages and might skip to the latest, allowing for dropped messages.
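The "skip to the latest" behavior follows from a topic being a resource holding one representation. A minimal sketch of that semantics (the `Topic` class and its method names are illustrative, not from the draft):

```python
# Hypothetical sketch of "latest value" topic semantics: each PUBLISH
# replaces the topic's stored representation, so a subscriber that falls
# behind never sees the intermediate values.

class Topic:
    def __init__(self):
        self.value = None

    def publish(self, payload):
        self.value = payload  # overwrite: any previous value is lost

    def read(self):
        return self.value     # a GET returns only the latest representation

t = Topic()
for reading in [21.0, 21.5, 22.1]:  # three publishes in quick succession
    t.publish(reading)
print(t.read())  # 22.1 -- the 21.0 and 21.5 samples were dropped
```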

So right now CoAP-PubSub has 2 QoS levels:

  1. NON - at most once
  2. CON - at most once, but more likely to arrive

I think this is great (no sarcasm), but it absolutely needs to allow for an "at least once" store-and-forward QoS level. I can't promote this protocol as functional unless it has this option.

The main use case is time series data, where the client needs the full stream of data and isn't just interested in the latest message.

Realistically, I don't see hard at-least-once functionality as useful most of the time. But every now and then it's crucial. More often I think any sort of high-throughput CoAP-PubSub broker will bridge to some other HTTP/2-based solution that doesn't drop messages. Still, the dropping of messages during publish bothers me.

mjkoster commented 9 years ago

I think at-least-once delivery can be accommodated based on some clarifying assumptions:

There's no assumption that all of the subscribers will need to ACK each publish on a topic before the next one comes in.

Using confirmable PUBLISH messages from the broker to subscribers and allowing the broker to store messages in a FIFO for each subscriber should provide guaranteed at-least-once delivery. Any one subscriber's ACK delay won't affect the data availability to the others.

Publishers will need to ensure they don't get ahead of the broker's ability to consume messages. Using confirmable messages to PUBLISH from clients to the broker should provide a way, using ACKs, for the broker to indicate when it's ready for more messages and to prevent over-run and dropped messages.
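A minimal broker-side sketch of this scheme, assuming one FIFO and at most one in-flight confirmable PUBLISH per subscriber (all names here are illustrative, not from the draft):

```python
from collections import deque

# Hypothetical sketch: each subscriber gets its own FIFO, so one slow
# subscriber's pending ACK never blocks delivery to the others.

class Broker:
    def __init__(self):
        self.queues = {}     # subscriber id -> FIFO of pending messages
        self.in_flight = {}  # subscriber id -> message awaiting ACK

    def subscribe(self, sub_id):
        self.queues[sub_id] = deque()

    def publish(self, payload):
        for q in self.queues.values():
            q.append(payload)  # fan out into every subscriber's own queue

    def deliver_next(self, sub_id):
        """Send the next queued message as a confirmable PUBLISH."""
        if sub_id not in self.in_flight and self.queues[sub_id]:
            self.in_flight[sub_id] = self.queues[sub_id].popleft()
            return self.in_flight[sub_id]
        return None  # either waiting for an ACK or nothing queued

    def ack(self, sub_id):
        """The subscriber's ACK frees the slot for the next message."""
        self.in_flight.pop(sub_id, None)

b = Broker()
b.subscribe("a"); b.subscribe("b")
b.publish("m1"); b.publish("m2")
print(b.deliver_next("a"))  # m1
print(b.deliver_next("a"))  # None -- still waiting for the ACK of m1
b.ack("a")
print(b.deliver_next("a"))  # m2 -- and subscriber "b" was never blocked
```

Nothing is dropped here as long as the FIFOs are unbounded; a real broker would also need a policy for what to discard when a queue fills up.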

Does this make sense?

tkellogg commented 9 years ago

Is this spec'd out yet? If so, I think it'll work.

"Using confirmable PUBLISH messages from the broker to subscribers and allowing the broker to store messages in a FIFO for each subscriber should provide guaranteed at-least-once delivery."

Also, is using ACK for flow control spec'd out yet? It's dirty, but it could work. I'm just worried that it's a functional deviation from how CoAP implementations currently work.

mjkoster commented 9 years ago

As for the FIFO per subscriber, it's a question of recommendation vs. interface specification. We could recommend something as an example, but it shouldn't impact the definition of the interface.

As for using ACK for flow control, I think we should have some discussion. The current ACK scheme is to deal with transmission loss.

What is the case for dealing with a client or clients over-running the broker? Do we need flow control? How is it handled in MQTT?

tkellogg commented 9 years ago

The only problem (that I can think of) with having store-and-forward be recommendation instead of formal specification is reliability. When it's formal, I can just follow the protocol to know I'm getting the guarantees I want. If it's informal, I have to read the broker's documentation to understand the guarantees. The quality of documentation usually varies quite a bit between development teams, so I'd rather rely on specification.

MQTT doesn't deal with flow control very well. You can apply back pressure, but that has its own costs. I've also seen some clients use UNSUBSCRIBE/SUBSCRIBE as a binary switch to start & stop messages, but that also has problems. Ideally, there would be a command to ask for 100 messages (and the broker sends up to 100 messages). I'm not sure it's very important to have elaborate flow control in CoAP-PubSub; I think your ACK-for-more scheme is already better than what MQTT provides.

One problem I can see is incurring more in-flight messages than necessary. This could be solved pretty easily by enforcing a rule that a single endpoint can only have 100 in-flight messages (that number could be hard-coded into the spec or negotiated on connect; I'd recommend the latter).
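The "ask for N messages" idea can be sketched as a simple credit window. The class, its method names, and the default of 100 are assumptions for illustration, not anything from the spec:

```python
# Hypothetical credit-based flow control: the receiver grants the sender a
# window of N messages, and the sender never exceeds its remaining credit.

class CreditWindow:
    def __init__(self, credit: int = 100):  # default could be negotiated on connect
        self.credit = credit

    def can_send(self) -> bool:
        return self.credit > 0

    def on_send(self):
        self.credit -= 1   # one in-flight slot consumed

    def grant(self, n: int):
        self.credit += n   # receiver asks for n more messages

w = CreditWindow(credit=2)
sent = []
for msg in ["m1", "m2", "m3"]:
    if w.can_send():
        w.on_send()
        sent.append(msg)
print(sent)  # ['m1', 'm2'] -- m3 must wait until more credit is granted
w.grant(1)
print(w.can_send())  # True
```

This bounds the number of in-flight messages without requiring an ACK round-trip per message, which is the over-run problem noted above.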

--Tim


mjkoster commented 9 years ago

Hi Tim,

Now that the draft is submitted (deadline was yesterday) we’re discussing this and other issues on the IETF CoRE Mailing list: core@ietf.org

You can join the mailing list at https://datatracker.ietf.org/wg/core/charter/

Let me know if there is any problem with joining the list.

Cheers,

Michael


mjkoster commented 9 years ago

Carsten Bormann brought up the same basic problem you did, which is that messages sometimes must be discarded, and the question is how to architect with this in mind: where and which messages are discarded, and how applications deal with it.

(further discussion on the CoRE mailing list)