Trouble with Packet Identifiers

hutch commented 8 years ago

Background: I posted a question to stack overflow: http://stackoverflow.com/questions/39210834/do-any-mqtt-clients-actually-reuse-packet-identifiers

The packet identifier is required for certain MQTT control packets (http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/csprd02/mqtt-v3.1.1-csprd02.html#_Toc385349761). It's defined by the standard as a 16bit integer (presumably), and is generated by each client. The identifiers are reusable by the client after the acknowledgement packet is received. So the standard allows up to 64k in-flight messages.

rust-mq/mqtt3 implements the packet identifier as a u16, and we have this implementation:

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct PacketIdentifier(pub u16);

impl PacketIdentifier {
    pub fn zero() -> PacketIdentifier {
        PacketIdentifier(0)
    }

    pub fn next(&self) -> PacketIdentifier {
        PacketIdentifier(self.0 + 1)
    }
}

Problem 1: next can be made to overflow and panic if more than 64k messages requiring a packet identifier is sent by a single client. (My very first test client panicked :-)

Problem 2: even if it didn't panic, each client would still be limited to a total of 64k messages requiring a packet identifier.

tekjar commented 8 years ago

@hutch

Problem 1: Did you test in release mode? As far as I know, overflows are panics in debug mode but 2's complement in release mode. But I agree that we shouldn't let it overflow even in debug mode (Which I'm planning to do in my client and test. So thanks for raising this :) )

Problem 2: I don't think reusing a pkid is a problem. Usually outgoing publishes in the queue can be bounded (with a limit not > u16) which is what I'm doing here. But yeah, an outgoing publish queue with messages containing same pkids might be problem if acks are not returned in sequence (say one of the acks got lost and wrong message being popped out from the queue)

hutch commented 8 years ago

Making problem 1 go away would be nice.

So in release mode the Packet Identifier wraps back to 0? I guess limiting the number of in-flight messages to < 64k is pretty reasonable, assuming the first message isn't getting stuck.

tekjar commented 8 years ago

So in release mode the Packet Identifier wraps back to 0?

Yup --> https://is.gd/lK0Qxo

I guess limiting the number of in-flight messages to < 64k is pretty reasonable, assuming the first message isn't getting stuck

Yeah. That would be the correct implementation. But are you sure that rust-mq does re-transmissons in case of unacked packets ?

hutch commented 8 years ago

Mosquitto re-transmits after some timeout, default is 20s I think (but I've not confirmed). But I'm not sure that what mosquitto does is relevant. It's more what the client does... I'd hope that with QoS of ExactlyOnce that the rust-mq publishing client would retry... how would it get a QoS of ExactlyOnce otherwise?

tekjar commented 8 years ago

@hutch

Mosquitto re-transmits after some timeout, default is 20s I think (but I've not confirmed)

Since I'm not aware of what you are trying to do, Assuming you are trying to publish 64k messages with the client, AFAIK, for QoS1, broker wouldn't know. Client should store all its outgoing QoS1 publishes and flush them only after it has received corresponding acks.

http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc398718103

In case an ack from the broker got lost (due to reconnections or whatever reason), client should retransmit the same message.

tekjar commented 8 years ago

Ah didn't see your edit. As far as I have read the rust-mq code, QoS1 and QoS2 works as long as packets aren't lost. Otherwise rust-mq doesn't retry (but let me check the code again. I'm not so sure. @inre ??).

What happens if a QoS2 publish write failed? It's in the client queue now and Broker doesn't know about the publish. I think during reconnection, this should be retried.

inre commented 8 years ago

@kteza1 yes, you're right. In case of reconnection, all unacknowledged packets will be retried.

hutch commented 8 years ago

I've got a reasonably representative prototype working using QoS2 between N publishing rust-mq clients and M subscribing rust-mq clients using mosquitto as a broker. There are hundreds of thousands of messages flying around (if the prototype is compiled in release mode). I've not tried QoS1, but, since it's unlikely to be something I need, I'll not likely attempt it. In theory in the real application the majority of messages will be QoS0 but only if it turns out that any dropped messages are approximately uniformly distributed (which is unlikely I think).

So the only thing left is getting a development mode compilation to deal with more than 64k messages, and to possibly do something to ensure Packet Identifiers aren't somehow re-used. I'm okay with limitations on in-flight messages if that helps.

jensschroer commented 6 years ago

Just ran into this myself. If I understand the intention correctly, the counter should be circular -> go back to 0 when hitting the largest number. Since an overflow is handled as exception in DEBUG mode, why not replace the line: PacketIdentifier(self.0 + 1) with PacketIdentifier((self.0 + 1) % std::u16::MAX) ?

inre / rust-mq

Trouble with Packet Identifiers #14