peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License

Interesting behaviour with MQTTv5 Topic Aliases #150

Open bobveringa opened 3 weeks ago

bobveringa commented 3 weeks ago

Just wanted to get your thoughts on the following behaviour.

We have an implementation of topic aliases. The consequences of sending an invalid topic alias are quite severe (immediate disconnect by the broker). So we take great care in ensuring we only send valid topic aliases. Recently, we have been investigating weird disconnects on some devices.

After some investigation we found that the resilient reconnect "causes" these disconnects.

    async def publish(self, topic, msg, retain=False, qos=0, properties=None):
        qos_check(qos)
        while 1:
            await self._connection()
            try:
                return await super().publish(topic, msg, retain, qos, properties)
            except OSError:
                pass
            self._reconnect()  # Broker or WiFi fail.

The following happens. A message is published with a topic alias and an empty topic. The connection is then lost (either in this function or due to some other factor), and the function patiently waits for a new connection. Once the connection is restored the message is sent, but it still carries only the alias and no topic. Topic alias mappings last only for the lifetime of a network connection, so the new connection knows nothing about that alias. The message is therefore invalid, and the broker terminates the connection.
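To make the failure concrete, here is a toy model of the broker-side rule, assuming only what the MQTT v5 spec says about alias scope. `ToyBroker` is a hypothetical stand-in for illustration, not a real broker or part of this library.

```python
class ToyBroker:
    """Hypothetical model of how a broker tracks topic aliases per connection."""

    def connect(self):
        # Every new network connection starts with an empty alias table.
        self.aliases = {}

    def publish(self, topic, alias=None):
        if alias is not None:
            if topic:
                self.aliases[alias] = topic      # full topic (re)defines the alias
            elif alias in self.aliases:
                topic = self.aliases[alias]      # empty topic: look the alias up
            else:
                # A real broker would drop the connection at this point.
                raise ValueError("unknown topic alias")
        return topic


broker = ToyBroker()
broker.connect()
broker.publish("sensors/temp", alias=1)   # establishes alias 1
broker.publish("", alias=1)               # fine: resolves to "sensors/temp"
broker.connect()                          # reconnect after an outage
# broker.publish("", alias=1) would now raise: the alias is gone.
```

The last case is the one the retry loop runs into: the arguments were valid when `publish` was called, but not by the time they are sent.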

We are looking at ways to solve this problem in our interfacing layer, such as creating a task for each publish and cancelling those tasks on disconnect.
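A rough sketch of the task-per-publish idea, assuming plain `asyncio`. `PublishGuard` and `on_disconnect` are hypothetical names; how the application learns about an outage is left open, and nothing here is part of the library.

```python
import asyncio


class PublishGuard:
    """Hypothetical wrapper: one task per publish, all cancelled on an outage."""

    def __init__(self, publish):
        self._publish = publish   # coroutine function, e.g. the client's publish
        self._pending = set()     # publishes that have not completed yet

    async def publish(self, *args, **kwargs):
        task = asyncio.create_task(self._publish(*args, **kwargs))
        self._pending.add(task)
        try:
            await task
            return True
        except asyncio.CancelledError:
            # Dropped because the link went down. Caveat: this also swallows
            # a cancellation aimed at the caller itself.
            return False
        finally:
            self._pending.discard(task)

    def on_disconnect(self):
        # Call from whatever outage notification the application uses.
        for task in self._pending:
            task.cancel()
```

A publish that returns `False` was never retried on the new connection, so the caller can resend it with the full topic and re-establish the alias.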

I was wondering if you had any thoughts on this matter. I don't think many people will use topic aliases, but this behaviour does make it more challenging to have a robust implementation of them.

peterhinch commented 3 weeks ago

I have now given this some thought. It clearly could be handled by the client, but it would require keeping a dict that related topics and aliases. The dict would persist over outages, but I think there would need to be an API for cancelling an alias so that the dict's growth could be limited.
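For illustration, a minimal sketch of what such bookkeeping might look like. This is not code from the library: `AliasTable` and its methods are hypothetical names, and `max_aliases` stands for the Topic Alias Maximum advertised by the broker. For it to be safe, `resolve()` would have to be called on every send attempt, after the connection is confirmed, rather than once before a retry loop.

```python
class AliasTable:
    """Hypothetical helper relating topics to aliases across outages."""

    def __init__(self, max_aliases):
        self._max = max_aliases   # highest alias value the broker accepts
        self._alias = {}          # topic -> alias; persists over outages
        self._live = set()        # aliases established on the current connection

    def on_connect(self):
        # Alias mappings do not survive a reconnect, so forget what was sent.
        self._live.clear()

    def resolve(self, topic):
        """Return (topic_to_send, alias_or_None) for the next send attempt."""
        alias = self._alias.get(topic)
        if alias is None:
            if len(self._alias) >= self._max:
                return topic, None            # table full: send without an alias
            used = set(self._alias.values())
            alias = next(a for a in range(1, self._max + 1) if a not in used)
            self._alias[topic] = alias
        if alias in self._live:
            return "", alias                  # broker already holds this mapping
        self._live.add(alias)
        return topic, alias                   # (re)establish it with the full topic

    def cancel(self, topic):
        # Frees the alias so the dict's growth stays bounded.
        alias = self._alias.pop(topic, None)
        self._live.discard(alias)
```

The `_live` set is the extra state that a reconnect has to reset; the topic-to-alias dict itself can stay intact.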

I agree with your view that few people will use topic aliases. I think the principal use for these is on ultra low bandwidth channels such as LoRa. Channel capacity is not an issue for typical WiFi/internet users. My concern is to minimise complexity and RAM. So let's assume that the application has responsibility for maintaining aliases.

I have pushed an update to the README to warn of this problem. If you achieve an application-level solution I could update the docs to provide an outline approach.

As a general comment, when writing libraries for others to use, it is hard to judge what features will be of most benefit. You get very little feedback unless something goes wrong. My approach is to use the libraries in my own projects - but this only takes you so far. Re V5 I think the killer feature is the provision of expiry intervals which fixes the mess that was clean_session=False. I'm not even sure many users will need properties: if you have control of all clients in a system you can just JSON-encode them into the message. But some people will need to connect to clients which they do not control.
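As a small illustration of the JSON alternative, assuming every client in the system agrees on the same envelope: `pack`, `unpack` and the `"props"`/`"payload"` keys are hypothetical choices, not a convention of the library.

```python
import json


def pack(payload, **props):
    # Carry would-be properties inside the message body itself.
    return json.dumps({"props": props, "payload": payload}).encode()


def unpack(msg):
    # Inverse of pack(): returns (payload, props).
    d = json.loads(msg)
    return d["payload"], d["props"]


msg = pack({"temp": 21.5}, content_type="reading", unit="C")
payload, props = unpack(msg)
```

This works with any MQTT version, but only between clients that know the envelope format.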