Open bobveringa opened 1 year ago
In principle yes. When @kevinkk525 was involved we discussed this but never performed a detailed study of what was involved. Have you assessed the size of the problem in terms of effort and code size?
I do remember noting that V5 addresses the clean_init
problem - if a node is downed for a long period then powered up with clean=False
it can be overwhelmed with data. clean_init
was our ad hoc fix.
A possible technical issue is RAM usage on ESP8266.
For personal reasons the amount of time I have available is quite limited.
As of right now, we are yet to decide on the scope for this project, and are just in the exploratory phase. This means that we don't have an idea of code size, and I think I labeled the effort internally as "a lot".
There are still some key questions to answer, like:
So there is still a bit of work to do to figure all of that out. Any input you can provide on this is much appreciated.
I understand you have limited time available, I wouldn't ask you to start spending a lot of time on this topic. If you could provide some feedback from your lessons learned and other MicroPython knowledge during the development, that should (hopefully) be sufficient.
I would be glad to provide support as you describe.
Point 1. is clearly a major one which cannot be decided until there is a clearer view of the other points.
Re RAM use, mqtt_as
was initially developed for ESP8266 as the only viable WiFi enabled platform. It has insufficient RAM to compile the module, so it must either be precompiled or preferably frozen. RAM is more plentiful on more modern platforms but people still use ESP8266. One option might be to retain the current code as a "legacy" branch for ESP8266 while advancing the master for V5.
Okay, so I started this document out as a small, oh let's compare the 2 quickly and see what we get. And then well... I ended up with this. So, I'll summarize here and then have a very rough and general plan of what to do next.
It seems like it would be possible to support both 3.1.1 and v5 in the same class, with just a flag to indicate whether it is v5 or not. The reason for this is pretty simple, with the spec, they tried their best to keep things similar enough to make everything doable. So many changes are in the "variable header" of each packet type (both client to server and server to client). But all the v5 stuff comes after the 3.1.1 stuff. So this makes it fairly easy to add support to most methods. On the client processing side, we can however no longer rely on messages being a fixed size. And we will need to use the size attribute to read the entire packet, not that big of a deal either, the order of stuff hasn't changed.
The only thing that is really a lot different is the connect method. I think you can try to combine the functionality in a single method, but nobody would enjoy that. So this might have to be a method for 3.1.1 and v5 (or if we use inheritance, we can just override).
One thing is clear I think, a lot of stuff is now put as properties in the variable header. So, we need a method that can easily add these things in later. This is a bit difficult as these properties do change the length of the overall packet. For 3.1.1 this is not a problem as you only need to send a 0 byte.
So, after all that, here is my very rough, very general and basic plan for dealing with this:
As for support of the various features. Because most things are added in the variable header, if we come up with a clever smart way of exposing this to the user, we should be able to support most features, at least for publishing data. On the receiving side, I think the handler method would have to be changed to include some of these additional properties.
Based on how few brokers support the new Enhanced Authentication standard, I don't think we should worry about supporting it at this time. If this library is upgraded to the v5 standard, people who are advanced enough to be using those features can implement that themselves, and hopefully, contribute those changes back.
@peterhinch What do you think of this plan? You can also read my analysis below, maybe you have a different take than I do with how to proceed.
For this comparison, I used the following versions of the spec:
The goal of this analysis is to compare the specifications and see what changes need to be made to add support for MQTTv5.
Most of the MQTT control packet types have remained the same. With 2 notable changes:
DISCONNECT
packet type is now both Client to Server and Server to client, instead of only Client to Server. This means that support for this should be added to the client.AUTH
was introduced, this is both Client to Server and Server to Client. This is to support the newly introduced "Enhanced authentication" part of the MQTT specification. See section 4.12 of the v5 spec for more info.Most of the CONNECT
packet headers stays the same. They did introduce "CONNECT properties" which are encoded in the variable header. This is a breaking change, as you either need to include properties or specify that there are no properties (with a 0 byte).
The properties include:
There are also changes to the CONNECT
payload. Most of these changes boil down to adding the
same features, "publish" has to the will of the device. But this makes the payload different from the v3.1.1 spec in a difficult way, I think.
The MQTTv5 introduced some changes to this packet, they are known in the spec as "PUBLISH properties" and they are part of the variable header for the packet. This seems to be a breaking change, as you must either include publish properties or explicitly specify that there are no properties. The paho.mqtt implementation of this is here. The example is the v5 spec is in figure 3-9 in section 3.3.2.3
The "publish properties" include a lot of the new features of the spec:
An interesting thing to note is the way topic aliases are handled. They are not a separate action that is executed, instead when publishing a message, you can include a topic alias (2 byte integer). In subsequent messages, the topic can be omitted and only the alias needs to be provided. However, this means that to have a guarantee that a topic alias is set, you need to be transmitting at least with QoS =1. Another interesting part is how distributed brokers handle these situations. The broker we use (AWS IoT Core) is highly distributed and the order of messages is "best effort". Given that the consequences of sending an invalid topic alias are the immediate termination of the connection by the broker (section 3.3.4 PUBLISH Actions) best for clients to thread lightly when using this feature. Additionally, topic alias only apply for the duration of the NETWORK connection, not the session, so if you lose the connection the aliases get lost.
But I don't think we need to think about that too much with the implementation. This library can essentially just serve as a wrapper for the protocol, and allow the user to specify the topic alias without any further processing.
When processing PUBLISH
packets, there is no need to support topic aliases, and this can be indicated when connecting to the broker. So there should be only a little additional work on the processing side of this library (only the work in supporting publish properties).
The PUBACK
packet has been expanded to include 2 new features: Reason codes and "PUBACK properties". They are appended to the variable header after the packet ID MSB and LSB see section 3.4.2 PUBACK Variable Header.
We could add support for the reason codes, but I don't really have a clear picture of where this information would go (other than a debug print). And I don't think I can fully test most of these features, as the AWS IoT Core broker is not that forgiving with most of these packets. For example, when you are not authorized to publish, the broker just immediately terminates the connection. But adding a debug print could be useful, and wouldn't be too much work.
There is also a reason string (a full UTF-8 string with a reason) and User Property field(s). But I'd say that for now we can ignore these features and keep things simple.
As for work on the library, now after a PUBACK
only 2 bytes of the variable header are read (PID MSB and LSB). This needs to be modified so that the entire packet is read as it could now be variable due to properties that may, or may not, be included.
Much like the PUBLISH
packet, the SUBSCRIBE
packet was also expanded with additional properties, known in the spec as "SUBSCRIBE Properties" see section 3.8.2.1 SUBSCRIBE properties. These properties are, just like the "publish properties", appended at the end of the packet. This means that when subscribing as a v5 client, an additional byte needs to be sent that says that there are no properties.
The "subscribe properties" are:
Subscription Identifiers seem like they have a purpose, I think it when you pass this property, the publish packet the client receives might include this subscription identifier. You can then check the subscription identifier rather than trying to match the strings.
The v5 spec also introduces "Subscription Options" which are additional settings in what used to be the "Requested QoS" byte, see 3.8.3.1 Subscription Options.
The "subscription options" now contain the following settings:
Option 1 remains unchanged from previous versions. The retain as published might be useful, but given that it is just a single bit at subscribe time, doesn't hurt to implement it. Retain handling seems the most useful here. It allows the client to say what it wants to do with retained messages (0=normal, 1=send if not subscribed already, 2=don't send)
However, all the changes to the subscribe payload body are non-breaking. So leaving all these options at 0 is the same as current behavior.
With the SUBACK
packet is much the same story as PUBACK
They added new SUBACK
properties. But these can all be ignored. They re-used the reason code they were already using to indicate what QoS the broker subscribed with.
There are changes in the variable header, so this needs to be handled in the same way as the PUBACK
just making sure we read the entire packet, so things don't break.
The UNSUBSCRIBE
packet mostly stays unchanged. However, they did add support for "UNSUBSCRIBE properties", these similar to SUBSCRIBE
and PUBLISH
, I think there is something wrong with the spec as it doesn't include a figure that shows the variable header, and the figures jump from 3.28 to 3.30. But given the description present for the v5 spec, you just need to send an additional byte indicating that there are no properties.
The UNSUBACK
packet follows much the same story as the rest of the packets. They added User Properties, and reason codes. The packet does have payload now. But like other things, the PID is still in the first 2 bytes. The rest of the packet just needs to be read, but doesn't really need to be processed for an initial version.
No changes to these packets (Yay!)
The format of the DISCONNECT
packet has not been changed and is fully compatible. There are some extensions in v5. For example, it is now possible for both the client and server to specify a reason for disconnecting. The "normal disconnection" is reason 0x00, which is the same value that the 3.1.1 requires. The full list of reason codes is listed in section 3.14.2.1 of the spec.
The server can now also send this packet, so this does need to be implemented. But it is a simple packet, so that is doable.
The AUTH packet is a newly introduced packet in the v5 spec, it is part of the "Enhanced Authentication" introduced in v5, this article by HiveMQ is pretty good.
This packet does not need to be implemented unless we want to add support for enhanced authentication.
That is a very detailed review - it looks like a big job. A few broad-brush comments.
qos==2
?For the re-write (if this is something for a later date) I can start collecting data from the fleet of devices, we have deployed. Some are operating in truly horrific Wi-Fi conditions, which may produce interesting results. I intended to add this data collection anyway to our fleet just to help with support at the customer locations. So, if there is an interest in specific things that produce insights, I can see if it is possible to add those.
If there is interest in a re-write and I somehow find the time to properly look into that, I would like to help with the design and implementation. There are limitations we have run into with the current implementation, but they are minor enough that they don't warrant the engineering time spent on improving them. However, if it ever comes to a redesign, then we can allocate the required resources to investigate what these limitations are exactly, and maybe some possible solutions.
As for how to continue the MQTTv5 implementation. I think I have enough information on how to proceed at this time. I have some meetings next week that should (hopefully) provide me some insight into when there is enough space on the engineering calendar to start on the design and implementation.
Taking your points in order:
qos==2
.qos==2
ever being requested. I did consider implementing it, testing using a local broker (mosquitto), but I'm entirely happy with leaving it unsupported.I'd be interested to hear details of your application (if you're in a position to divulge).
Alright, good to know your position on these things.
I'm happy to talk about the details of our application. Just not publicly. Do you have somewhere to reach you privately, be it a teams/voice call or just e-mail?
I understand, if you don't want to give out your information publicly (I would also rather not), if you send an e-mail to info@smartfloor.com. Your contact details should get to me (just mention my name in the e-mail).
FYI you might want to see https://github.com/peterhinch/micropython-mqtt/issues/132
Just an update on this. I've been very busy with other, more pressing tasks, but it is likely I will start to work on this over the next few months.
Thanks for the update.
I got bored over the Easter weekend and decided to give the implementation a go. Turns out getting most of it to work was not even that difficult. However, actually doing something with the new properties that are returned is difficult. Which is why, for now, all the properties that come from the broker, are ignored.
My current assessment, is that it would be very difficult to do something with the properties returned while adhering to the other goals that this library has. But I'll also discuss this internally to see what the options are, maybe I am just overlooking something.
While what I have now works, it does not look pretty. I'll spend the next couple of days cleaning up the code a bit and finalizing some of the features. You can expect an initial PR somewhere this week (or next week at the latest).
One issue I mentioned earlier is the clean_init
arg discussed here in the docs. I provided this because I believed it to be necessary in a microcontroller context, although the way it is implemented is hacky. When Kevin and I discussed V5 we were relieved to see that the new version provided an official means of achieving this behaviour.
I would like to be able to remove the hack, ideally without breaking code that sets clean_init
. I would be grateful if you could give this some thought and suggest a good way forward (whenever you get some time, of course).
Ah, I totally forgot about it.
Ok, I think something can be done, but only for MQTTv5, as just by reading the spec, there is no way to get the same behavior without this hack on MQTTv3.1.1.
Looking into this, I did find an answer to another question I had. The question I had was why rename Clean Session
to Clean Start
.
In the MQTTv3.1.1 spec, it says:
After the disconnection of a Session that had CleanSession set to 0, the Server MUST store further QoS 1 and QoS 2...
But this line is omitted in the MQTTv5 spec. And the additional clarification makes it clear that the disconnect hack is no longer needed in MQTTv5.
The code could look something like this:
is_clean = self._clean
if not self._has_connected and self._clean_init and not self._clean:
is_clean = True
await self._connect(is_clean)
But it becomes considerably uglier to have both MQTTv5 support and MQTTv3.1.1 support in the same function, as you would need to do something like this.
is_clean = self._clean
if not self._has_connected and self._clean_init and not self._clean:
if self.mqttv5:
is_clean = True
else:
await self._connect(True) # Connect with clean session
try:
async with self.lock:
self._sock.write(b"\xe0\0") # Force disconnect but keep socket open
except OSError:
pass
self.dprint("Waiting for disconnect")
await asyncio.sleep(2) # Wait for broker to disconnect
self.dprint("About to reconnect with unclean session.")
await self._connect(is_clean)
Also, a bit of an update regarding the status of this project internally. We have an ongoing internal discussion about doing a complete ground up rebuild for MQTTv5 support. We are still unsure about it is because we like working with open-source software that is actually used, it is just way less risk as more people are using it and more people are finding bugs. And we feel like splitting the community with our own thing is also not beneficial. For now, we will continue our development of MQTTv5 here as a rebuild and redesign will take a lot more time, so we will try our best to get as many features as reasonable added here.
For the benefit of your internal discussions it's worth pointing out that achieving resilience took a lot of time. It involved tests including outing brokers and AP's, putting a running client into a Faraday cage, slowly moving a running client out of wireless range and then back in. Deliberately creating radio noise. And repeating on a variety of host hardware.
As an aside my original industrial experience was in radio hardware development. Some software engineers find it hard to grasp that WiFi is subject to the laws of physics rather than conforming to the magic of TCP/IP...
At some point I'd appreciate an overview of your results. How much extra code and RAM use is involved? Do you think it is necessary to maintain support for V3.1.1? If the following conditions can be met, maybe V3.1.1 can be abandoned:
For the benefit of your internal discussions it's worth pointing out that achieving resilience took a lot of time.
This is also a big part of the reason why we don't want to move away from this library. We consider the Wi-Fi to be some sort of magic transportation layer that is not bound by any rules (be they programming or physics), and that looking at it the wrong way can change the behavior.
At some point I'd appreciate an overview of your results. How much extra code and RAM use is involved?
For now, it is pretty minimal (apart from actually using properties as you need to allocate memory for it).
Do you think it is necessary to maintain support for V3.1.1?
From what I can tell from the various brokers, most brokers seem to support at least basic features of MQTTv5. If support for v3.1.1 were to be dropped, it should be clear and to users that when they try to connect with the v5 client to the v3.1.1 broker, why this is not working and what to do to fix it (either using a specific branch or something else).
According to the spec, this is the behavior on v3.1.1 brokers.
The Server MUST respond to the CONNECT Packet with a CONNACK return code 0x01 (unacceptable protocol level) and then disconnect the Client if the Protocol Level is not supported by the Server
I have opened a PR for MQTTv5 support #139 It has been a lot more busy than I anticipated, so I have not had the time for a full cleanup. But the general concepts are in place.
I am not 100% happy with what I have right now, but after discussing this internally, this seemed like the least bad way of doing it. But I am open to any feedback.
@peterhinch Just writing this as a quick update to this topic.
We have continued working on adding MQTTv5 support to this library. We haven't published our latest work yet, as we are still working on finalizing everything. From my initial draft a couple of months ago, a lot has changed and most MQTTv5 features are supported.
There are still some open items which we are hoping to resolve within the next month. The most important one is that with MQTTv5, the broker can now disconnect clients by sending a disconnect packet. This still causes some unexpected behavior when reconnecting.
Overall, we are making good progress. I'll update here once we feel confident in the performance and stability of MQTTv5 support.
@peterhinch I've just pushed the latest update of our work on MQTTv5, it is available in the PR (#139). I would be interested in hearing your thoughts on the current implementation.
If you want to see a demo, I would be happy to set that up. This may be useful to show our implementation of topic aliases. We can arrange the further details of that through e-mail, if that is something you are interested in.
I've looked at the code and it seems good.
In order to review it properly I need to refresh my memory about V5 - we last looked at it nearly five years ago. I'll do that over the next couple of days. Meanwhile it would help if you could clarify any aspects of the protocol which are unsupported.
Assuming I don't throw a spanner in the works do you see this work as being nearly complete?
Re a demo that sounds a good idea, perhaps next week when I've got re-acquainted with V5. There are likely to e topics we could usefully discuss.
The new "Enhanced authentication" part of the v5 spec is not supported. The "AUTH" packet is not implemented. It is possible to set the Authentication Method and Authentication Data properties when connecting because, in general, properties are supported. And it doesn't make any sense to explicitly not allow them.
The "Will properties" are not supported. Implementing it would not be that difficult, but it would blow up the scope for testing.
For the subscribe packet, as part of "subscription options" we have not added support for the new options, and still only support setting the QoS when subscribing. This means we do not support: Retain Handling, RAP (Retain as Published) and NL (No Local). Adding support for this, would not be difficult (it is just some flags) but we opted out of implementing them as it would require a lot of additional testing.
The MQTTv5 spec allows you to set some properties more than once, like the "User Property". We opted to use a dictionary for handling the properties, which means that it is not possible to set a property more than once. This also applies when decoding properties, if the same property is included multiple times, we only take the last one.
Not everything that is returned in the CONNACK
packet is exposed to the end user. This can be easily modified if it is required for new features.
I think this should cover all the unsupported features, but it is possible I might have missed something.
The last one is a bit on the edge of unsupported/not complete, this is something we can discuss. We currently don't do anything yet with the properties on received messages. There is still a TODO on line 807. The reason we have not yet implemented it yet is because we don't want to add a new argument to the callback or event, as that would be a breaking change.
There are several options available:
I think it options 1 is out of the question. Not supporting it, is not really an option because some MQTTv5 features depend on it, like "subscription identifiers". So that basically leaves options 2 and 4. This does need some additional testing after deciding, but not a significant amount. We already handle all the properties, it is just that they are only printed at the moment.
We are leaning towards option 2, as MQTTv5 is already a bit of a breaking change, so it would not be unreasonable to ask that if people change the major version of a protocol they are using, that they also change a handler. But we have no strong opinions on either 2 or 4.
We decided to go with a fairly unopinionated implementation for some MQTTv5 features. Take for example topic aliases. This must be done by adding a topic alias as a property when publishing. In subsequent messages, you can then only include the topic alias property and omit the topic itself. In our current implementation, it is up to the application how to implement this. The library itself allows you to send the property, but you can't simply check something saying "use topic aliases" and have the library manage everything.
This is because, after doing some research, we found that there are just too many different use cases and edge cases that everyone would probably like to handle differently. The biggest problem with topic alias, for example, is that if you send a topic alias that is not valid, meaning that if the server has not received a topic to go with the alias before, the connection will be terminated. In our case, we are happy sending some messages with QoS 1, to ensure that messages that include both the topic and the alias arrive. But there are likely many people who want to make a different compromise.
It would say the work is nearly completed, there are some things to still review and receive feedback on, but I am not expecting major changes. We are still running validation checks internally, but we have devices that have been running some version of MQTTv5 for 2–3 months now, and everything has been very stable (apart from protocols errors in our application layer, which was why we found #147)
Thanks for that detailed response.
My initial response to "uncompleted features" above is as follows. Looking at the code, the user is able to choose the protocol. I assume that, if V5 is not selected, compatibility with the current version is 100%. Consequently, a user wishing to update the client but run existing code has a way to do so.
Consequently, if an application opts for V5, I think it is reasonable to introduce breaking changes.
If V5 introduces breaking changes, one question is whether, in that version, to remove support for clean_init
. It's rather a hack to work round what I see as a weakness in the V3 protocol. That weakness has gone in V5 with session_expiry_interval
so I'd happily see it removed.
I'll read up on MQTT5 with emphasis on the supported features, review the code, and report back in a few days.
Compatibility with the current version should be 100%, we are just starting re-validation of this, as we tested it a while ago but made some changes since then. I am not expecting that anything broke since we last tested it, but you never know.
I'll start the implementation of passing the properties to the handlers only for MQTTv5.
I've updated my local broker and done some work with mosquitto_pub
and mosquittp_sub
to re-acquaint myself with properties. Is https://github.com/bobveringa/micropython-mqtt/commit/63f20e716e7cd1b519552bc825965af801f15b26 essentially complete? If so I'll fetch the code and do some testing.
Yes, I'd say that version is essentially complete. I'm not expecting to make any additional changes (unless we find something during this review process).
I've grabbed the code. I was under the impression that existing applications would run unchanged, so my first test was to run the range_ex
demo on an ESP8266, also on the ESP32 reference board (one aim being to measure the extra RAM usage on ESP8266).
In both cases it failed to connect to the broker (below is ESP32 output):
Connected to MicroPython at /dev/ttyUSB0
Use Ctrl-] or Ctrl-x to exit this shell
>>> import range_ex
Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connection failed.
>>>
By contrast with the old version of mqtt_as.py
on the same ESP32 board with no other changes:
>>> import range_ex
Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connected to broker.
We are connected to broker.
publish 0
Please could you indicate what changes are required to enable existing applications to run.
This not completely expected. We have run a number of tests with MQTTv5 but with all our old configurations, and everything works.
Could you give some more information about the setup you are running, I'll investigate what is going on.
I did a code analysis in addition to some other testing. When a last will was set, it was also adding will properties, this messed up the entire encoding of the packet (for 3.1.1) thus resulting in being unable to connect.
This somehow got by everyone and managed to stay in there for several code reviews.
We don't use 3.1.1 in combination with last will, so it was never discovered during testing.
https://github.com/peterhinch/micropython-mqtt/pull/139/commits/7d7743134254bc757474ae9f48a26519993929d3 adds a fix for this issue. I've analyzed the rest of the code, and it looks like this was the only instance of this happening.
Thank you, that fixes the problem.
On an ESP8266 free RAM is as follows (using precompiled mqtt_as.py, running range_ex.py). This was with no code changes, simply replacing mqtt_as.mpy
.
Current version: 14752 bytes.
New version: 10416 bytes. Delta = 4336 bytes.
I then modified range_ex.py
to specify mqttv5
config["mqttv5"] = True
and adapted line 54 to read
async for topic, msg, retained, properties in client.queue:
The demo ran OK until I hit the broker with
$ mosquitto_pub -h 192.168.0.10 -t foo_topic -q 1 -m "hello hello" -D PUBLISH user-property foo bar -V 5
The following occurred:
[correct tracebacks removed]
publish 18
publish 19
publish 20
Task exception wasn't retrieved
future: <Task> coro= <generator object '_handle_msg' at 3ffd49d0>
Traceback (most recent call last):
File "asyncio/core.py", line 1, in run_until_complete
File "mqtt_as.py", line 967, in _handle_msg
File "mqtt_as.py", line 810, in wait_msg
File "mqtt_as.py", line 266, in decode_properties
ValueError: Unknown property identifier: 0
publish 21
Note that a mosquitto_sub session responded correctly to the publication:
$ mosquitto_sub -h 192.168.0.10 -t foo_topic -V 5 -F "Properties %P payload %p"
Properties foo:bar payload hello hello
I have no doubt I'm missing something here - I'd be grateful for some guidance.
You seem to be pressing just where it hurts (which is a good thing) and seem to have uncovered some gaps in our testing.
It seems like the decode properties table used the wrong decode method for user properties, it was marked as a string, but it is actually a string pair. The latest commit fixes this.
Some refactoring seems to have caused the wrong method to be used. My apologies, this should not have made it in the released version.
I've validated that all the other properties are using the correct decoding methods.
To throw in a curveball I've been thinking about RAM usage. The overhead when running MQTT V3.1.1 applications is rather high, notably for Pico W and especially for ESP8266. Users wanting V5 are probably best advised to avoid minimal-RAM platforms where the overhead is entirely OK in my view. But there are many ESP8266 users, and users who have no need for V5.
I can see two options, but there may be others.
format_properties
to the start of the base class, in a second file. The base class constructor only imports the file if mqttv5
is set. A wildcard import ensures minimal (no?) code changes.Unless you have another alternative I think 2. is worth an experiment. I'm willing to give this a try, measuring the outcome, if you think it's a reasonable approach.
Option 2 seems like a reasonable approach. I agree that people working on memory constraint platforms should not be impacted by a change this big. Based on the changes, it seems reasonable to assume that the bulk of this additional memory usage is coming from property decoding.
Just to confirm, you mentioned the following:
Current version: 14752 bytes. New version: 10416 bytes. Delta = 4336 bytes.
I'm going to assume that these should be flipped around, and that the current (released) version is the one using the least memory. Even though it would be nice if this new release somehow used less memory 😄.
The latest update has fixed received messages with user properties. I'm a bit stumped by the syntax required for the properties
arg of the publish command. By symmetry with the received properties I'm using
await client.publish(
TOPIC,
s.format(n, client.REPUB_COUNT, outages, rssi, m, client.queue.discards),
qos=1,
properties={38: {b"abc": b"def"}},
)
but am getting
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "range_ex.py", line 121, in <module>
File "asyncio/core.py", line 1, in run
File "asyncio/core.py", line 1, in run_until_complete
File "asyncio/core.py", line 1, in run_until_complete
File "range_ex.py", line 101, in main
File "mqtt_as.py", line 1089, in publish
File "mqtt_as.py", line 613, in publish
File "mqtt_as.py", line 638, in _publish
File "mqtt_as.py", line 171, in format_properties
NotImplementedError: array/bytes required on right side
Please advise as to the correct syntax.
There is no encoding method for properties. When sending properties, it must be formatted according to the spec. In the case of user properties, this means the following.
properties = {
38: b'\x00\x03abc\x00\x03def',
}
If the property is a string, then it must be formatted according to the spec (16 bit int for length, followed by the string)
I've deleted my previous comment, as after some more research it did not describe everything correctly.
As a correction, if a property is a UTF-8 String, then all that is needed to send it as send it as a "normal" python string. Our expectation at the time was that, this should cover the annoying bits of sending properties. As other properties are mostly fairly trivial.
Looking at the driver in a more complete state now, you could make an argument that users should not need to have knowledge of how the spec requires certain properties to be formatted, or at least not to the degree required now.
However, implementing property encoding is a bit more difficult, as a lot has to be known about them. We really want to prevent heap fragmentation and memory reallocation if at all possible. As we have noticed that this has significant performance and runtime impacts on our application. This is also the reason why format_properties
looks that way that it does.
It is possible to implement a more general encode function that would require less knowledge from the user about how to format the properties. However, it would be a lot more complex to implement.
So, this then turns into 2 questions:
If I had to answer those questions myself.
I would like to hear your thoughts on the matter.
Just to confirm, you mentioned the following: Current version: 14752 bytes. New version: 10416 bytes. Delta = 4336 bytes.
The perils of written communication :) I should have clarified those are measures of free RAM while running range_ex
. The ESP8266 requires precompilation even with the release version. Frozen bytecode gives a substantial saving, but many users are not equipped to do this. Hence my concern.
I'll think about property encoding.
[EDIT] I have hacked a version with the property code in a separate file. Performing the same test on an ESP8266 the free RAM is 13824 bytes. This is with the two files pre-compiled. The overhead for running old code on the new version is now only 928 bytes which is OK in my opinion.
Good to know. I'll make that change in the PR.
While working in the garden, I also thought of a way to better encode properties so that it is more user-friendly. Based on some mental programming, I think it doesn't cause any more memory re-allocation or fragmentation than the current solution. But programs do tend to run better in my head than on my hardware, so we'll see how that goes.
I'll also make those changes, test them to see if I didn't break anything, and I'll update the syntax here when I've pushed the new version.
I had to fight a bit with including the separate file, so I made it a relative import, I think in that case it should work for everyone, if they have it in a separate folder or not. If that is not the case, I have to think of something better to include it.
As for how properties are now encoded. You still need to know the spec, but it is not as relevant as it was before. Now integers can just be passed as regular integers, regardless if they are encoded as variable length int, 2 byte or 4 byte int.
Single byte properties still need to be passed as such.
Binary data, has to be a bytearray. But the 16-bit length at the front is automatically added when encoding. UTF-8 Strings can be passed as regular strings. UTF-8 String pairs are a dictionary of strings.
properties = {
0x26: {'value': 'test'},
0x09: b'correlation_data',
0x08: 'response_topic',
0x02: 60,
}
I made it a relative import
That sounds fine, but see my comment to the PR.
The new way to specify properties is much more intuitive.
A question remains, what to do about incoming properties. From an API point of view it makes sense for decoded properties to be returned, e.g.
properties = await client.subscribe("foo_topic", 1)
that way the caller can process them or ignore them. In the case of connect
this is easy. The following is a broad-brush, untested, idea of how we might achieve this.
In the case of publish
and subscribe
there can be repeated attempts - I guess the properties of the final, successful, attempt should be returned. This could be done with a bound dict .pend_props
whose key is the PID and the value the decoded properties. This holds the properties in the interval between the incoming message being received and the calling task being re-scheduled.
The .wait_msg
would assign to the dict and ._await_pid
would retrieve the properties. Under normal conditions the dict would hold only one entry, being removed when the calling routine got a successful result from ._await_pid
or on failure. Multiple entries could briefly occur with overlapping concurrent operations.
async def _await_pid(self, pid):
t = ticks_ms()
while pid in self.rcv_pids: # local copy
if self._timeout(t) or not self.isconnected():
self.rcv_pids.discard(pid) # This should have been in my code
del self.pend_props[pid] # Discard old properties
break # Must repub or bail out
await asyncio.sleep_ms(100)
else:
return self.pend_props.pop(pid, True) # PID received. All done.
return False
with .wait_msg
having code like
async def wait_msg(self):
# ...
if puback_props_sz > 0:
puback_props = await self._as_read(puback_props_sz)
#decoded_props = decode_properties(puback_props, puback_props_sz)
#self.dprint("PUBACK properties %s", decoded_props)
self.pend_props[pid] = decode_properties(puback_props, puback_props_sz)
Re DISCONNECT
In this case there is no PID since DISCONNECT can originate from the app or from the broker. I think the best approach is a bound variable disconnect_properties
set to None
in the constructor and also on connect. A disconnection caused by the broker will be trapped by the existing mechanism; the application's down
task could read this bound variable.
Re UNSUBSCRIBE
Currently we ignore the UNSUBACK
packet (0xB0). According to the spec it can carry the user property (0x26) but I've no idea whether brokers make a habit of this and what its contents would mean. There is a reason code. I don't know whether we need to deal with this, but given that there is a PID the property could be handled as per publish
and subscribe
above.
I'd be keen to hear your views.
I don't complete understand what you mean by supporting decoded properties for "PUBLISH". Incoming properties for "PUBLISH" are already supported as discussed here https://github.com/peterhinch/micropython-mqtt/issues/127#issuecomment-2247864055
Properties on 2 other operations "CONNACK" and "DISCONNECT" or sort of supported. With connack the relevant properties are taken out and made available to users. And with the disconnect the reason string is printed for easier debugging.
As for my thoughts on supporting properties on these other operations. I am not convinced that is something we have to do. The properties that these operations have are not really relevant in my opinion. It seems to mostly consist of 2 properties. Reason String and User Property. In my opinion, the benefit that you can get from returning these properties does not outweigh the complexity that would be introduced with a change like this.
While working on MQTTv5 support together with my colleague, with had a lot of internal discussions about what to do with properties. Looking around at other libraries and code that is publicly available, people do not seem to care about properties on these other operations. Bigger MQTT libraries like paho.mqtt, also don't seem to expose properties on these other operations.
In my opinion, supporting properties on incoming PUBLISH packets is what most if not all people are actually interested in. Supporting and exposing some properties on CONNACK is needed for the library to function correctly. The rest is all a nice to have.
I appreciate that properties on incoming messages are supported.
I was querying those cases which currently produce only debug prints (PUBACK
, UNSUBACK
,DISCONNECT
, CONNACK
). However I take your point that there are good reasons for not attempting full support, and on further thought there are edge cases that make doing it nontrivial.
A summary from our call yesterday (again because I posted the other one with my wrong account)
The remaining work consists of adding documentation for the MQTTv5 implementation. This should include some examples and a list of the features that are not supported. I will add this documentation and submit it with the existing PR.
We decided against returning properties for operations like PUBACK, SUBACK, DISCONNECT, etc. because it is unclear if there is any desire for this. We determined that if there is a need to implement this later, it can be done without any breaking changes to the API.
@peterhinch I see #139 was just merged. That is a bit of a problem. My colleague just tried out the latest version and got the following error.
File "lib/mqtt_as/mqtt_as.py", line 795, in connect
File "lib/mqtt_as/mqtt_as.py", line 317, in _connect
NameError: name 'encode_properties' isn't defined
I have no idea how because both you and I tested this method of importing properties from an external file. But yes, it is now happening consistently.
I have a "solution". I could open another PR for this?
encode_properties = None
decode_properties = None
class MQTT_base:
def __init__(self, config):
...
if self.mqttv5:
global encode_properties, decode_properties
from .mqtt_v5_properties import encode_properties as ep
from .mqtt_v5_properties import decode_properties as dp
encode_properties = ep
decode_properties = dp
EDIT: PR is open #148
In our meeting you asked me where I saw this going. I've now had a chance to give this some thought.
clean_session
and to explain expiry times and user properties.mosquitto_pub
and mosquitto_sub
to test the script.mqtt_as.py
as these now serve no purpose. Aside from some minor refactoring I don't plan any other code changes.I intend to experiment with returning properties from functions as discussed. While I take your point about e.g. PUBUCK
properties being little used, I do think it's worth at least trying that approach. It strikes me as a logical API. It might even save bytes in the crucial case of V3.1.1 use on ESP8266 because we would be removing debug print statements.
Even if I judge the test a success I won't merge the branch until you've had a chance to review it.
Comments on this plan are welcome.
These all seem like good points.
I'd be happy to take a look at any tests you intend to do.
We've been using this library for a while and have been really happy with the performance and usage. However, we recently took a look at how we could improve some of our cloud communication and discovered a need for MQTTv5 features like request-response, topic aliases and some of the expiry features.
From our perspective, it seems like we will need to develop these features anyway, but we would like to contribute these features back to the community. Given that this is (in our opinion) the go-to library for MicroPython MQTT support, It makes sense to work together to add support here.
Are you interested in adding support for MQTTv5? Our implementation could probably help from your experience with the current library, which would be beneficial for everyone using it.