xHasKx / luamqtt

luamqtt - Pure-lua MQTT v3.1.1 and v5.0 client
https://xhaskx.github.io/luamqtt/
MIT License
154 stars 41 forks source link

generic topic validation and parsing #36

Open Tieske opened 2 years ago

Tieske commented 2 years ago

see tests for parsing examples.

Tieske commented 2 years ago

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

xHasKx commented 2 years ago

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

init.lua is ok as you've added a whole-module functions.

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

Actually I'm not sure these validation functions is required at all. I mean, for subscribe and publish packets the topic should be just a string, some broker validates it, some not, and some can even provide custom wildcards/filters in that topics. So there is no need for the client-side validation because good broker will do this for us and return appropriate SUBACK/PUBACK code. And for custom brokers the client-side topic validation can do an obstacle for the library user.

IMO, only the last method to check topic and mask match can be frequently used in the client code - in the incoming message handler. There might be added a new client method like this:

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

Tieske commented 2 years ago

Actually I'm not sure these validation functions is required at all.

agreed they are not, and I would not implement them in the client itself. They are just convenience functions for user-code that wants to validate user-input.

Btw; the parser function needs to compile, and the compile function needs to validate, so 3 out of the 4 are needed. So I guess the last one (to validate non-wildcarded topics) is then just for completeness sake.

Tieske commented 2 years ago

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

xHasKx commented 2 years ago

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

This is not related. I think, like in usual message handler, ack should be done explicitly like this: https://github.com/xHasKx/luamqtt/blob/master/examples/simple.lua#L38

Tieske commented 2 years ago

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

xHasKx commented 2 years ago

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

I think user can use subscribe handlers in 3 ways:

For now I think it will be better to leave user always responsible for sending ack. And if he will use last case with handlers set in both ways and send ack twice accidentally - this will not be a MQTT protocol violation (as I remember). Maybe some brokers will close connection in this case but I think it's uncommon.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

I suppose there might be a client option for __init() method enabling auto-ack for all messages, but I think it should be disabled by default.

xHasKx commented 2 years ago

For example, user can use topic-specific handler to execute it's logic and do ack, and at the same time also use generic client:on() message handler for logging without ack.

Tieske commented 2 years ago

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

That makes sense, thx for the explanation

Tieske commented 2 years ago

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

xHasKx commented 2 years ago

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

@Tieske, I posted some comments (change requests) on the commits in the PR, please take a look

xHasKx commented 2 years ago

And I'm still concerned about mqtt.topic_match function. It's arguments are complicated and it's doing two tasks at the same time:

  1. check that the topic matches the subscribe-topic-filter string
  2. parse part of the topic at wildcards places

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

What do you think?

xHasKx commented 2 years ago

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

Tieske commented 2 years ago

Pushed a commit with the review comments (probably easiest to review that commit without whitespace changes).

Tieske commented 2 years ago

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

No, it is not required. It is just convenience to get named arguments/parameters, so user code can be more readable. I have a number of 'id' type fields, which I hate to have in code as array indices, using names is so much more convenient for that. That's the whole purpose I guess (and caching the compiled pattern).

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

compile + string_match would not work, since there would be no structure where the compiled pattern can be stored (cached) for future use. So that would have to be implemented in user code, and then this function would be reduced to only a string.match call, so a function solely for that would not make sense to me.

That said, I do have doubts about the return values, or the entire interface. Currently I do not like the 2 return values, makes it cumbersome to handle. Also the function call with the topic and options, since the options table stores the cached pattern. Maybe the options table should become a class, from which a "topic" can be instantiated, that would hold all the named values. That would make the usercode more natural to retain the "class" instead of an options table.

something like;

local topic_matcher = mqtt.create_topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

This would enable adding a new incoming message callback per topic-pattern, where the msg parameter would already contain the parsed fields from the topic.

xHasKx commented 2 years ago

@Tieske, I wrote some comments and continue review tomorrow

xHasKx commented 2 years ago

@Tieske, what do you think about that form:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

Another variant is to explicitly provide a key to store array of all unnamed topic words like that:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    rest = "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"
Tieske commented 2 years ago

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

if we adjust compile_topic_pattern to return as second and third return values the number of + and # found, that check could actually be very simple.

This option:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

has as a drawback that you can't immediately tell where the varargs start in the array part. Hence I would prefer your second option;

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

But here the user needs an extra field name for the varargs, to prevent namecollisions between the field names and the varargs field.

Leading me to:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "mainlights"
print(fields[2])           --> "brightness"

as the simplest version, clearly separating the known fields from the varargs, whilst making varargs size and traversal easy as the array part. And no extra user specified name required.

xHasKx commented 2 years ago

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name. In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name. In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array. What do you think? We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array. What do you think?

Tieske commented 2 years ago

Where the second + wildcard are without given name. In your case its value from the topic will be lost.

is that expected to be a common case? I’d expect "all fields named" to be the common case. We can just throw an error if there aren’t enough field names?

Or we could inject numbered names field 1 etc

xHasKx commented 2 years ago

I would say, we can't predict which case will be more common. The only thing I'm sure is that the best interface is the one that hard to use in the wrong way unintentionally.

We can do one step back and ask how that topic_matcher can be used? I mean, users start their code from the client instance - creating it, then connecting and subscribing it to some topics. Then somewhere has to be a callback function with arrived PUBLISH message in its argument. And that message's topic can be parsed using the topic_matcher instance. But who should call the topic_matcher instance - the library before calling callback or the user itself inside the callback code?

I mean, user can call some new method client:add_msg_handler(topic_filter, matcher_options, callback) where matcher_options is an args for the topic_matcher instance creation. Then for the each new PUBLISH messages callback will be called in that way:

callback(publish_message, topic_fields, topic_varargs_fields)

In this case for the the topic_matcher instance will be better to return 2 values - +-wildcards with optional names and #-wildcards as two separate tables (by the way, the second table also can contain user-defined names).

In the case of two tables in resulf of the topic_matcher, it's better to always set array-keys into the table and optionally add user-defined names.

(and of course users can create and use the topic_matcher instances separately, in the only one publish messages callback)

What do you think?