redboltz / mqtt_cpp

Boost Software License 1.0

Broker authorization proposal #779

Closed kleunen closed 2 years ago

kleunen commented 3 years ago

I had an idea how to add topic authorization to the broker. I would like to propose this idea. To start with, you need a database of accounts with possible topic filters as follows:

USER1: Password1 topic: example/+/test, rights: publish

USER2: Password2 topic: example/+/test, rights: subscribe

USER3: Password3 topic: example/+/test, rights: publish + subscribe

So accounts get a list of users with passwords, and topic filters with rights if they are allowed to publish/subscribe to a topic.

Now, in the broker, when a connect enters: https://github.com/redboltz/mqtt_cpp/blob/master/include/mqtt/broker/broker.hpp#L405-L450

Rather than calling 'connect_handler' directly: https://github.com/redboltz/mqtt_cpp/blob/master/include/mqtt/broker/broker.hpp#L439-L448

You pass the connect request to some authorization class:

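class AuthorizerInterface {
public:
    virtual ~AuthorizerInterface() = default;

    // Checks the credentials, then forwards the request to the broker's connect_handler.
    virtual void authorize_connect(
        con_sp_t spep,
        buffer client_id,
        optional<buffer> /*username*/,
        optional<buffer> /*password*/,
        optional<will> will,
        bool clean_start,
        std::uint16_t /*keep_alive*/,
        v5::properties props
    ) = 0;
};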

This will look up the username/password in the database (possibly a JSON file or some external authenticator) and finally forward the request, together with the rights, to the connect handler within the broker:

  bool connect_handler(
        con_sp_t spep,
        buffer client_id,
        optional<buffer> /*username*/,
        optional<buffer> /*password*/,
        optional<will> will,
        bool clean_start,
        std::uint16_t /*keep_alive*/,
        v5::properties props,
        std::vector< MQTT_NS::buffer > user_publish_filters,
        std::vector< MQTT_NS::buffer > user_subscribe_filters
    );

The user rights are stored in a subscription map, so that we know which rights apply for each session. The first set lists the sessions that are allowed to publish, the second the sessions that are allowed to subscribe:

using sub_rights_map = multiple_subscription_map<buffer, std::pair<std::set<session_state_ref>, std::set<session_state_ref>>>;

Now, when a message is published in

bool publish_handler(
        con_sp_t spep,
        optional<packet_id_t> packet_id,
        publish_options pubopts,
        buffer topic_name,
        buffer contents,
        v5::properties props) {

Look up the topic_name in sub_rights_map. The publisher should have the right to publish, and the sessions that receive the message should have the right to subscribe.

The publisher must appear in at least one filter that is allowed to publish to the topic.

You can find the set of subscribers by looking up the complete set of sessions that are allowed to subscribe to this topic (a std::set<session_state_ref>) and then calculating its intersection with the sessions that are actually subscribed to the topic.
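The lookup described here can be sketched as follows. This is only a minimal illustration: sessions are identified by plain strings as a stand-in for session_state_ref, and the names session_id, delivery_targets, and may_publish are hypothetical, not part of mqtt_cpp.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical stand-in for the broker's session_state_ref.
using session_id = std::string;

// Sessions that may receive a publish: those that are both allowed to
// subscribe to the topic and actually subscribed to it.
std::set<session_id> delivery_targets(
    const std::set<session_id>& allowed_to_subscribe,
    const std::set<session_id>& subscribed) {
    std::set<session_id> result;
    std::set_intersection(
        allowed_to_subscribe.begin(), allowed_to_subscribe.end(),
        subscribed.begin(), subscribed.end(),
        std::inserter(result, result.begin()));
    return result;
}

// The publisher must appear in the set of sessions allowed to publish.
bool may_publish(const std::set<session_id>& allowed_to_publish,
                 const session_id& publisher) {
    return allowed_to_publish.count(publisher) > 0;
}
```

Both inputs are ordered sets, so std::set_intersection can walk them in a single linear pass.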

kleunen commented 3 years ago

Yes, I think that is quite clear. The ACL rules are clear.

redboltz commented 3 years ago

I have another question about # in the authorization rule. It now has a different meaning from the MQTT wildcard. I guess that the wildcard + is not allowed in the authorization rule. I think that it is not so meaningful. Is that right?

If it is, # could be misleading. I think that * is better. It is just a syntax issue. What do you think ?

kleunen commented 3 years ago

No, it is possible to check a subscription with a wildcard against an auth rule with a wildcard. The subscription has to be equally specific or more specific than the auth rule. I will explain later, because this needs some examples.

kleunen commented 3 years ago

You can check given an auth rule + subscription, if the subscription matches the authentication rules. For each token in the auth + subscription, you can check:

| auth | subscription | authenticated |
|---|---|---|
| literal | literal | yes |
| literal | + or # | no |
| + | literal or + | yes |
| + | # | no |
| # | literal or + or # | yes |

So for example:

auth: example/a, sub: example/a, authenticated: yes
auth: example/a, sub: example/b, authenticated: no

auth: example/+/a, sub: example/a/a, authenticated: yes
auth: example/+/a, sub: example/+/a, authenticated: yes
auth: example/+/a, sub: example/#, authenticated: no

auth: example/#, sub: example/a, authenticated: yes
auth: example/#, sub: example/+, authenticated: yes
auth: example/#, sub: example/#, authenticated: yes
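The per-level check can be sketched like this. It is a minimal sketch, not broker code: split_levels and subscription_authorized are hypothetical names, and one case not covered by the table is assumed here, namely that a trailing # in the auth rule also covers the parent level (as in ordinary MQTT topic matching).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a topic or topic filter into its '/'-separated levels.
std::vector<std::string> split_levels(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = s.find('/', start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));
            return out;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// True if the subscription filter `sub` is covered by the auth rule `auth`,
// checked level by level.
bool subscription_authorized(const std::string& auth, const std::string& sub) {
    const auto a = split_levels(auth);
    const auto s = split_levels(sub);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == "#") return true;     // '#' covers literal, '+', and '#'
        if (i >= s.size()) return false;  // subscription has fewer levels
        if (s[i] == "#") return false;    // '#' is broader than literal or '+'
        if (a[i] == "+") continue;        // '+' covers a literal or '+'
        if (a[i] != s[i]) return false;   // a literal needs the same literal
    }
    return a.size() == s.size();          // no extra subscription levels
}
```

For instance, subscription_authorized("example/+/a", "example/#") is false, because the subscription is broader than the rule.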

redboltz commented 3 years ago

Thank you for explaining. I understand. It seems to be good rule. Let's continue the discussion based on this rule.

I wrote the authorization model https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-846553347 I noticed that it is Username based. Authorization is a relationship between topics (including Topic Filter) and Username.

I think that users want to make a rule that "any users can publish to the topic a/b", but Username frequent_publisher should be denied. What is a good way to explain it ?

kleunen commented 3 years ago

default allow all
username frequent_publisher 
  deny write a/b
redboltz commented 3 years ago

I think that the following situation is practical but difficult to describe with a rule.

| topic | user | subscribe |
|---|---|---|
| companyA.com/release/# | non_list_users | deny |
| companyA.com/release/# | u1 | allow |
| companyA.com/release/# | u2 | allow |
| companyA.com/release/# | u3 | allow |
| companyA.com/trial/# | non_list_users | allow |
| companyA.com/trial/# | u7 | deny |
| companyA.com/trial/# | u8 | deny |

CompanyA provides a trial topic and a release topic. The trial topic is for trials; the release topic is for production. The trial topic is widely allowed for easy trial, but some badly behaving users (e.g. publishing too much) are denied (negative listed). The production topic requires individual permission for security reasons.

I think that our rule should be able to express this, but I think that it is currently not possible. (There are many users.)

Another similar case. CompanyA has different default policy from CompanyB.

| topic | user | subscribe |
|---|---|---|
| companyA.com/# | non_list_users | deny |
| companyA.com/# | u1 | allow |
| companyA.com/# | u2 | allow |
| companyA.com/# | u3 | allow |
| companyB.com/# | non_list_users | allow |
| companyB.com/# | u7 | deny |
| companyB.com/# | u8 | deny |

kleunen commented 3 years ago

Maybe have groups of users and configure a default for a group of users?

CompanyA would have a different group of users and settings than CompanyB.

redboltz commented 3 years ago

I got some ideas.

  1. Topic based authorization is a more natural concept than Username based authentication, because an MQTT topic implies a tree structure but a Username doesn't. So topics are good for expressing an organization structure.
  2. As I mentioned in the companyA and companyB example, organizations can have different default policies. And an organization is usually structured, so companyA/division1 and companyA/division2 could have different policies. I think that recursive rules should be supported.
  3. Authentication and Authorization are basically separate issues. The one relation is that Authorization applies only to authenticated users.

Rule structuring idea:

Result example:

| topic \ user | u1 | u2 | u3 | u4 |
|---|---|---|---|---|
| companyA/division1/t1 | d | d | a | a |
| companyA/division1/t2 | a | d | d | a |
| companyA/division1/t3 | a | a | a | a |
| companyA/division2/t1 | a | a | d | d |
| companyA/division2/t2 | d | a | a | d |
| companyA/division2/t3 | d | d | d | d |
| companyA/division3/t1 | d | d | d | d |
| companyB/division1/t1 | a | d | d | a |
| companyB/division1/t2 | a | a | a | a |
| companyB/division2/t1 | a | a | a | a |
| companyC/division1/t1 | a | a | a | a |

a means allow, d means deny.

Note: u4, t3, division3, and companyC are not in the rule tree. They are written to check default rule.

I use the topic ends with # as default rule. It can't have users. I think that it is simple and flexible.

But users might be expanded as follows if needed:

users: * means for all users. If this is omitted, the rule assume users: *.

The rules are parsed top to bottom. If a conflicting rule entry appears, the new one overwrites the old one. Outputting a warning message may be helpful.

Rule writers can write the rules like a branching tree.

kleunen commented 3 years ago

It is possible, but it sounds a bit complicated to configure and also to use. But I guess in practice the configuration will not be very complicated; in practice only a few rules will be configured.

With the subscription map you should be able to find all rules which apply when you publish a specific topic:

When you have rules: companyA/division2/# companyA/division2/t1

and I publish companyA/division2/t1, the subscription map will match 'companyA/division2/t1' and 'companyA/division2/#'. And you need to know which has priority over which. I would say the more specific rule has priority over the less specific rule.

redboltz commented 3 years ago

When you have rules: companyA/division2/# companyA/division2/t1

and I publish companyA/division2/t1, the subscription map will match 'companyA/division2/t1' and 'companyA/division2/#'. And you need to know which has priority over which. I would say the more specific rule has priority over the less specific rule.

Yes, the more specific one should have higher priority.

I forgot to add read/write to my list. I mean the rule is for read.

Just I updated it. The combination of topic and user are not changed.

companyA/division2/# companyA/division2/t1

Back to your case, u1 can't subscribe companyA/division2/# but can subscribe companyA/division2/t1. So only companyA/division2/t1 is added to the subscription map. When u9 publishes companyA/division2/t1, then companyA/division2/t1 is matched and the message is delivered to u1.

redboltz commented 3 years ago

Another case, u1 can subscribe companyA/division1/# but can't subscribe companyA/division1/t1. Let's say, u1 has subscribed companyA/division1/# now. What happens when u9 publishes companyA/division1/t1?

We can define two meanings.

  1. read rule is only for subscription. It is not related to deliver. In other words, authorization and wildcard matching are independent concept. In this case, published message is delivered to u1. No delivery time checking required.
  2. read rule is not only for subscription but also for delivery. In this case, published message is NOT delivered to u1.

I think that 2 might be difficult to implement or might need costly checking logic (I'm not sure). In that case 1 is a little bit surprising behavior but acceptable. (We need to document that authorization and wildcard matching are independent concepts.)

If 2 can be implemented by practical cost, 2 is better.

redboltz commented 3 years ago

It is possible, but it sounds a bit complicated to configure and also to use. But I guess in practice the configuration will not be very complicated; in practice only a few rules will be configured.

I forgot to answer the comment above. Ordinary users use a small part of this rule. Like as follows:

I think that it is simple enough.

But it is worth having rules capable of expressing complicated cases. I'd like to find a rule format that is both easy to write for ordinary users and highly capable. I believe that my idea achieves both.

redboltz commented 3 years ago

Another case, u1 can subscribe companyA/division1/# but can't subscribe companyA/division1/t1. Let's say, u1 has subscribed companyA/division1/# now. What happens when u9 publishes companyA/division1/t1?

We can define two meanings.

  1. read rule is only for subscription. It is not related to deliver. In other words, authorization and wildcard matching are independent concept. In this case, published message is delivered to u1. No delivery time checking required.
  2. read rule is not only for subscription but also for delivery. In this case, published message is NOT delivered to u1.

I think that 2 might be difficult to implement or might need costly checking logic (I'm not sure). In that case 1 is a little bit surprising behavior but acceptable. (We need to document that authorization and wildcard matching are independent concepts.)

If 2 can be implemented by practical cost, 2 is better.

I noticed that 1 is bad. Only 2 is acceptable. If u1 subscribes # then all messages are delivered to u1. It is bad. Sorry for the mix up.
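Option 2 amounts to re-checking the read rule against the concrete topic when a message is delivered. A minimal sketch, assuming the authorization check is available as a callable; read_check and filter_receivers are hypothetical names, not mqtt_cpp APIs:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook: returns true if `user` may read `topic`.
using read_check = std::function<bool(const std::string& user,
                                      const std::string& topic)>;

// Option 2: re-check the read rule against the concrete topic at delivery
// time, so a wildcard subscription cannot bypass a narrower deny rule.
std::vector<std::string> filter_receivers(
    const std::vector<std::string>& matched_subscribers,
    const std::string& topic,
    const read_check& can_read) {
    std::vector<std::string> receivers;
    for (const auto& user : matched_subscribers) {
        if (can_read(user, topic)) receivers.push_back(user);
    }
    return receivers;
}
```

With a check that denies u1 on companyA/division1/t1, a publish to that topic is dropped for u1 even though u1 subscribed to companyA/division1/#. The cost is one rule lookup per matched subscriber per publish.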

kleunen commented 3 years ago

Authentication

?

redboltz commented 3 years ago

read write usage of typical IoT application

Hmm. In my experience developing IoT services, it is rare for a user to require both read and write permission on one topic.

Let's say there are a sensor, an actuator, and a controller. The sensor reports some status. The controller sends a request to the actuator based on the reported sensor status. It is a typical scenario.

Usernames are sensor1, actuator1, and controller1.

There are topics as follows.

iot_app/sensors/sensor1_status
iot_app/sensors/actuator1_request

sensor1 can publish iot_app/sensors/sensor1_status.
controller1 can subscribe iot_app/sensors/sensor1_status.
controller1 can publish iot_app/sensors/actuator1_request.
actuator1 can subscribe iot_app/sensors/actuator1_request.

applying my notation (read/write separated)

The minimal authorization rule is as follows:

I wrote Authentication in the comment above. It should be Authorization. I sometimes got confused.


Note

 - `#` deny

is the short form of

 - `#` deny
     - users: *

There is no read/write allowed user in the one topic.

applying refined notation based on your comment (read/write mixed)

However, https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-850870655 's advantage is compact notation.


Note

 - `#` deny

is the short form of

 - `#` deny
     - read *
     - write *

If read and/or write is omitted, then regard it as * (all users). For example,

What do you think of the last version (refined notation based on your comment (read/write mixed)) ?

kleunen commented 3 years ago

Yes, this looks good.

But I would make the default: If read and/or write is omitted, then regard it as nobody (no users). So make it explicit you want to allow everybody, from security perspective.

And it would be useful to combine users into groups

so you can say: group controller_group controller1 controller1 allow controller_group

redboltz commented 3 years ago

Yes, this looks good.

But I would make the default: If read and/or write is omitted, then regard it as nobody (no users). So make it explicit you want to allow everybody, from security perspective.

Ok, I agree. Let me clarify. Let's say * is all users and - is no users.

Allow omitting

Case 1

is the same as

Because deny with omitting read/write should be for all users.

Case 2

is the same as

Because allow with omitting read/write should be for no users.

Observation. Default meaning is inconsistent.

Disallow omitting

read and write are mandatory. If omitted, then format error is reported.


I think that disallow omitting approach is simpler. What do you think?

kleunen commented 3 years ago

Maybe not simpler, because the user always has to add read and write. But at least it is clear who is allowed. Maybe, if read and write are missing, the error message could explain how to allow all, allow nobody, or allow a username.

redboltz commented 3 years ago

Maybe not simpler, because the user always has to add read and write. But at least it is clear who is allowed. Maybe, if read and write are missing, the error message could explain how to allow all, allow nobody, or allow a username.

Indeed.

What do you think about Allow omitting rule? deny with omitting means for all users. allow with omitting means for no users. Do you agree this rule ?
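The "allow omitting" rule can be sketched with std::optional (C++17). This only illustrates the proposed defaults; rule_type, principals, and resolve_omitted are hypothetical names.

```cpp
#include <optional>
#include <set>
#include <string>

enum class rule_type { allow, deny };

// Who a rule applies to for one action (read or write).
struct principals {
    bool everyone = false;        // corresponds to "*"
    std::set<std::string> names;  // explicit users or groups
};

// Resolve an omitted read/write entry under the "allow omitting" rule:
// deny + omitted -> all users, allow + omitted -> no users.
principals resolve_omitted(rule_type type,
                           const std::optional<principals>& entry) {
    if (entry) return *entry;     // explicitly written, keep as is
    principals p;
    p.everyone = (type == rule_type::deny);
    return p;
}
```

A parser following the "disallow omitting" alternative would instead report a format error where this function fills in a default.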

redboltz commented 3 years ago

And it would be useful to combine users into groups

so you can say: group controller_group controller1 controller1 allow controller_group

I agree to introducing user group concept.

One user can be a member of multiple groups. I considered the following cases.

And the permission should be as follows:

u1 can read/write topic1. u2 can read/write topic1. u3 can't read/write topic1.

u1 can't read/write trial/topic2. u2 can't read/write trial/topic2. u3 can read/write trial/topic2.

u1 can read messy/topic3. u2 can't read messy/topic3. u3 can't read messy/topic3. u1 can't write messy/topic3. u2 can't write messy/topic3. u3 can write messy/topic3.

Do you think so?

kleunen commented 3 years ago

Yes, I agree with the allow omitting rule. If the default is nobody for allow and everybody for deny, this is OK too. It is also a good approach.

You can block an offending topic easily. Just say deny and nobody can read and write. Can be convenient.

kleunen commented 3 years ago

Maybe allow omitting for deny: everybody is denied.

But give an error if omitted on allow: please specify who is allowed. All, or a specific user, or a group of users.

kleunen commented 3 years ago

And it would be useful to combine users into groups so you can say: group controller_group controller1 controller1 allow controller_group

I agree to introducing user group concept.

One user can be a member of multiple groups. I considered the following cases.

  • user_group
    • group1
      • users: u1, u2
    • group2
      • users: u2, u3
  • Authorization
    • # deny
      • read *
      • write *
    • topic1 allow
      • read group1
      • write group1
    • trial/# allow
      • read *
      • write *
    • trial/topic2 deny
      • read group1
      • write group1
    • messy/# allow
      • read group1
      • write group2
    • messy/topic3 deny
      • read group2
      • write group1

And the permission should be as follows:

u1 can read/write topic1. u2 can read/write topic1. u3 can't read/write topic1.

u1 can't read/write trial/topic2. u2 can't read/write trial/topic2. u3 can read/write trial/topic2.

u1 can read messy/topic3. u2 can't read messy/topic3. u3 can't read messy/topic3. u1 can't write messy/topic3. u2 can't write messy/topic3. u3 can write messy/topic3.

Do you think so?
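The group example can be checked with a small evaluator. This is a sketch under stated assumptions: rules are listed wide to narrow, a later matching rule that names the user (directly, via a group, or via *) overwrites the earlier decision, and nothing matching means deny. All type and function names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using group_map = std::map<std::string, std::set<std::string>>;

struct rule {
    std::string filter;              // topic filter, may contain + or #
    bool allow;                      // true: allow, false: deny
    std::vector<std::string> read;   // "*", a group name, or a username
    std::vector<std::string> write;
};

// Split a topic or filter on '/'.
std::vector<std::string> levels(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0, pos;
    while ((pos = s.find('/', start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    out.push_back(s.substr(start));
    return out;
}

// MQTT-style match of a concrete topic against a filter.
bool topic_matches(const std::string& filter, const std::string& topic) {
    const auto f = levels(filter);
    const auto t = levels(topic);
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] == "#") return true;
        if (i >= t.size()) return false;
        if (f[i] != "+" && f[i] != t[i]) return false;
    }
    return f.size() == t.size();
}

// True if `user` is named by the list directly, through a group, or by "*".
bool applies_to(const std::vector<std::string>& who, const std::string& user,
                const group_map& groups) {
    for (const auto& w : who) {
        if (w == "*" || w == user) return true;
        auto g = groups.find(w);
        if (g != groups.end() && g->second.count(user) > 0) return true;
    }
    return false;
}

// Later (narrower) matching rules overwrite earlier ones; default is deny.
bool permitted(const std::vector<rule>& rules, const group_map& groups,
               const std::string& user, const std::string& topic, bool write) {
    bool result = false;
    for (const auto& r : rules) {
        if (!topic_matches(r.filter, topic)) continue;
        if (applies_to(write ? r.write : r.read, user, groups)) result = r.allow;
    }
    return result;
}
```

Fed with the groups and rules listed in this comment, the evaluator gives the permissions stated for topic1, trial/topic2, and messy/topic3; for example, u3 may write messy/topic3 but may not read it.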

Yes. But maybe group name always prefixed with @?

@group1 @group2 u1 u2

kleunen commented 3 years ago

But if group is allowed and user is denied? User rule gets priority over group rule?

redboltz commented 3 years ago

Yes. But maybe group name always prefixed with @? @Group1 @Group2 u1 u2

Prefix is a good idea. MQTT spec allows any UTF-8 string for Username. https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901071

So a Username might start with @. Maybe we need a requirement that mqtt_cpp broker clients must not use a Username that starts with @.

redboltz commented 3 years ago

But if group is allowed and user is denied? User rule gets priority over group rule?

I think that it is a similar situation, u2 is member of group1 and group2. And group1 allowed and group2 denied. But it couldn't happen at the same topic.

In this case u1 can subscribe sub1/# and sub1/any_topics_except_topic1, but can't subscribe sub1/topic1. When u1 has subscribed sub1/# and a publish to sub1/topic1 happens, the message is NOT delivered to u1, as we discussed. The group concept introduces no additional rule or conflict.

Another case:

I think that it should be format error. The same topic entry should appear once. If it appears twice or more, it should be error.

redboltz commented 3 years ago

  • # deny
    • topic1 deny
      • read @Group1
    • topic1 allow
      • read @Group2

And user u1 is part of Group1 and Group2 ?

I think user should only be part of 1 group.

This is format error.

I assume that the parsing process is top to bottom.

I added comments.

kleunen commented 3 years ago

What about:

more specific topic should follow broader topics ? so should be ?

redboltz commented 3 years ago

What about:

  • # deny
    • /sub/topic1 deny
      • read @Group1
    • /sub/# allow
      • read @Group2

?

It is the same as follows:


Two possible options.

  1. Simply top to bottom parse. So rule2 overwrite rule1. (Not good)
  2. Sort by wide to narrow. And then apply rules. (good)

After sorted:

| username | subscribe sub/# | subscribe sub/topic1 | deliver (sub/topic1) |
|---|---|---|---|
| u1 | no | no | no |
| u2 | yes | no | no |
| u3 | no | yes | yes |

redboltz commented 3 years ago

more specific topic should follow broader topics ? so should be ?

"more specific" is the same meaning as "sort wide to narrow" I commented.

I recommend writing the rules in this order. But I think that they can be sorted by the broker. If so, sorting is the kinder implementation, possibly with a warning message if the user's rules are not sorted. If it is difficult to implement, outputting an error message and stopping the broker due to an invalid rule format is an acceptable option.
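Sorting wide to narrow in the broker could look like the sketch below. The ordering is an assumption made for illustration: each level is ranked (# widest, then +, then a literal) and filters are compared lexicographically by those ranks. The thread does not define an exact ordering, and all names here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Rank one level: '#' is widest, then '+', then a literal.
int level_rank(const std::string& level) {
    if (level == "#") return 0;
    if (level == "+") return 1;
    return 2;
}

// Per-level ranks of a topic filter, e.g. "sub/#" -> {2, 0}.
std::vector<int> specificity(const std::string& filter) {
    std::vector<int> ranks;
    std::size_t start = 0, pos;
    while ((pos = filter.find('/', start)) != std::string::npos) {
        ranks.push_back(level_rank(filter.substr(start, pos - start)));
        start = pos + 1;
    }
    ranks.push_back(level_rank(filter.substr(start)));
    return ranks;
}

// Assumed ordering: lexicographic comparison of the per-level ranks.
bool wider_than(const std::string& a, const std::string& b) {
    return specificity(a) < specificity(b);
}

// Sorts filters wide to narrow. Returns true if the input was already in
// that order, so the caller can emit a warning when it was not.
bool sort_wide_to_narrow(std::vector<std::string>& filters) {
    const bool was_sorted =
        std::is_sorted(filters.begin(), filters.end(), wider_than);
    std::stable_sort(filters.begin(), filters.end(), wider_than);
    return was_sorted;
}
```

std::stable_sort keeps the written order of filters that rank equally, so unrelated rules are not reshuffled more than necessary.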

kleunen commented 3 years ago

Yes a warning should be generated if rules are applied in different order

redboltz commented 3 years ago

Maybe I edited the comment https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-850982726 after you read. Please check it again :)

kleunen commented 3 years ago

I think it is ok like this.

redboltz commented 3 years ago

Thank you! I think that the semantics are fixed.

The next step is syntax. I wrote JSON example:

{
  "authentication": [
    {
      "name": "u1",
      "method": "password",
      "password": "mypassword"
    },
    {
      "name": "u2",
      "method": "client_cert"
    }
  ],
  "group": [
    {
      "@g1" : ["u1", "u2"]
    }
  ],
  "authorization": [
    {
      "topic": "#",
      "type": "deny"
#     "pub": ["*"]     # can omit
#     "sub": ["*"]     # can omit
    },
    {
      "topic": "sub/#",
      "type": "allow",
      "sub": ["@g1"]
    },
    {
      "topic": "sub/topic1",
      "type": "deny",
      "sub": ["u1"]
    }
  ]
}

I choose the word "sub/pub" instead of "read/write" because they are MQTT words.

What do you think?

kleunen commented 3 years ago

I think text based rules with tabs is better.

But you can support both


topic /topic1 allow
    read: u1
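A text format like this can be read with a small hand-written parser. The sketch below uses only the standard library and assumes a grammar that the snippet merely hints at: an unindented "topic <filter> <allow|deny>" line starts a rule, and indented "read:" or "write:" lines list whitespace-separated users. text_rule and parse_rules are hypothetical names.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct text_rule {
    std::string topic;
    bool allow = false;
    std::vector<std::string> read;
    std::vector<std::string> write;
};

// Parses the assumed format described above; throws on malformed lines.
std::vector<text_rule> parse_rules(const std::string& text) {
    std::vector<text_rule> rules;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Skip blank lines.
        if (line.find_first_not_of(" \t") == std::string::npos) continue;
        const bool indented = (line[0] == ' ' || line[0] == '\t');
        std::istringstream words(line);
        std::string key;
        words >> key;
        if (!indented) {
            text_rule r;
            std::string type;
            if (key != "topic" || !(words >> r.topic >> type) ||
                (type != "allow" && type != "deny")) {
                throw std::runtime_error("bad rule line: " + line);
            }
            r.allow = (type == "allow");
            rules.push_back(r);
        } else {
            if (rules.empty() || (key != "read:" && key != "write:")) {
                throw std::runtime_error("bad entry line: " + line);
            }
            auto& target = (key == "read:") ? rules.back().read
                                            : rules.back().write;
            for (std::string user; words >> user;) target.push_back(user);
        }
    }
    return rules;
}
```

Spaces and tabs both count as indentation here, and only one nesting level is handled; a deeper rule tree would need the indentation width to be tracked.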
redboltz commented 3 years ago

I think text based rules with tabs is better.

JSON is a little bit redundant, so a simpler notation is nice. By the way, does "tabs" mean indent? I personally don't like the TAB character. Indent is good.

If we support JSON and ini file format, you can use boost property tree. If you want to support other (original) text format, then you need to use Boost.Spirit (or X3). X3 is more sophisticated but experimental (it actually works).

redboltz commented 3 years ago

Or do you know any good library to parse indented text ?

kleunen commented 3 years ago

Yes indent is spaces or tabs.

I do not know any parser. Maybe there is a YAML parser based on Spirit? https://en.m.wikipedia.org/wiki/YAML

redboltz commented 3 years ago

Ok, I think that Boost.Spirit.X3 is good one to write parser.

Document

https://www.boost.org/doc/libs/1_76_0/libs/spirit/doc/x3/html/index.html

MessagePack format parser (example)

https://github.com/msgpack/msgpack/blob/master/spec.md https://github.com/msgpack/msgpack-c/blob/cpp_master/include/msgpack/v2/x3_parse.hpp

I am thinking of writing PoC code to parse indented text. It would output a C++ data structure.

kleunen commented 3 years ago

You can also parse indented text to property_tree, that way you only have to handle property tree when reading configuration. And also be able to input ini and json format.

redboltz commented 3 years ago

Which do you mean Pattern A or Pattern B ?

If you mean B, I think that C is better. If you mean A, we need to check that property_tree has sufficient access methods. mqtt_cpp_some_data_structure might be a multi_index, which can provide flexible access. I guess that the data structure needs flexible access methods if we implement runtime updates in the future.

Pattern A

ini -----------------------+
                           |
                           V
json-----------------> property_tree ---> broker
                           A
                           |
indented text -------> spirit x3

Pattern B

ini -----------------------+
                           |
                           V
json-----------------> property_tree ---> mqtt_cpp_some_data_structure ---> broker
                           A
                           |
indented text -------> spirit x3

Pattern C

ini -----------------------+
                           |
                           V
json-----------------> property_tree ---> mqtt_cpp_some_data_structure ---> broker
                                                    A
                                                    |
indented text -------> spirit x3 -------------------+
kleunen commented 3 years ago

Pattern B.

mqtt_cpp_some_data_structure will be a subscription_map probably, and some combination of datastructures, possibly. You do want to optimize the checking of rules when user logs in.

std::map<username, userinfo>
std::map<groupname, std::set<username>>
subscription_map<rules, access rights>

something like this.

redboltz commented 3 years ago

I think that spirit x3's semantic action adds the parse result to property_tree (Pattern B) or mqtt_cpp_some_data_structure (Pattern C) repeatedly. I think that Pattern C is the simpler and more straightforward approach. I'm not sure, but I guess that property_tree is designed as a parser and element accessor. In Pattern B, property_tree is used as a container, perhaps inserting elements into the property_tree in the semantic action. That is a little weird to me.

kleunen commented 3 years ago

Just have a look what is easiest. Maybe first define the internal datastructures for fast authentication and rule matching.

redboltz commented 3 years ago

Ok. By the way, I guess that sorting by wide to narrow will be implemented in mqtt_cpp_some_data_structure. At least property_tree doesn't have such functionality.

Maybe first define the internal datastructures for fast authentication and rule matching.

Yes, I think that it is a good way.

kleunen commented 3 years ago

Have you made any progress on the authorization ? or not working on mqtt_cpp ?

redboltz commented 3 years ago

I'm working on my company's broker and SDKs, so unfortunately I don't have much time for mqtt_cpp. I think that the spec of authentication and authorization is almost fixed. At least we have agreement on the controversial parts. So I think that you can start implementing them. PR is welcome :)

kleunen commented 3 years ago

I was just a bit worried, you haven't committed anything since may. I was thinking: i hope nothing bad happened to you. But luckily you are just busy.

Yes, maybe after my holiday, I may pick up some work on mqtt_cpp again. I have been running the broker for quite a while now, and it is completely stable, although it is only used sometimes.

redboltz commented 3 years ago

Sorry for making you worry.

This is one of my recent activity on github. The logic is from mqtt_cpp. (The PR itself created some time ago.) https://github.com/mqttjs/MQTT.js/pull/1243#issuecomment-865393719 This is a related work of my company's (extended) MQTT SDKs.