redboltz / mqtt_cpp

Boost Software License 1.0
437 stars 107 forks source link

Broker authorization proposal #779

Closed kleunen closed 2 years ago

kleunen commented 3 years ago

I had an idea how to add topic authorization to the broker. I would like to propose this idea. To start with, you need a database of accounts with possible topic filters as follows:

USER1: Password1 topic: example/+/test, rights: publish

USER2: Password2 topic: example/+/test, rights: subscribe

USER3: Password3 topic: example/+/test, rights: publish + subscribe

So accounts get a list of users with passwords, and topic filters with rights if they are allowed to publish/subscribe to a topic.

Now, in the broker, when a connect enters: https://github.com/redboltz/mqtt_cpp/blob/master/include/mqtt/broker/broker.hpp#L405-L450

Rather than calling 'connect_handler' directly: https://github.com/redboltz/mqtt_cpp/blob/master/include/mqtt/broker/broker.hpp#L439-L448

You pass the connect request to some authorization class:

class AuthorizerInterface {
  virtual void authorize_connect(
        con_sp_t spep,
        buffer client_id,
        optional<buffer> /*username*/,
        optional<buffer> /*password*/,
        optional<will> will,
        bool clean_start,
        std::uint16_t /*keep_alive*/,
        v5::properties props
    )
}

This will lookup the username/password in the databae (possibly a json file or some external authenticator). And finally forward the request to the connect handler within the broker with the rights:

  bool connect_handler(
        con_sp_t spep,
        buffer client_id,
        optional<buffer> /*username*/,
        optional<buffer> /*password*/,
        optional<will> will,
        bool clean_start,
        std::uint16_t /*keep_alive*/,
        v5::properties props, 
        std::vector< MQTT_NS::buffer > user_publish_filters,
        std::vector< MQTT_NS::buffer > user_subscribe_filters,
    )

The user rights are stored in a subscription map, such that we know for each session which rights apply (first set is list of sessions that are allowed to publish, second is list of sessions which are allowed to subscribe): using sub_rights_map = multiple_subscription_map<buffer, std::pair< std::set, std::set > >;

Now, when a message is published in

bool publish_handler(
        con_sp_t spep,
        optional<packet_id_t> packet_id,
        publish_options pubopts,
        buffer topic_name,
        buffer contents,
        v5::properties props) {

Lookup the topic_name in sub_rights_map. The publisher should have rights: Publish The sessions that receive the messages should have rights: Subscribe

The publisher should be somewhere in any of the filters which is allowed to publish to the topic.

You can lookup the set of subscribers by looking up the complete set of sessions which are allowed to subscribe to this topic std::set < session_state_ref> >, and then calculating the intersection with sessions which are actually subscribed to this topic.

kleunen commented 3 years ago

An alternative would be that on a subscribe you calculate which filters the session is actually allowed to subscribe. So you combined the authorization filter + requested topic filter into an authorized topic filter:

Auth:      level1/level2/level3
Subscribe: level1/level2/level3
Result:    level1/level2/level3

Auth:      level1/+/level3
Subscribe: level1/level2/level3
Result:    level1/level2/level3

Auth:      level1/level2/level3
Subscribe: level1/+/level3
Result:    level1/level2/level3

Auth:      level1/+/level3
Subscribe: level1/+/level3
Result:    level1/+/level3

Auth:      level1/level2/#
Subscribe: level1/level2/#
Result:    level1/level2/#

Auth:      level1/level2/#
Subscribe: level1/level2/level3/+
Result:    level1/level2/level3/+

Auth:      level1/level2/level3/+
Subscribe: level1/level2/#
Result:    level1/level2/level3/+

Auth:      level1/level2/#
Subscribe: level1/level2/level3/level4
Result:    level1/level2/level3/level4

Auth:      level1/level2/+
Subscribe: level1/level2/level3/level4
Result:    Not allowed

Auth:      level1/level2/level3/level4
Subscribe: level1/level2/+
Result:    level1/level2/level3/level4

Auth:      level1/+/level3/+
Subscribe: level1/level2/+/level4
Result:    level1/level2/level3/level4

Auth:      level1/level2/+
Subscribe: nomatch/level2/+
Result:    Not allowed

Auth:      level1/+/level3/+
Subscribe: level1/+/+/level4
Result:    level1/+/level3/level4

I think the rules for this should be:`

<literal> + <literal> = <literal> (if literals are equal, otherwise not allowed)
<literal> + '#' = <literal> (remaining path also part of subscription)
'#' + <literal> = <literal> (and remaining path is allowed)
'#' + '#' = '#'
<literal> + '+' = <literal>
'+' + <literal>= <literal>
'+' + '+' = '+'
'#' + '+' = '+' (and remaining path is allowed)
'+' + '#' = '+' (and remaining path is allowed)
number of levels should be equal _or_ # should be used

Like so: https://wandbox.org/permlink/xH31CMPvYS1xpuBA

#include <algorithm>
#include <iostream>

static constexpr char topic_filter_separator = '/';

template<typename Iterator>
Iterator topic_filter_getnexttoken(Iterator first, Iterator last) 
{
    return std::find(first, last, topic_filter_separator);    
}

template<typename Iterator, typename Output>
inline void topic_filter_tokenizer(Iterator first, Iterator last, Output write) {
    auto pos = topic_filter_getnexttoken(first, last);
    while (write(first, pos) && pos != last) {
        first = std::next(pos);
        pos = topic_filter_getnexttoken(first, last);
    }
}

bool is_hash(std::string const &level) { return level == "#"; }
bool is_plus(std::string const &level) { return level == "+"; }
bool is_literal(std::string const &level) { return !is_hash(level) && !is_plus(level); }

std::optional<std::string> authorize_subscribe(std::string const &authorized_filter, std::string const &subscription_filter) 
{
    std::optional<std::string> result;        
    auto append_result = [&result](std::string const &token) { 
        if (result) { 
            result.value() += topic_filter_separator + token;
        } else {
            result = std::optional<std::string>(token);
        }        
    };

    auto filter_begin = authorized_filter.begin();   
    auto subscription_begin = subscription_filter.begin();   

    while (filter_begin < authorized_filter.end() && subscription_begin < subscription_filter.end()) {    
        auto filter_end = topic_filter_getnexttoken(filter_begin, authorized_filter.end());
        std::string auth = std::string(filter_begin, filter_end);
        filter_begin = std::next(filter_end);        

        auto subscription_end = topic_filter_getnexttoken(subscription_begin, subscription_filter.end());
        if(subscription_begin == subscription_filter.end()) {
            return std::optional<std::string>();            
        }

        std::string sub = std::string(subscription_begin, subscription_end);
        subscription_begin = std::next(subscription_end);

        if (is_hash(auth)) { 
            append_result(sub);                

            while (subscription_begin < subscription_filter.end()) {
                auto subscription_end = topic_filter_getnexttoken(subscription_begin, subscription_filter.end());            
                append_result(std::string(subscription_begin, subscription_end));
                subscription_begin = std::next(subscription_end);
            } 

            return result;
        } 

        if (is_hash(sub)) {
            append_result(auth);                

            while(filter_begin < authorized_filter.end()) {
                auto filter_end = topic_filter_getnexttoken(filter_begin, authorized_filter.end());
                append_result(std::string(filter_begin, filter_end));
                filter_begin = std::next(filter_end);
            } 

            return result;        
        }

        if (is_plus(auth)) {
            append_result(sub);                        
        }  else if (is_plus(sub)) { 
            append_result(auth);                        
        } else 
        {
            if (auth != sub)  {
                return std::optional<std::string>();                
            }

            append_result(auth);                
        }                
    }                         

    if ( filter_begin < authorized_filter.end() || subscription_begin < subscription_filter.end()) {
        return std::optional<std::string>();            
    }

    return result;
}

std::vector<std::string> authorize_subscribe(std::vector<std::string> const &authorized_filters, std::string const &subscription_filter) 
{
    std::vector<std::string> result;

    for(auto const& i: authorized_filters) {
        auto topic_filter = authorize_subscribe(i, subscription_filter);
        if (topic_filter) {
            result.push_back(topic_filter.value());
        } else 
        {
            std::cout << "Not authorized" << std::endl;
        }
    }

    return result;
}

int main()
{
    std::vector<std::string> authorized_filters = {
        "example/#"
    };

    auto result = authorize_subscribe(authorized_filters, "example/value/a");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

    authorized_filters = {
        "example/value/a"
    };

    result = authorize_subscribe(authorized_filters, "example/#");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

    authorized_filters = {
        "example/value/+"
    };

    result = authorize_subscribe(authorized_filters, "example/value/test");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

    result = authorize_subscribe(authorized_filters, "example/value");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

   authorized_filters = {
        "example/value/test"
    };

    result = authorize_subscribe(authorized_filters, "example/value/+");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

    result = authorize_subscribe(authorized_filters, "example/value");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }   
}
kleunen commented 3 years ago

But problem with this approach is, you may get multiple matching filters:

    authorized_filters = {
        "example/+/test/+",
        "example/value/+/value"
    };

    result = authorize_subscribe(authorized_filters, "example/value/+/value");
    for(auto const& i: result) {
        std::cout << i << std::endl;
    }

Results in the following topics to be authorized and to be subscribed:

example/value/test/value
example/value/+/value

In case a publish happens to 'example/value/test/value', this will result in two subscriptions matching. So you have to filter out this duplicate and also make sure that when an unsubscribe is received, these two subscriptions are removed. Advantage of this approach is that no 'unauthorized' filters are stored in the subscription_map.

redboltz commented 3 years ago

mosquitto has its ACL. http://www.steves-internet-guide.com/topic-restriction-mosquitto-configuration/

If it is good enough, we should use it because it already exists and used. User can use existing ACL for mosquitto without conversion. It is a big advantege.

I'm not sure mosquitto ACL is good enough. What do you think?

kleunen commented 3 years ago

mosquitto has its ACL. http://www.steves-internet-guide.com/topic-restriction-mosquitto-configuration/

If it is good enough, we should use it because it already exists and used. User can use existing ACL for mosquitto without conversion. It is a big advantege.

I'm not sure mosquitto ACL is good enough. What do you think?

I did not see your reply.

Well, the biggest choice to make is:

or:

In this case, no unauthorized subscriptions will be in the subscription map (which is nice, i think) But, a single subscribe, may result in multiple entries in the subscription map (which you need to filter out when a message matches multiple subscriptions).

kleunen commented 3 years ago

Consider the following example:

User U1 connected to the server and subscribes to topic: example/#

The ACL for user U1 specified: example/a (read) example/b (read)

Option 1: When you do ACL check on publish, the subscribe U1:example/# is added to the subscription map.

And when a message is published to: example/c

User U1 is subscribed, the ACL for U1 is checked, and U1 will not receive the message

Option 2: When you do ACL check on subscribe, the subscribe U1:example/#

Will be combined with the ACL of user U1 (example/a, example/b) and the following entries are added to the subscription map: U1:example/a U1:example/b

In case example/c is published, user U1 is not matched/subscribed to this topic. No ACL check needs to be done on publish

kleunen commented 3 years ago

I'm not sure mosquitto ACL is good enough. What do you think?

What I don't like about mosquittos file format is that you need to write a parser for this format.

If you use json or ini, you can use boost property tree to parse: Boost Property tree

For example in mosquitto you can configure:

topic read $SYS/#

user roger
topic foo/bar

pattern write $SYS/broker/connection/%c/state

In json this would be:

{
    "acl": {
        "general": [{
            "topic": "$SYS/#",
            "rights": "read"
        }],
        "client": [{
            "pattern": "$SYS/broker/connection/%c/state",
            "rights": "write"
        }],
        "user": [{
            "roger": {
                "password": "1234",
                "acl": [{
                    "topic": "foo/bar",
                    "rights": "readwrite"
                }]
            }
        }]

    }
}
kleunen commented 3 years ago

Although parsing mosquitto acl fotmat is not that difficult

redboltz commented 3 years ago

Sorry, I'm busy on other projects. I will back to mqtt_cpp but I need time.

Just leave some comments for now.

I used to use Boost Property Tree but I don't think it is a good choice. The design is pretty old-fashioned and not well maintained recently. If we use json library there are some of better one.

If we need to write parser, Boost.Spirit X3 is good choice for me. I wrote msgpack parser using it. See https://github.com/msgpack/msgpack-c/blob/cpp_master/include/msgpack/v2/x3_parse.hpp And other files start with x3_ in the same directory. I guess that mosquitto ACL is easier than msgpack parsing. I just want to comment that writing parser is not a problem. It is a natural and straight forward way.

I will read carefully https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-768944957 later. I don't read it yet.

kleunen commented 3 years ago

No problem. I am also quite busy recently.

Well, the mosquitto ACL format is easy to parse. You can just read lines and use a tokenizer to read them. One thing I don't like is that it does not have clear "sections".

So you say 'user roger', and all following topics belong to this user:

user roger
topic foo/bar

user wouter
topic yadie

But if I accidently delete 'user wouter', suddenly 'topic yadie' belongs to 'user roger'.

Also, what is missing is to limit the pattern of the client id for a user. For example, if user 'roger' logs in, he can only select a client id starting with 'roger_'. This prevents users from having a client_id collision.

But it is convenient to have the same format as mosquitto, even if you have more options.

JSON is quite verbose, if property_tree is not an option, There is Boost.JSON, but it is only from version 1.75: https://www.boost.org/doc/libs/1_75_0/libs/json/doc/html/index.html

And adding another dependency to the project for a JSON parser, i don't think that is a good idea.

I use property_tree a lot for configuration files, it is convenient. Have a config file parser in just a couple of lines. I actually added parsing the uci format to property_tree: UCI

Or maybe yaml format ? YAML

kleunen commented 3 years ago

If we need to write parser, Boost.Spirit X3 is good choice for me.

Ah yes, Spirit X3 uses EBNF, that is quite easy to write. Seems quite straightforward, especially for small config format.

kleunen commented 3 years ago

Having the configuration in json is convenient, because it is a defacto standard and many tools and libraries exists to create and edit them: https://linuxhint.com/bash_jq_command/

redboltz commented 3 years ago

Sorry to have kept you waiting for a long time. I had busy days but now, I am back.

I'd like to confirm the concept of authentication and authorization. I think that they are separated topic. I use the word "authentication" as CONNECT permission. The broker somehow checks the connecting client has the rights for connection. I use the word "authorization" as PUBLISH/SUBSCRIBE/UNSUBSCRIBE permission. The broker somehow checkes the connected client has the rights for PUBLISH and/or [UN]SUBSCRIBE.

There are out of MQTT spec. So we need to decide our broker's spec.

I'd like to start with authentication because it is earlier phase than authorization.

Authentication

MQTT has Client Identifier(mandatory but empty string has special meaning), Username(optional), and Password(optional). They are in CONNECT packet. In addition, since MQTT v5.0, AUTH packet is introduced. Before MQTT CONNECT packet is sent, we could use client certificate authentication on the TLS layer. Recently, I added an example for client certificate authentication. https://github.com/redboltz/mqtt_cpp/blob/d163f887cfa06a950ec7ba3bfd22279680a567d0/example/tls_both_client_cert.cpp#L347-L367

The spec said that Client Identifier must be unique in the broker.

Each Client connecting to the Server has a unique ClientID.

https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901059

It's very important. If the client that has the same ClientID of the existing connected client, then only one of them is accepted. The typical implementation is disconnecting the existing client and accepting the new client. Our broker behaves so.

ClientID could be empty string. In this case, the broker generates unique string and regard it as the ClientID.

On MQTT v5.0, the generated ClientID is notified the client using 3.2.2.3.7 Assigned Client Identifier property. https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc511988564

On MQTT v3.1.1, there is no way to notify the generated ClientID to the client. http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc385349242

Because of this limitation, Clean Session:0 is not allowed with the empty ClientID on MQTT v3.1.1.

If the Client supplies a zero-byte ClientId, the Client MUST also set CleanSession to 1 [MQTT-3.1.3-7].

If the Client supplies a zero-byte ClientId with CleanSession set to 0, the Server MUST respond to the CONNECT Packet with a CONNACK return code 0x02 (Identifier rejected) and then close the Network Connection [MQTT-3.1.3-8].

If the Server rejects the ClientId it MUST respond to the CONNECT Packet with a CONNACK return code 0x02 (Identifier rejected) and then close the Network Connection [MQTT-3.1.3-9].

How to avoid accidental ClientID conflict

We can choose multiple authentication method candidates. The first candidate is a combination of Username and Password. It's simple but has the following problem.

  1. Client1 CONNECT using Username:u1, Password:p1, ClientID:abc
  2. Client2 CONNECT using Username:u2, Password:p2, ClientID:abc

The broker disconnects Client1 because Client2 has the ClientID:abc. Client1 and Client2 could be independent. In this case, they don't know each other. How to avoid ClientID conflict?

Enforce clients to create UUID and set it as the ClientID for the clients? It is difficlut for embedded (only have poor resornces) devices. That kind devices sometimes don't have system clock. And the system clock is usually used creating random UUID.

Enforce clients to set the only empty ClientID? In this case, the broker can create the unique ClientID. However, on MQTT v3.1.1, clients can use Clean Session:1 only. It is unacceptable. One of big advantage of using MQTT is session continuation. Note that the session (subscription status, inflight published messages, and offline published messages) are distinguished by ClientID.

How about using ClientID for authentication instead of Username?

  1. Client1 CONNECT using ClientID:u1, Password:p1, no Username is used.
  2. Client2 CONNECT using ClientID:u2, Password:p2, no Username is used.

It works expectedly.

Any ideas?

kleunen commented 3 years ago

I think it is best to always user the username for authentication. A username/password combination may be used to connect multiple times. Like you said, a unique ClientID can be connected only once. But it is useful to use the username/password combination to connect multiple devices.

You can limit the clientid to a certain pattern in many brokers. So if a user logs in with 'username' the valid clients starts with 'username-', so for example: username-0000, username-0001, username-0002 are all valid client ids for this user. This way, no collision between clientids of different users can occur.

kleunen commented 3 years ago

For embedded devices i would recommend using the MAC Address or some unique microcontroller identifier. Then the clientid would become:

username-device.mac

redboltz commented 3 years ago

Username-ClientID is interesting idea.

Let me clarify what you mean.

  1. Client1 CONNECT using Username:u1, Password:p1, ClientID:abc
  2. Client2 CONNECT using Username:u2, Password:p2, ClientID:abc

In the broker (mqtt_cpp broker), the internal ClientIDs are u1-abc and u2-abc. They are different. So both connections are accepted. Is that right?

kleunen commented 3 years ago

Exactly.

In mosquitto you have patterns like '%u' which will match the username, so you use them to limit the client id. So one can configure how the clientid handling should look like. I don't think they have a generated pattern, but you might define one '%g'.

So: clientid = %u clientid = %g clientid = %u-%g

redboltz commented 3 years ago

Ok. I like unconflictable ClientID management. Let me confirm it. ClientID is a little bit ambiguous. One is the Client Identifier field in the CONNECT packet. It is sent from the client. Let's call it scid (sent ClientID). The other is broker internal identifiler. Let's call it bcid (broker ClientID). bcid is used for the client conflict checking.

Case1

The client send CONNECT { Username:"u1", Password:"p1", Client Identifier:"abc" }. In this case, scid: "abc" bcid: "u1-abc" Only "u1" and "p1" are used for authentication .

Case2

The client send CONNECT { Username:"u1", Password:"p1", Client Identifier:"" }. In this case, scid: "" bcid: "u1-generated_string_by_broker" Only "u1" and "p1" are used for authentication .


So far, it is good for me. I have two questions.

  1. In MQTT v5.0, what string should be returned as Assigned Client Identifier property in CONNACK packet? I think that it should be generated_string_by_broker. Because if the client want to inherit the (generated) session on the re-connection, the client would send CONNECT { Username:"u1", Password:"p1", Client Identifier:"generated_string_by_broker" }. The broker would create bcid as u1-generated_string_by_broker from the CONNECT packet. It is simple behavior. What do you think?
  2. Conformance of MQTT spec. The MQTT spec said that "Each Client connecting to the Server has a unique ClientID". The ClientID means scid. In my example case, the same two ClientID "abc" are accepted. I strongly believe that this behavior is better than disconnecting the existing connection that is authenticated by the different Username and Password. So some MQTT Spec lower might say mqtt_cpp broker violates the spec. I'd like to say it is intentional because it is always better behavior. Do you agree ?
kleunen commented 3 years ago

I dont think it is a good idea to have an internal and external client id. The client id is also used to continue a persistent session on the broker. So the client may reconnect and identify using the client id and request continuation of a stored session. That is also why the previous possibly already broker connection gets disconnected.

redboltz commented 3 years ago

I dont think it is a good idea to have an internal and external client id. The client id is also used to continue a persistent session on the broker. So the client may reconnect and identify using the client id and request continuation of a stored session. That is also why the previous possibly already broker connection gets disconnected.

I guess that you don't understand what I mean. Because my expression is not well...

Username-ClientID is interesting idea.

Let me clarify what you mean.

  1. Client1 CONNECT using Username:u1, Password:p1, ClientID:abc
  2. Client2 CONNECT using Username:u2, Password:p2, ClientID:abc

In the broker (mqtt_cpp broker), the internal ClientIDs are u1-abc and u2-abc. They are different. So both connections are accepted. Is that right?

I think that you have agreed to the idea above. I mean

client Username scid bcid
client1 u1 abc u1-abc
cleint2 u2 abc u2-abc

Do you mean something different?

kleunen commented 3 years ago

Ah ok i understand now. Internally you map the client id to a username. So the client wont see this.

Only thing is that when you reconnect with a different username and like to continue a session. This is not possible. But i dont think that is an issue

redboltz commented 3 years ago

Ah ok i understand now. Internally you map the client id to a username. So the client wont see this.

Only thing is that when you reconnect with a different username and like to continue a session. This is not possible. But i dont think that is an issue

Perfect! That is I wanted to say!

redboltz commented 3 years ago

Let's move on the next topic. It is Client Certificate authentication. Boost Asio has Client Certificate authentication functionality. The server(broker) can extract CNAME of the client. (Maybe can also extract Alternative Domain Names) I think that CNAME (our alt names) should be Username. The Client Certificate is signed by CA. The broker set CA certificate as verify file. So if the Client Certificate signature is verified by the broker, then authentication is success. Important point is Username should be written is the Client Certificate.

In this case, MQTT's Username and Password shouldn't be used. If the client set them, then disconnect as error or outputs warning logs and ignore them.

ClientID (scid) is used for making multiple sessions using the same authentication. It is the same as Username, Password, and ClientID(scid) pattern.

It this OK?

redboltz commented 3 years ago

Maybe it is effected the ACL.

For example, if the user (client) "redboltz" is used Client Certificate then the ACL becomes something like as follows:

{
    "acl": {
        "user": [{
            "roger": {
                "password": "1234"
                         },
                        "redboltz": {
                                "passowrd": "$SYS/ClientCertificate"
                         },
                                "passowrd": "$SYS/ClientCertificate"

means use Client Certificate authentication instead of password authentication. Or

                                "Client Certificate Auht": true

Anyway, it is semantically the same.

kleunen commented 3 years ago

ClientID (scid) is used for making multiple sessions using the same authentication. It is the same as Username, Password, and ClientID(scid) pattern.

It this OK?

Yes. Certificate is alternative for username/password

redboltz commented 3 years ago

Ok. The rest authentication method is AUTH packet. but It is only for MQTT v5.0. And the spec not said about in detail of AUTH packet. So I think that we can postpone it.

Let's move on authorization step by step.

As we discussed, authentication uses Username and Password, and doesn't use ClientID.

I think that authorization should be defined between topic name (or topic filter) and Username.

If Username u1 has read permission of topic name t1, then any clients that have the username u1 (and may have different ClientIDs) can subscribe t1. I think that it is reasonable. ClientID could be automatically generated on runtime, it is not suitable for permission control.

What do you think?

kleunen commented 3 years ago

If Username u1 has read permission of topic name t1, then any clients that have the username u1 (and may have different ClientIDs) can subscribe t1. I think that it is reasonable. ClientID could be automatically generated on runtime, it is not suitable for permission control.

What do you think?

yes, agreed. Authentication of topics is based on username or possibly groups of usernames.

redboltz commented 3 years ago

Ok, so far, authorization is defined as Username and topic relationship.

I've read https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-768944957 but I'd like to discuss based on my example in order to clarify my understanding.

topic is still ambiguous. MQTT defines Topic Name and Topic Filter.

We can say authorization is Username and Topic Name relashonship. How about Topic Filter?

Let's say there are following Topic Name:

example/a
example/b

They also are Topic Filter

And there is Topic Filter

example/#

Authorization table

Username Topic (Name or Filter) Type Meaning
u1 example/a read(subscribe) u1 can subscribe example/a

In this case, u1 can subscribe example/a but cant subscribe example/b. So far, very simple. What happens u1 subscribe exmaple/# ? If we allow only Topic Name in authorization table, then exmaple/# should be accepted. And then, move on discussion https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-768944957 .

But before do that, I'd like to consider the following option.

Username Topic (Name or Filter) Type Meaning
u1 example/a read(subscribe) u1 can subscribe example/a
u1 example/# read(subscribe) u1 can subscribe example/#
This is authorization table contains Topic Filter case. And in this case, u1 can subscribe example/# because the (exactly) same topic filter is in the table. If the table is Username Topic (Name or Filter) Type Meaning
u1 example/a read(subscribe) u1 can subscribe example/a

then u1 can't subscribe example/#.

I don't consider deeply yet. If you find some problems in the approach, please let me know.

kleunen commented 3 years ago

If the table is

Username Topic (Name or Filter) Type Meaning u1 example/a read(subscribe) u1 can subscribe example/a then u1 can't subscribe example/#.

I think in this case u1 can subscribe to example/#, but will only receive published messages to example/a (and not example/b).

Because according to mqtt specification, can a subscribe result in a not-authorized response ? If a client subscribes to example/#, is the broker allowed to respond: unauthorized subscription ?

kleunen commented 3 years ago

ah yes, it seems it is:

135 0x87 Not authorized The Client is not authorized to make this subscription.

That does make handling of authorization easier, because you can check on subscription if it is allowed.

But it can be convenient to subscribe to '#' and only receive the topics you are authorized to receive.

redboltz commented 3 years ago

But it can be convenient to subscribe to '#' and only receive the topics you are authorized to receive.

Make sense. Before deciding the options, I'd like to consider another issue.

Can authentication table and/or authorization table be updated on runtime? If we can update them, what happens existing connections and subscriptions?

kleunen commented 3 years ago

I would say: yes they can be updated on runtime. Especially if you start adding a "plugin" based authentication system. For example if the authentication rules come from a SQL database. Clearly, they are dynamic. User might like to add or ban users and update authentication for a user, while the broker is running.

I would say, the updated rules only apply to new sessions (or new connections?). So on startup of a session, the topic authorization is requested for the given username.

redboltz commented 3 years ago

Thank you for the comment. I understand we have some choices. I think that avoiding inconsistent state is important. So far, it seems that there is no problem with both applying existing connections (disconnect from the broker, force unsubscribe by the broker, if updated event can be detected) and applying only new connections.

Of course we need some implementation for updating.

redboltz commented 3 years ago

Let's back to the authorization issue.

Based on our discussion, I considered the following candidates.

Candidate1

Authorization table can contain only Topic Name. Topic Filter is not allowed.

On SUBSCRIBE received {
   if (Topic Filter contains wildcard) {
       accept the subscription.
   }
   else { // Topic Filter is Topic Name
       lookup authorization table(read)
       if (entry exists) {
           accept the subscription.
       }
       else {
           reject the subscription.
       }
   }
}
On PUBLISH received {
   lookup authorization table(write)
   if (entry exists) {
       accept the publish. (and continue)
   }
   else {
       reject the publish.
       return
   }

   get matched deliver target subscribers (already implemented)
   if (matched subscription is wild card) { // I'm not sure it can be checked or not.
       lookup authorization table(read)
       if (entry exists) {
           deliver PUBLISH packet to the subscriber
       }

   }
   else { // no read checking required here
       deliver PUBLISH packet to the subscriber
   }
}

Candidate2

Authorization table can contain only Topic Name and Topic Filter.

On SUBSCRIBE received {
   lookup authorization table(read) // wildcard topic filter is also checked
   if (entry exists) {
       accept the subscription.
   }
   else {
       reject the subscription.
   }
}

On publish is the same as the Candidate 1

What do you think?

kleunen commented 3 years ago

I actually prefer candidate 2 now, because:

  1. The control flow is simpler

  2. The client immediatly gets a subscription denied if it is not authorized, instead of silently accepting the subscription and client is not receiving the published messages because it is not authorized

  3. On publish you don't have to check subscriber authorization, this is already done on subscription. = easier = better performance.

redboltz commented 3 years ago

Let me clarify what does the permission u1 : example/# mean? There are two interpretation.

  1. Give the permission subscribing TopicFilter example/# (exact match string) to u1. But actual delivery is checked using individual permission like u1: example/a. In this case, even if u1 subscribes example/#, published message to the topic example/b is not delivered to u1.
  2. Give the permission subscribing TopicFilter example/# (exact match string) to u1. And all matched publish topics are delivered to u1. This doesn't need authorization (read) checking on publish to deliver message.

When I wrote https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-846384438, I assumed 1. So I wrote 

On publish is the same as the Candidate 1

It contains on publish authorization (read) check for delivery.

I noticed that we didn't discuss negative list yet.

If you think 2 is better, how to describe treat negative list? For example,

u1 : example/# read (allow) u1 : example/b read (deny)

I'm not sure this negative entry (deny) is acceptable. If it is acceptable, the meaning is

Perhaps negative entry describes like as follows

This means all topics in example/ denied, but topic filter example/# and example/a are allowed. In this case, published message example/b doesn't deliver to u1.

kleunen commented 3 years ago

Good question,

maybe have only two options:

Would that be a good option ?

kleunen commented 3 years ago

Indeed this options becomes very ambigious:

u1 : example/# read (allow) u1 : example/b read (deny)

is example/b allowed or denied ? Is it dependent on order, or does deny override deny, or .. User will probably not fully understand or make mistakes.

Mosquitto seems to specify an order, but it is not so clear:

if this parameter is defined then only the topics listed will have access. Topic access is added with lines of the format:

topic [read|write|readwrite|deny]

The access type is controlled using "read", "write", "readwrite" or "deny". This parameter is optional (unless includes a space character) - if not given then the access is read/write. can contain the + or # wildcards as in subscriptions. The "deny" option can used to explicity deny access to a topic that would otherwise be granted by a broader read/write/readwrite statement. Any "deny" topics are handled before topics that grant read/write access.

redboltz commented 3 years ago

Good question,

maybe have only two options:

  • Deny all, and have individual allows: default deny all allow read # allow read a
  • Allow all, and have individual deny: default allow all deny read # deny read a

Would that be a good option ?

I think that those are good options.

The point is

  1. Define the default entry that is deny or allow.
  2. Define individual entries that is opposite of the default entry.

I think that it is reasonable rule. I said about mosquitto at https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-755138731 .

I'm not sure mosquitto ACL is good enough. What do you think ?

As you mentioned at https://github.com/redboltz/mqtt_cpp/issues/779#issuecomment-846404621 , mosquitto rule is ambiguous. To be honest, I don't fully understand it.

I think that it is better to define a clearer rule for mqtt_cpp broker. The rule might be incompatible to mosquitto in some points but it is acceptable.

We need to define the meaning of wildcard in the entry.

I think that the following rule is good.

  1. On subscribe checking
    • Check Topic Filter in SUBSCRIBE packet using table (read) . If the result is allow then accept the subscription. If the result is deny then deny the subscription. (default should be considered)
      • If the Topic Filter contains wildcard, accepting subscription doesn't mean the all matched topics are delivered on publish.
  2. On publish checking
    • Check Topic Name in PUBLISH packet using table (write) . If the result is allow then accept the publish. If the result is deny then deny the publish. (default should be considered)
    • On delivery phase, wildcard matched topic is checked individually using the table (read). In the checking phase, wildcard entry is ignored.

Let me show an the example.

default allow all
deny read a

The subscriber can subscribe # because default allow all. But it is just accepted only subscription. The publisher publish to the topic a. Publish itself is succeeded because default allow all. On delivery phase, the topicais matched to#, but the published message is NOT delivered to the subscriber becausedeny read a`.

The publisher publish to the topic b. Publish itself is succeeded because default allow all. On delivery phase, the topicbis matched to#, the published message is delivered to the subscriber becausedefault allow alland nodeny read b` entry exists.

So, wildcard subscription checked twice. The first checking is on subscribe. The second checking is on publish (delivery phase).

What do you think?

kleunen commented 3 years ago

Yes, you are right, but I think this is only for default allow.

If the default is deny: default deny all allow read a

Then no check at publish is needed ? Subscribe to # is denied, subscribe to a is allowed.

default deny all allow read a/#

Subscribe to # is denied, subscribe to a/# or a/b is allowed.

redboltz commented 3 years ago

Yes, you are right, but I think this is only for default allow.

  • Subscription needs to be checked if allowed/denied
  • On publish, check needs to be done if a more specific rule allow/denies the message

If the default is deny: default deny all allow read a

Then no check at publish is needed ?

I think so if you mean on publish delivery phase checking.

Subscribe to # is denied, subscribe to a is allowed.

Yes.

default deny all allow read a/#

Subscribe to # is denied, subscribe to a/# or a/b is allowed.

Yes. But a publisher publishes to the topic a/b at the delivery phase, the message is NOT delivered. In order to deliver the message allow read a/b is also required.


By the way, default is for both read and write. I'm not sure but if read and write can set individually, it might be useful.

Something like as follows:

default read deny all default write allow all allow read a/b deny write a/b

Semantic is

What do you think ?

kleunen commented 3 years ago

Yes. But a publisher publishes to the topic a/b at the delivery phase, the message is NOT delivered. In order to deliver the message allow read a/b is also required.

I understand but i think that this is not very practical. If you have for example a multi tenant broker, a broker used by many companies, you might whitelist a path for a company. Say you want to give a company a share of the broker, you might allow them to publish and subscribe under company.com/#. But you do not know which topics they want to create

redboltz commented 3 years ago

Yes. But a publisher publishes to the topic a/b at the delivery phase, the message is NOT delivered. In order to deliver the message allow read a/b is also required.

I understand but i think that this is not very practical. If you have for example a multi tenant broker, a broker used by many companies, you might whitelist a path for a company. Say you want to give a company a share of the broker, you might allow them to publish and subscribe under company.com/#. But you do not know which topics they want to create

Good point. I understand that your idea defines wildcard in the rule as follows:

default deny all
allow read company.com/#

On subscribe, company.com/# in the rule table means applying the rule (allow or deny) to any topics filter under company.com/ including wildcard. So the following subscriptions are accepted.

company.com/#
company.com/a
company.com/b
company.com/+/c

In addition, there is no check on deliver phase.

Am I understanding correctly ?

kleunen commented 3 years ago

Yes exactly

kleunen commented 3 years ago

And for default rule, i would say explicit set for read and write. Not for both at same time. But is minor thing

kleunen commented 3 years ago

Although

default deny all allow read company.com/# allow write company.com/#

Is pretty clear

redboltz commented 3 years ago

I'm still not sure a little. I think that default is the same as #.

default deny all allow read company.com/# allow write company.com/#

So the rule above is the same as

deny read # deny write # allow read company.com/# allow write company.com/#

Is this right?

kleunen commented 3 years ago

Yes exactly. But default only set once and to #.

default deny allow read company.com/# allow write company.com/#

redboltz commented 3 years ago

We (at least I) omitted Username in the rule intentionally but I assume that the rule is for one user.

My model is as follows:

Is this the same as you ?

Or, default is more globally applied ?

kleunen commented 3 years ago

Per username is flexible, but maybe a bit too flexible? Probably all users will get same default. But can be useful to configure per user?

If global default is deny all

default deny all

user root allow read # allow write #

redboltz commented 3 years ago

Your model is as follows ?