w3c / wot-profile

Web of Things (WoT) Profile
http://w3c.github.io/wot-profile/
Other
16 stars 8 forks source link

subscribeallevents security requirements #224

Open egekorkan opened 2 years ago

egekorkan commented 2 years ago

subscribeallevents section of WebHook Profile has the following text:

If a Web Thing receives an HTTP request following the format above, then it MUST send event messages to the Consumer for all event types for which it has permission to subscribe.

How does the Thing know which events the Consumer has permission to? Does the Thing have a list of events and allowed Consumers per event? This is a kind of coupling that is very anti REST (at least as far as I know). Furthermore, this is against the design behind TD where there is no concept of ids of consumers, any Consumer that satisfy the security requirements of that form (actually the entire TD since per form security is not permitted in all profiles).

If profile is to follow the TD spec, if a Consumer is allowed to execute the subscribeallevents operation, it should get all event notifications.

mlagally commented 2 years ago

How does the Thing know which events the Consumer has permission to? Does the Thing have a list of events and allowed Consumers per event?

The permissions may be different for different authenticated users. Say you have two groups, admin and regular users, each with their own credentials and permissions. Admins may receive events that are not available for regular users (or vice versa).

The permission groups are managed by the thing but not described in the TD.

egekorkan commented 2 years ago

The permission groups are managed by the thing but not described in the TD.

This against the design principles behind TD. If a Consumer can execute the subscribeallevents operation, it should get all the events.

benfrancis commented 2 years ago

I think it's OK for different clients to receive different responses to HTTP requests from a server depending on the authentication credentials they provide (e.g. a JSON Web Token). We use similar text for the readallproperties protocol binding:

If a Web Thing receives an HTTP request following the format above, then upon successfully reading the values of all the readable properties to which the Consumer has permission to access, it MUST send an HTTP response with...

I don't know what the best practices are for authentication/authorisation of WebHook subscriptions, but it seems reasonable to extend the same principle there. The user can be identified via the security mechanism used by the subscribeallevents request.

See also: #220 and #221.

Note that a Consumer could have multiple users, and it's really the user who has permission not the Consumer as such. We could update any assertions which refer to "permission" to reflect that, since different requests from the same Consumer may come from different users with different permissions.

Relatedly, I was confused by @mlagally's response to my code review of #192 where he said:

We should discuss if multiple subscription from the same consumer to the same event make sense. Actually I think we should just behave as if there is a single subscription.

Surely a single Consumer has to be able to support multiple subscriptions? That does seem to be supported by the unsubscribeevent and unsubscribeallevents operations by providing a subscriptionID, although the payload of those requests doesn't seem to have been defined yet.

egekorkan commented 2 years ago

Regarding access points above:

I think it's OK for different clients to receive different responses to HTTP requests from a server depending on the authentication credentials they provide (e.g. a JSON Web Token). We use similar text for the readallproperties protocol binding:

My argument would apply to readallproperties as well. I am not sure if we can talk about this partial authorization in the current state of the TD spec since it says:

Identifies the readallproperties operation on a Thing to retrieve the data of all Properties in a single interaction.

This might be open for interpretation but all operations can be fully executed if the provided authentication is correct. This partial return idea is not specified (which might be a good thing for flexibility?)

benfrancis commented 2 years ago

This partial return idea is not specified (which might be a good thing for flexibility?)

Whether we specify it or not, I think in practice it's inevitable that Things are likely to provide different levels of access to Consumers which provide different security credentials, since this is so common in HTTP and a core feature of the some of the TD security schemes.

For example, WebThings Gateway already provides a feature to limit "monitor" or "control" access to specific devices for a Consumer providing a given JSON Web Token (which can also be requested via OAuth2). A reasonable extension of that would be to limit access to individual properties. If access can be limited to individual properties then it would make sense to limit access to individual events.

A real world example of this is that there's a university building I'm currently working with which has sensor arrays in each room. The data from many of these sensors (e.g. temperature and humidity) is publicly accessible, but professors requested that occupancy data not be made public, since it shows how often they are in their office!

I agree that leaving this behaviour undefined is probably a good thing for flexibility, because it allows it to be implementation specific.

In conclusion, I think it's fine for readallproperties and subscribeallevents to have different responses based on the security credentials provided in the request, and I don't see any problem at hinting at that in the specification using the phrase "for which it has permission". But we could remove that language altogether if you think it's confusing that a mechanism for determining permission is not explicitly defined.

egekorkan commented 2 years ago

I think that we can simply define this in the TD spec in some way. An additional explanation/assertion at https://w3c.github.io/wot-thing-description/#behavior-security would be nice for all meta operations (operations that affect many affordances).

I think we can solve this in a somewhat declarative way. Saying that readmultiple is possible if you have no credentials which in turn means you will get only temperature and humidity (in the above example) and readall needs a higher level access since that would return occupancy as well. My proposal is a bit problematic because:

With the current proposal, we are breaking the TD conformance a bit since a single form and its operation with a single security definition should result in the same behavior no matter the Consumer. In the current proposal, we actually have multiple security definitions but there is no way to map this to meta operations. This is a limitation of the TD spec I must :/ In the meantime, putting this as an explanation somewhere in the TD is definitely needed, i.e. saying something like:

Executing a meta operation (such as readallproperties, subscribeallevents, etc.) does not necessarily provide values of all affordances since that can depend on the supplied security credential. A way to describe this difference in behavior is in works.

benfrancis commented 2 years ago

@egekorkan wrote:

Executing a meta operation (such as readallproperties, subscribeallevents, etc.) does not necessarily provide values of all affordances since that can depend on the supplied security credential.

This could be fine on its own.

A way to describe this difference in behavior is in works.

I see three different possible approaches:

  1. Show all affordances to all users with the same security metadata (The simplest, static Thing Description with the greatest flexibility for permissions on the server side, but some users will get forbidden errors for some affordances)
  2. Show all affordances to all users but with alternative security schemes (A static Thing Description with some flexibility, but it's very limited, e.g. nosecurity scheme as an option for some affordances, others require authentication)
  3. Show different affordances depending on which user requests the Thing Description (Requires a dynamically generated Thing Description)

I'd be fine with any of these, but I think the first option is most likely to be used in practice because:

  1. Developers seem to prefer static Thing Descriptions
  2. Users want to be able to change permissions dynamically on the server side as users get re-assigned or new automated systems are put in place, and different users may have access to different affordances

Note that currently section 6.3 Security Schemes only allows security metadata at the top level of the Thing Description, not for individual interaction affordances, but I would be fine with removing this constraint.

I suggest this issue could be resolved by just adding the first sentence you proposed as a note in the Thing Description specification.

mmccool commented 1 year ago

Discussed in Security TF 2023.02.13:

mmccool commented 1 year ago

Discussion in Security TF:

We did some searches for some current practice and came across these:

which might be useful. In general though we need to look into current practices more. This will take time, which is unfortunate, but the topic is important. Here is another reference that may be useful: https://webhooks.fyi/learn-more/standards

Ideally we would just adopt a tested standard for this but CloudEvents seems to focus on payload format and event description, not security. There is this however, which also points out that an events mechanisms needs also to prevent abuse like DDoS: https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/http-webhook.md#4-abuse-protection. In short, the consumer needs to be authenticated and prove they own the webhook endpoint they are providing to prevent malicious people from using webhooks to reflect and amplify network traffic onto a target of a DDoS attack. Of course there are other issues...

The OpenID SSF standard (https://openid.net/wg/sharedsignals/) looks promising, and directly addresses security. But we would still need time to review it, and as it's (relatively) new implementations would be (may be?) sparse. It's also rather complicated and constrains API and payload design (see below).

Kaz suggested we reach out to TAG and ask for their input on how to deal with this. We can piggyback this on the current CR review request.

mmccool commented 1 year ago

Reading up on OpenID SSF, and the following look useful:

Note that OpenID re-used the SSE acronym for historical reasons, but this is a DIFFERENT SSE than the W3C standard. Not confusing or anything. At any rate, please use "SSF" to refer to it to avoid confusion, even though "SSE" still appears in various places, like in the URL for spec... (SSE is actually the (old) name of the group, so...).

It seems SSF depends in turn on an IETF RFC to sign messages. The article above refers to a draft but it has since reached full RFC status: https://datatracker.ietf.org/doc/html/rfc8935. Colloquially this is called "SET Push". It also uses the following, which is however still in draft: https://datatracker.ietf.org/doc/html/draft-ietf-secevent-subject-identifiers

In summary, though, SSF has specific requirements for payload format and API support, and also depends on the use of OAuth2 (not a huge problem, as long as client flow is permitted, but we'd have to see if the entire thing actually works for IoT devices or has some human-in-the-loop assumptions, and it probably means "simple" implementations using Basic might not be permitted).

Also, there are implementations of SSF but they seem to be for very security-conscious applications, like account management and risk notifications.

mmccool commented 1 year ago

Regarding CloudEvents, this is the entirety of the security and privacy section of that spec. Basically it says "security is out of scope but you should probably do it somehow".

Cloud Events

Security and Privacy

Interoperability is the primary driver behind this specification, enabling such behavior requires some information to be made available in the clear resulting in the potential for information leakage.

Consider the following to prevent inadvertent leakage especially when leveraging 3rd party platforms and communication networks:

Context Attributes

Sensitive information SHOULD NOT be carried or represented in context attributes.

CloudEvent producers, consumers, and intermediaries MAY introspect and log context attributes.

Data

Domain specific event data SHOULD be encrypted to restrict visibility to trusted parties. The mechanism employed for such encryption is an agreement between producers and consumers and thus outside the scope of this specification.

Protocol Bindings

Protocol level security SHOULD be employed to ensure the trusted and secure exchange of CloudEvents.

mmccool commented 1 year ago

Actually, there is another CloudEvents spec that includes more security information (in fact, it is almost entirely about security): https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/http-webhook.md. See in particular the sections on authentication and abuse prevention.

The "bearer token" approach sounds similar to what I was thinking would be a reasonable solution above. It may not be as complete as the SSF approach but if there is a pattern and implementation experience in the CloudEvents ecosystem it might be a reasonable practice to follow.

Specifically, I think we should adopt most of this spec, but only allow the "bearer" token authentication scheme. It's not terribly well described in this doc, but my understanding is that upon subscription the consumer would provide a token (in the payload?) that would then be used in responses (webhook calls) to authenticate the actual callback. This can be done independently of whatever the original authentication mechanism on the Thing's subscription affordance was, and the contents of the bearer token could be up to the consumer. We could also insist upon mutual TLS when the event messages are used outside of a LAN (or a VPN emulating a LAN). Since both sides would have publicly-visible URLs in that case, which should both use TLS and already have certs, this is not that unreasonable.

mmccool commented 1 year ago

Apologies if this is not the best issue in which to capture these thoughts. But we agreed to use it in the Security call (when we were in a rush at the end of the meeting) so let's continue here. We can reorganize content or summarize in other issues when the input has been gathered.

mmccool commented 1 year ago

Some general resources that discuss webhook security, but also survey systems (which we can use to see what current practice is) and suggest best practices:

Note however that many of the links in the above are old or broken, although the sites as a whole are being maintained - the disadvantage of collecting links over a long period of time.

egekorkan commented 1 year ago

There are many comments here but something is bugging me and we need to sort this out:

Letting the "all" apply only to what the particular Consumer has authorization for makes sense. What else are you going to do?

This has to be described, one way or another. There can be different security schemes in a TD and the highest level one allows readallproperties. So, the TD MUST look something like this:

{
    // ...
    "securityDefinitions": {
        "basic_sc1": {"scheme": "basic"},
        "basic_sc2": {"scheme": "basic"}
    },
    "security": "basic_sc1",
    // ...
    "properties": {
        "status": {
            // ...
            "forms": [{
                "href": "https://mylamp.example.com/status"
               // implicitly this has basic_sc1
            }]
        },
        "status2": {
            // ...
            "forms": [{
                "href": "https://mylamp.example.com/status2"
               // implicitly this has basic_sc1
            }]
        },
        "status3": {
            // ...
            "forms": [{
                "href": "https://mylamp.example.com/status3"
               // implicitly this has basic_sc1
            }]
        }
    },
    "forms": [{
        "op": "readallproperties",
        "href": "https://mylamp.example.com/properties",
        "contentType": "application/json",
        "security":"basic_sc2"
    }
}

The security mechanisms apply to operations on the resource. So if the consumer can fulfill the basic_sc2 scheme, it should read all properties.

mmccool commented 1 year ago

Since many of the above comments actually apply to https://github.com/w3c/wot-profile/issues/222, e.g. those relating to Web Hook security requirements, I want to focus in this comment on the initial issue raised here, i.e. how subscribeallevents can respond to different levels of authentication, and how these can (or should) be described in the TD. This comment also captures discussion from the Security TF call on 27 Feb 2023.

Summary of above:

Comments:

mmccool commented 1 year ago

Please direct other Web Hook discussion to #222.

lu-zero commented 1 year ago

We could use multiple security schemes, but using this approach is indirect and not really what they are intended for.

The security and scope fields right now are the only way to group forms in the affordances so a consumer would know what to expect from the all-ops from it.

Until the op field could hold a map operation: { security, scope, verb, ... } it would require to duplicate the Form per-operation in the affordances. This is one of the items I'd like to work on for TD 2.0 though.

benfrancis commented 1 year ago

Recommend that we DO allow Things to return different results based on the level of access of the Consumer.

+1

it would be useful to document the different levels of access in the TD (Ege)

I don't think this is really practical. There could be a large number of users with dynamically allocated access permissions maintained by a central authorization service. Each access token (e.g. a Bearer token) could have a unique combination of access permissions to different operations on different affordances of different Things. In order to express this level of granularity in Thing Descriptions you might need a hundred Forms per interaction affordance.

I think Consumers need to assume that just because an operation is described in a Thing Description doesn't mean it's available to all authenticated users, authentication and authorisation being two separate things. If a Consumer tries to perform an operation to which it doesn't have access then it should receive a Forbidden response. Meta operations should aggregate the operations to which the requesting user has access (e.g. only subscribe to the events to which the requesting user has access).

The recommendation would be that when accounts are created, they are associated with particular (sets of) scopes that they should have access to.

Permissions need to be modifiable after a user account is created. With a username and password type authentication scheme permissions may need to be queried from a database on every request (where there are different permissions for different users). In some cases tokens can reduce the need to query a database by acting as a stateless authorization mechanism where access permissions are encoded in the token and valid until the token expires. New permissions can be encoded in a new token issued to a given user.