RS validation of access token

Introduction

This issue is based on a formal security analysis of the GNAP specification, which we are working on as a group in the Institute for Information Security at the University of Stuttgart, extending previous analysis work done by Florian Helmschmidt (pq2).

The security property relevant to this issue, which we have been working to prove, is session integrity for authorization. Intuitively, this property states that an honest (protocol-following) end user should only access resources that it owns (or resources for which access is controlled by a dishonest AS, as such an AS is not constrained by the protocol to verify ownership of resources before granting access). We found that we were unable to prove that GNAP satisfies this property because details about the behaviour of the resource server are missing, in particular about token introspection.

Note that a similar problem can occur with structured access tokens as well, even in the absence of introspection. As introspection is already fully specified, we focus the description of the issue on introspection rather than the case of structured access tokens.

Section 3.3 describes the process of token introspection, but does not currently make clear how the RS identifies which AS to send an introspection request to. Because the RS only has available the information sent to it in a resource request from a client, and the only piece of this request that is specified by the core GNAP protocol is the access token, a natural assumption is that the access token contains information indicating which AS the RS should communicate with.

Under this assumption (that the RS selects the AS at which to perform token introspection based on information stored in the access token), the following flow is possible:

Problem Flow

[Figure: si-authz-attack]

In this flow, an honest end user starts a flow at an honest client instance and an attacker AS (AAS) for one of its own resources r. From the perspective of the end user and the CI, the flow looks normal. Meanwhile (1), the attacker starts a flow for another resource r' at an honest AS (HAS) and receives a valid access token. (r' could be a resource owned by the attacker, for example.) When the honest CI requests an access token from AAS, the attacker just forwards the token it received from HAS. Since access tokens are opaque to the CI, it doesn't notice anything wrong at this point (2). The honest CI uses this access token at an honest RS. The RS looks inside the token and performs token introspection at the endpoint specified in the token, hence at HAS (3). Since HAS did issue the token, it replies with access rights for resource r'. Hence the RS returns r', which is finally sent to the end user.
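The flow above can be replayed as a small simulation. This is only a sketch under the stated assumption that the token names its issuer and the RS introspects there; the class names, the `issuer` field, and the token layout are hypothetical and are not taken from the GNAP drafts.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizationServer:
    name: str
    # token value -> resource the token grants access to
    issued: dict = field(default_factory=dict)

    def issue(self, resource: str) -> dict:
        value = f"{self.name}-token-{len(self.issued)}"
        self.issued[value] = resource
        # Assumption: the token carries the name of the AS that issued it.
        return {"value": value, "issuer": self.name}

    def introspect(self, token: dict):
        # Returns the granted resource, or None for an unknown token.
        return self.issued.get(token["value"])


class ForwardingAS(AuthorizationServer):
    """Attacker AS (AAS) that hands out a token it obtained elsewhere."""

    def __init__(self, name: str, forwarded_token: dict):
        super().__init__(name)
        self.forwarded_token = forwarded_token

    def issue(self, resource: str) -> dict:
        return self.forwarded_token


def rs_handle(token: dict, servers: dict):
    # The RS introspects at the AS named inside the token.
    return servers[token["issuer"]].introspect(token)


has = AuthorizationServer("HAS")
attacker_token = has.issue("r_prime")       # (1) attacker's own grant at HAS
aas = ForwardingAS("AAS", attacker_token)
client_token = aas.issue("r")               # (2) honest CI asks AAS for r
served = rs_handle(client_token, {"HAS": has, "AAS": aas})  # (3)
print(served)  # prints "r_prime"
```

The honest CI asked AAS for r, yet the RS serves r' because the forwarded token still points at HAS.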

Intuitively, this flow should not be allowed, as the end user accesses a resource for which access is controlled by HAS, even though the end user only interacted with AAS. This violates the intuitive notion of session integrity for authorization described above, as it allows an honest (protocol-compliant) end user to access a resource belonging to another end user, for which access is controlled by an honest AS.

One possible resolution to this is to only consider session integrity for grants which are begun at an honest AS, a weaker property than the intuitive one described above.

However, with some minor changes to the protocol, we are able to show the stronger session integrity property that we initially describe, which holds even if grants are begun at a dishonest AS. In particular, the flow described above becomes impossible with these changes.

Proposed Resolutions

We propose two potential sets of changes, each of which makes the above flow impossible:

1. RS-side validation
   i. In a resource request, the client should send not only the access token, but also information (such as a domain or endpoint URI) about the AS at which the client requested the access token.
   ii. The RS should then perform introspection at the AS identified by the client instance.
   iii. Upon receiving a response from the AS, the RS should check that the AS is responsible for managing access to the resource(s) that it provided access rights to in its response.
2. CI-side validation
   0. Access tokens should contain sufficient information to identify the AS that issued them to either a CI or an RS.
   i. A CI, upon receiving an access token from an AS, should validate that the token was issued by the AS at which it requested the token. Note that this means that the token is not fully opaque to the CI.
   ii. The RS should then perform introspection at the AS identified by the access token.
   iii. Upon receiving a response from the AS, the RS should check that the AS is responsible for managing access to the resource(s) that it provided access rights to in its response.
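
The three RS-side validation steps might look roughly like the following. This is a minimal sketch, not a definitive implementation: `introspect` and `managed_by` are hypothetical stand-ins for the introspection call and for whatever RS-local knowledge says which AS manages which resource.

```python
def rs_side_validate(token, claimed_as, introspect, managed_by):
    """Return the resources the RS may serve, or raise PermissionError.

    token      -- access token value, treated as opaque
    claimed_as -- AS identifier the client sent with the token (first step)
    introspect -- hypothetical callable (as_id, token) -> set of resources
    managed_by -- hypothetical RS-local map: resource -> AS that manages it
    """
    # Second step: introspect at the AS named by the client instance.
    granted = introspect(claimed_as, token)
    # Third step: the answering AS must manage every resource it granted.
    for resource in granted:
        if managed_by.get(resource) != claimed_as:
            raise PermissionError(f"{claimed_as} does not manage {resource}")
    return granted


def fake_introspect(as_id, token):
    # Stand-in for the network call; here AAS vouches for the HAS token.
    answers = {
        ("AAS", "tok-from-HAS"): {"r_prime"},
        ("HAS", "tok-from-HAS"): {"r_prime"},
    }
    return answers.get((as_id, token), set())


managed_by = {"r": "AAS", "r_prime": "HAS"}

try:
    rs_side_validate("tok-from-HAS", "AAS", fake_introspect, managed_by)
    outcome = "served"
except PermissionError:
    outcome = "rejected"
print(outcome)  # prints "rejected"
```

Even if AAS answers the introspection request positively, the third check fails because r' is managed by HAS rather than by the AS the client named.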

Why does this fix the problem?

In both cases, the RS only performs introspection at the AS at which the client requested the access token (in RS-side validation, this is immediate; in CI-side validation, this follows from the check in step 2.i).

Since the RS only performs introspection at the AS at which the client requested the access token, and only returns resources to the client if this AS is responsible for controlling access to them, either the AS is honest and any resources the client receives are correct, or the AS is dishonest, and the RS will only return resources for which access is managed by the dishonest AS. As such, if the end user receives resources for which access is controlled by an honest AS from an honest CI, those resources belong to that same end user.

Below, we describe how the flow shown above is ruled out by these changes.

RS-side validation:

[Figure: si-authz-rs-side]

Notice that the honest CI in its resource request (according to 1.i) sends the token together with an identifier of AAS, since this is the AS at which it requested the token, and that the RS (according to 1.ii) now performs token introspection at AAS (instead of HAS as before). The incorrect flow is detected by the checks done by the RS in step 1.iii.

CI-side validation:

[Figure: si-authz-ci-side]

Now HAS includes an identifier of itself in the access token it sends to the attacker CI (step 2.0). Here, the incorrect flow is detected by the checks done by the CI in step 2.i.

If AAS changes this field before forwarding the access token to the honest CI to include an identifier of AAS, then the checks in 2.i performed by the CI pass and the flow continues as in RS-side validation. The RS will do token introspection at HAS (step 2.ii) and when it receives a response from HAS will notice due to the checks of step 2.iii that something is wrong.

Conclusion

We believe that the specification of how resource servers choose which AS to perform introspection at, as well as of what validation resource servers perform on introspection responses, is insufficiently precise and leads to potential security issues. We propose two potential sets of changes to resolve this issue, which arose while trying to prove security properties of the GNAP protocol. With either of these changes, we are able to show a strong session integrity property which applies even in the case of grants started at dishonest AS's. In the current version of the specification, however, this property does not hold (as shown by the first example flow above).

Thanks for this -- you're right that there's a lot in the draft that's still very drafty. That said, I don't think that this is an actual security concern or something that requires additional mitigations.

In practice, most RS's are going to be configured to talk to only one AS. So the RS will always talk to HAS to see if the token is valid. There are some other options in the wild, such as the RS looking at a field inside the access token itself (and then checking against an allowlist), or the RS being configured with different AS's depending on which resource is being asked for (i.e., everything under /foo goes to one AS and everything under /bar goes to another). There are likely other methods I'm not thinking of here.
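
As an illustration of the per-resource option, an RS could keep a static table from path prefix to AS. This is a sketch only; the AS URLs and the longest-prefix rule are assumptions made for the example, not something the drafts prescribe.

```python
# Hypothetical static configuration: path prefix -> AS to introspect at.
AS_BY_PREFIX = {
    "/foo": "https://as-one.example",
    "/bar": "https://as-two.example",
}


def select_as_for_path(path, table=AS_BY_PREFIX):
    """Return the AS configured for the longest matching prefix, or None."""
    matches = [
        prefix
        for prefix in table
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None  # no AS configured: the RS should refuse the request
    return table[max(matches, key=len)]


print(select_as_for_path("/foo/photos/1"))  # prints "https://as-one.example"
```

Matching on `prefix + "/"` keeps a path such as /foobar from being routed to the AS configured for /foo.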

Which brings us to the attack as described above:

From my read, this seems to be the access token acting exactly as intended, and what you're describing is a willing proxy system.

If the attack presumes a bearer token, the attacker's AAS/CI component is getting a legitimate access token and then willingly choosing to hand that token to another party for use. For a bearer token, this is in fact a desired feature: the bearer of the token has the right to hand that token over to another party to allow that party to use it, since that party can now bear the token.

GNAP makes this more difficult in the case of bound tokens, which I think also stops the attack as described above. In this case, the honest CI would use its own key in the request to AAS, but the AAS/CI piece wouldn't have access to that key to call HAS. The token issued would be bound to the key from AAS/CI, and so when that token is returned to the honest CI, the honest CI wouldn't be able to use it because it would not have access to the key material. Now it's possible for the AAS to return a key to the honest CI alongside the token, but in doing so, the AAS is willingly passing along all information that would be needed to use the access token to another party. It's not being tricked into doing so.
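
The key-binding argument can be illustrated with a toy proof-of-possession check. This sketch uses an HMAC over the request as a stand-in for GNAP's actual key proofing methods, so it is a simplification: the function names and the request string are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    # Stand-in for a real key proof (assumption: symmetric HMAC for brevity).
    return hmac.new(key, message, hashlib.sha256).digest()


def rs_accepts(bound_key: bytes, message: bytes, proof: bytes) -> bool:
    # The RS checks the proof against the key the token is bound to.
    return hmac.compare_digest(sign(bound_key, message), proof)


attacker_key = secrets.token_bytes(32)   # key AAS/CI presented to HAS
honest_ci_key = secrets.token_bytes(32)  # key the honest CI presented to AAS
request = b"GET /resource"

# The token from HAS is bound to attacker_key. If AAS forwards only the
# token, the honest CI can sign with nothing but its own key.
token_only = rs_accepts(attacker_key, request, sign(honest_ci_key, request))

# If AAS also hands over the key, the proof verifies.
token_and_key = rs_accepts(attacker_key, request, sign(attacker_key, request))

print(token_only, token_and_key)  # prints "False True"
```

The second case corresponds to the AS-provided key situation described above, where the attacker deliberately passes along the key material together with the token.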

In all of these cases, the RS is doing exactly what it was configured to do: see if the token is valid (it is) with its trusted AS (HAS) and that the token was presented with the right material. What's weird is that the honest CI is being tricked to use AAS to get tokens to talk to the honest RS. This raises the question: how did that connection get made in the honest CI? The discovery methods in GNAP core help this significantly, for dynamically configured clients. But from the RS's perspective, the RS doesn't care how the honest CI got its tokens, only that the tokens it got were good. There's actually no requirement that the honest CI go through a GNAP process to get a token -- it could be generated statically and configured in code, for instance. All that the RS cares about is whether the token is good for the call being made. In this case, the client was able to get a token that is good, and so it should work.

I haven't done a deep review, but I also believe that both the mitigations presented make the overall situation worse in several ways. First, if the client is able to signal to the RS which AS to use, couldn't this lead to a case where a malicious client could try to force a poorly-implemented RS to make external calls to untrusted parties, such as AAS? Second, if the client is expected to inspect the token, we would have to presume a structure or content to the token itself, which precludes privacy-preserving technologies like reference tokens and encrypted tokens from being used here. It would also lock GNAP into a token format, and as discussed extensively, keeping the format of the token opaque to the client instance (and opaque to the overall token-issuance and token-use portions of the protocol) is a hugely important feature to the ecosystem as a whole. On top of that, we know from over a decade of deployment experience that expecting clients to enforce security policy is an ultimately losing game.

From what I can see here, the considerations are about how the honest CI makes the connection of AAS+RS for its tokens, and that's a discussion of discovery for the core document. The other half is how the RS knows where to introspect its token, and that's a discussion to be added to this document as discussed above.

We would like to clarify why we see this flow as a problem and revise the mitigation we suggest to address your concerns.

While the attacker is willingly giving access to the resource in the problem flow we describe, we note that an attacker is by nature always a willing participant in attacks, and we see the underlying problem as being not that the attacker's resources are accessed without consent of the attacker, but that the honest end user (via the honest client) accesses resources of the attacker without consent of the end user.

It is possible for the attacker to achieve the same effect as this attack, even in the presence of mitigation, by fully acting as a MITM, either granting read access to some resource it controls directly, or granting write access to some resource it controls and then replaying whatever information the honest user writes. It is perhaps a more philosophical than practical question whether we should care about the effects of attacks or about how they are executed. While the mitigations we propose do not prevent all attacks of this nature, they do restrict those attacks further (e.g., the attacker needs to be the manager of some resources on an RS that the client uses, whereas without the mitigation, all that the attacker needs is to be the owner of some resource governed by an honest AS on an RS that the client uses). As such, we believe that the mitigations we suggest are beneficial in that they increase the number of trust relationships that need to be broken in order for the attack to succeed. To be precise, in the flow we present, the only instances where an attacker is mistakenly trusted by an honest party are the client trusting the attacker as an AS and HAS trusting the attacker as a client. In order to replicate this attack with our mitigations in place, we need an additional failure of trust. If, for instance, the attacker gives the client access to a resource stored at an honest RS in order to replay the client's interactions at a resource controlled by HAS, then the RS needs to trust AAS as an AS that is allowed to manage resources.

We agree with the problem with CI-side validation, but believe that the problem you raise with RS-side validation can be avoided. If the RS already has some means of choosing which AS to perform introspection at, such as one of the cases you describe in your response, rather than performing introspection at the AS suggested by the client, the RS can instead just check the suggested AS against the AS it would choose. This still allows the RS to detect this problem flow, without requiring it to call any party that it does not already trust. This approach seems to avoid both the problem flow we describe and the concern you have with the original mitigation suggestion.
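
The revised check described above could be sketched as follows, assuming the RS already has its own way of choosing an AS. The function and parameter names are hypothetical, and `introspect` stands in for the real introspection call.

```python
def handle_resource_request(token, suggested_as, expected_as, introspect):
    """Return granted resources, or None if the request is rejected.

    suggested_as -- AS identifier sent by the client instance
    expected_as  -- AS the RS would choose on its own (e.g. static config)
    introspect   -- hypothetical callable (as_id, token) -> set of resources
    """
    if suggested_as != expected_as:
        # Mismatch: reject without ever contacting the suggested AS.
        return None
    return introspect(expected_as, token)


calls = []


def recording_introspect(as_id, token):
    # Records which AS was contacted so the example can show it.
    calls.append(as_id)
    return {"r_prime"}


rejected = handle_resource_request("tok", "AAS", "HAS", recording_introspect)
accepted = handle_resource_request("tok", "HAS", "HAS", recording_introspect)
print(rejected, accepted, calls)  # prints "None {'r_prime'} ['HAS']"
```

The call log shows that the RS only ever contacts the AS it already trusts; the client's hint is used purely as a comparison value.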

Again, this is an attack only against bearer tokens, and in this case it's a case of bearer tokens working exactly as intended. I still don't believe that the attack is valid against the trust model that GNAP is based on. The RS does not care how the client got the token or who it got the token from. The RS does care about the token itself being valid. The RS's job is not to protect the client instance or the end user, the RS's job is to protect the resource. The crux of the failure shown here is in HAS giving access to the attacker's chosen client, AAS/CI. But in that case, the attacker has OK'd this request, and the AS has allowed it per the attacker's request. The other failure is that the honest CI is tricked into using the attacker's AAS instead of HAS, which is a discovery and configuration problem for the honest CI. Most of the time, the honest CI is going to be hardwired to talk to HAS directly as the only AS it knows how to speak to for RS. And GNAP core has discovery mechanisms that allow a more cold-boot process, where the honest CI learns of the URL for HAS from the RS -- in which case, the attack is bypassed entirely because the client never talks to AAS.

Additionally: This might not even be an attack, in the end: if the "AAS/CI" component is actually a system gateway that is configured to fetch tokens on behalf of the honest CI, then the honest CI is doing exactly what it's supposed to do: talk to the gateway to get a token, which in turn is provided by an internal component that only AAS/CI has access to but the honest CI does not. This type of token agent is not a new pattern.

> believe that the problem you raise with RS-side validation can be avoided.

The proposed mitigation discounts the fact that in the overwhelming majority of cases, the RS has no choice in which AS it will check the token against -- that is to say, the RS is configured to talk only to the AS that it trusts. GNAP does not presume an open or runtime-configurable association between RS and AS, even though such a thing is allowed. And even in this case, the RS is going to need to know from the token or the resource which AS to call, not from an additional parameter provided by the client.

And again, this attack is completely stopped by using key-bound access tokens, since the honest CI will present a different key to AAS than what AAS/CI can present to HAS. Therefore, the token is bound to the key from AAS/CI, which honest CI doesn't have access to. The only way for that to be breached is the attacker willingly giving up secrets, as it would be doing with a bearer token. This is not an attack against the protocol.

In conclusion, I believe that the attack, as described, is not an attack, and that the proposed mitigation is, therefore, not applicable. Discussion of this proxying scenario and its implications would be welcomed in the specification, but, speaking as an individual, I do not believe this to be an attack on the system as described.

Another thing I just thought of:

What's to stop AAS from responding positively to the introspection request from the RS? If the RS is going to trust AAS to issue tokens, then AAS can claim that any token -- including a token that it did not issue -- came from AAS. With this, I don't believe the attack is stopped by your RS-side check mitigation above.

It seems like there may be some differences between our mental model of the protocol flow of GNAP and yours. Maybe the answer to the following question will help to clarify that.

The GNAP core specification, in Section 3.1, describes the continue field of a grant response, which always includes a uri subfield. Is this URI intended to always belong to the AS sending the grant response, or could it belong to a different AS?

The intent is for that URI to be to the same AS. It does not have to be the same URI as the grant endpoint, but whatever is serving the continuation API needs to be able to tell which grant is being referenced. Additionally, it needs to be able to accept and verify the client's key -- the token presented to the continuation endpoint is always bound to a key and is never a bearer token. As a consequence, giving a continuation URL at a different AS should not ever work because it is always bound to the key of the client making the request. Does this change how you are viewing the model?

> Does this change how you are viewing the model?

Thank you, this clarified our understanding of the model.

A further high-level question about the security model for GNAP: Do you see it as a problematic outcome that an end user is able to authorize resource access at one AS, and then receive a resource managed by a different AS within the same flow?

I do not see that as a problem per se, and in fact there are plenty of proxy or API-gateway scenarios where that might actually be the intended outcome. I think it's probably worth calling this out as a potentially surprising combination of events, but I don't think it makes sense to call it an attack on the protocol.

The client instance here calls an AS and gets a token that works for its target resource. The RS receives a token that's valid for the target resource. All of that is in line with the intended outcomes.

I also want to point out the existing security considerations for this attack discussed here: https://www.ietf.org/archive/id/draft-ietf-gnap-core-protocol-11.html#name-stolen-token-replay

@jricher Having finally read through this issue, I agree with you that this is not a valid attack (which is a pity, because it means that the formal model that was proposed is not tight). That said, I would suggest adding to the RS draft a section 3.1 (NOT a security considerations section) on AS Selection, saying that this part is security-critical, and that only two methods are allowed:

1. A static AS configured in the RS, possibly depending on the resource URI.
2. An AS included in the access token (should we have a standard claim for that?) AND filtered by an allowlist.
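
A rough sketch of how an RS might combine these two methods is shown below. The claim name `as_uri` is purely hypothetical (the comment above notes that no standard claim exists), and the URLs are placeholders.

```python
def choose_as(token_claims, static_as, allowlist):
    """Pick the AS to validate a token against, or raise PermissionError.

    token_claims -- claims readable by the RS; may lack an AS identifier
    static_as    -- AS statically configured for this RS or resource
    allowlist    -- set of AS identifiers the RS is willing to contact
    """
    named_as = token_claims.get("as_uri")  # hypothetical claim name
    if named_as is None:
        # Method 1: fall back to the static configuration.
        return static_as
    if named_as not in allowlist:
        # Method 2 requires the allowlist filter; never call unknown parties.
        raise PermissionError(f"AS not on allowlist: {named_as}")
    return named_as


ALLOWED = {"https://has.example", "https://has-two.example"}

print(choose_as({}, "https://has.example", ALLOWED))
print(choose_as({"as_uri": "https://has-two.example"}, "https://has.example", ALLOWED))
```

A token naming an AS outside the allowlist is refused before any introspection request is sent.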

After a discussion with @jricher at the OSW, we came to the conclusion that the flow described in this issue is indeed possible (i.e., the formal model is correct in that respect). We also discussed whether this flow is a problem/attack or not - which boils down to the question of "what security properties is GNAP expected to meet?", specifically the mentioned Session Integrity class of properties. There, the core result was that GNAP is not expected to meet the strong property initially proposed here, but a slightly weaker one which assumes that the AS that the client instance initially contacts is honest (in addition to the AS issuing the actual token being honest, those two are not necessarily the same).

To cite @jricher from the mailing list:

This issue was discussed today at the OAuth Security Workshop, I was in attendance along with three of the researchers who had proposed the issue into GitHub. Fundamentally, we came to the understanding that the security property being modeled under attack was not fundamental to GNAP, but it does warrant a reasonable set of warnings for implementors to avoid accidentally stumbling into it.

The proposed outcome is to write or clarify three main points:

Call out more clearly that a token with an AS-provided key has some of the same attack surfaces as a bearer token. While such a token could not be used directly by the RS, it can be captured and replayed to an unwitting client application by an attacker, with the key intact. Warnings against using and accepting AS-provided keys will be written up. Currently the feature will remain in the core specification with these warnings in the RS draft, though this may warrant a specific cross reference added to core.

As proposed in the issue below by Yaron, a new section will be written in the RS draft describing the importance of AS choice for token validation and acceptance.

A new security consideration section in the RS draft that discusses the token substitution attack, the circumstances under which it can occur, and the tradeoffs for those circumstances.

Given these additions, this issue is resolved from our point of view.

ietf-wg-gnap / gnap-resource-servers