oauth-wg / oauth-v2-1

OAuth 2.1 is a consolidation of the core OAuth 2.0 specs
https://oauth.net/2.1/
Other
51 stars 27 forks source link

Describe OAuth's use of application/x-www-form-urlencoded encoding #128

Closed panva closed 4 months ago

panva commented 2 years ago

499a8f88b70ca00bfa5ef588023c68c1d0277bff removed Appendix B but unlinked references to it are still within the document next to almost every x-www-form-urlencoded mention.


Throughout the years this Appendix was THE resource used over and over to explain why and how the OAuth use of the Basic authorization scheme encodes the username and password tokens.

If this Appendix is removed I would propose to add examples of client secret basic with username and password tokens where the client_id and client_secret encoding changes the octets that go into the basic authorization scheme base64 encoding. This is an often overlooked implementation detail that both client and server implementers get wrong and end up inoperable, further driving users to use client secret post which this document marks as NOT RECOMMENDED.

aaronpk commented 1 year ago

Does OAuth do something different than what's describe in the HTTP Basic Authentication Scheme spec?

https://datatracker.ietf.org/doc/html/rfc7617#section-2

panva commented 1 year ago

Yes.

The client identifier is encoded using the
"application/x-www-form-urlencoded" encoding algorithm per
Appendix B, and the encoded value is used as the username; the client
password is encoded using the same algorithm and used as the
password.

The client_id and client_secret are encoded before being passed to HTTP Basic as the username and password tokens.

aaronpk commented 1 year ago

I'm going to need someone who knows more about string encodings than me to write this section. I think it would be useful to have a new appendix that describes this encoding, but we can update it to avoid mentioning the out of date info in the 6749 appendix B.

aaronpk commented 1 year ago

In the mean time, there is a placeholder for this section and the internal references to it have been fixed

https://drafts.oauth.net/oauth-v2-1/draft-ietf-oauth-v2-1.html#appendix-B

adeinega commented 1 year ago

String encoding does what it does, it escapes certain characters in the input string. Below are some JS examples with the intent to show the differences

> console.log("Authorization: Basic " + btoa("client:secret"))
< Authorization: Basic Y2xpZW50OnNlY3JldA==
>
> console.log("Authorization: Basic " + btoa(encodeURIComponent("client") + ":" + encodeURIComponent("secret")))
< Authorization: Basic Y2xpZW50OnNlY3JldA==

in both cases, we get the same results

now, let's use "secret=" instead of "secret"

> console.log("Authorization: Basic " + btoa("client:secret="))
< Authorization: Basic Y2xpZW50OnNlY3JldD0=
>
> console.log("Authorization: Basic " + btoa(encodeURIComponent("client") + ":" + encodeURIComponent("secret=")))
< Authorization: Basic Y2xpZW50OnNlY3JldCUzRA==

we get different results.

The OAuth specification requires us to use "application/x-www-form-urlencoded" encoding algorithm, and it's very often a source for interoperability issues as @panva mentions.

panva commented 1 year ago

The correct js code that matches original appendix b behaviour to encode e.g. the client_id before passing it as username to basic auth is:

function formUrlEncode(token) {
  return encodeURIComponent(token).replace(/(?:[-_.!~*'()]|%20)/g, (substring) => {
    switch (substring) {
      case '-':
      case '_':
      case '.':
      case '!':
      case '~':
      case '*':
      case "'":
      case '(':
      case ')':
        return `%${substring.charCodeAt(0).toString(16).toUpperCase()}`
      case '%20':
        return '+'
      default:
        throw new Error()
    }
  })
}
shartte commented 1 year ago

Does OAuth do something different than what's describe in the HTTP Basic Authentication Scheme spec?

https://datatracker.ietf.org/doc/html/rfc7617#section-2

I believe the important bit is that application/x-www-form-urlencoded specifies UTF-8 encoding, while basic authentication doesn't really say which encoding is to be used (causing interop issues when client secrets contain any non-ASCII characters). by saying "use application/x-www-form-urlencoded" to encode the client-id and client-secret before using them as username/password, it is ensured that the basic auth username and password are ASCII only.

panva commented 9 months ago

This continues to be a problematic point in the implementations. Both client and server implementations sometimes just rely on Basic Authentication Scheme plugins completely, or partially disregarding the requirement to encode the username/password tokens using application/x-www-form-urlencoded. This makes interop for client implementations particularly difficult.

I've encountered clients that

I've encountered servers that

The biggest offender here is Google who uses - and . in their clientid values but fails to authenticate clients that properly encode those characters. Because of this, it's been the case for years that the most interoperable client implementation was to encode every non-alphanumeric character except `- . ! ~ * ' ( )`, which doesn't follow the definition.

cc @jogu This issue is even present in the Google Federated Identity listing on https://openid.net/certification where the test suite back then did not properly encode. I hope that if the suite ran against google's implementation today the certification would fail.

What a mess.

I'd like this issue to be discussed in future OAuth WG meetings.

jogu commented 9 months ago

I think we should discuss restricting the client_id and client_secret syntax to alphanumerics only to avoid this mess completely.

I'd be very much in favour of something along these lines. The experience of the OIDF certification team very much matches Filip's experience - any time values end up with characters in them that require encoding the number of interoperability problems sky rockets, and often in ways that are complex and time consuming to resolve (as these kind of encodings are often happening automatically in libraries and not always in obvious ways). It really doesn't seem like a good use of any developer's time to try and make these things work correctly when there's really no great advantage to using the characters that do require encoding.

(This is perhaps also a more general problem, e.g. similar interoperability issues arise if a client actually tries to use a state value that uses the current full set of allowed characters in state as per https://www.rfc-editor.org/rfc/rfc6749#appendix-A.5 )

selfissued commented 6 months ago

OpenID Federation Client IDs are HTTPS URLs. Some OpenID4VC Client IDs are as well. We need to keep the full character set.

aaronpk commented 6 months ago

So interestingly, it isn't possible to use an HTTPS URL as a userid in HTTP Basic Auth, since HTTP Basic Auth defines userid as *<TEXT excluding ":">: https://datatracker.ietf.org/doc/html/rfc2617#section-2

Updated reference: https://datatracker.ietf.org/doc/html/rfc7617#section-2

panva commented 6 months ago

I'm very glad this was discussed at IETF 119. Unfortunately it doesn't seem like any changes are likely to be acceptable.

This leaves client and server implementers in a tough spot where neither one can be compliant and/or strict about encoding/decoding the username/password tokens when interoperability is a goal because as seen - this current definition is already not followed.

What guidance should we give to implementers, primarily clients, to be interoperable then? Don't use client_secret_basic, use client_secret_post instead? That's explicitly not meant to be the default and is defined as NOT RECOMMENDED.

Including the client credentials in the request content using the two parameters is NOT RECOMMENDED and SHOULD be limited to clients unable to directly utilize the HTTP Basic authentication scheme (or other password-based HTTP authentication schemes).

aaronpk commented 6 months ago

I suppose the other way to go is to open it up more rather than restrict it, for example saying that only the : character in a client ID or secret is encoded, but all other characters are left alone. That would seem to be allowed according to RFC7617.

While this wouldn't automatically make everything interoperable, it would at least give people a faster path to interoperability for new implementations. Existing implementations that do some mix of pre-encoding the client ID/secret could be described as essentially defining their own encoding of the client ID/secret that the client developer would have to know by reading the docs, just like they already have to read the docs to know what scopes to use.

panva commented 6 months ago

@aaronpk unfortunately this would also not be compatible, in this case with existing implementations, which would continue to decode the unencoded resulting in malformed data and failed client authentications.

aaronpk commented 6 months ago

Right, but as was mentioned in the meeting, changing the spec also won't necessarily make existing implementations interoperable either. So my suggestion is more about making the spec reflect the reality of the variety of implementations instead.

shartte commented 6 months ago

I suppose the other way to go is to open it up more rather than restrict it, for example saying that only the : character in a client ID or secret is encoded, but all other characters are left alone. That would seem to be allowed according to RFC7617.

While this wouldn't automatically make everything interoperable, it would at least give people a faster path to interoperability for new implementations. Existing implementations that do some mix of pre-encoding the client ID/secret could be described as essentially defining their own encoding of the client ID/secret that the client developer would have to know by reading the docs, just like they already have to read the docs to know what scopes to use.

This does not help IT departments that require "special characters" in generated passwords (please don't ask), which are non-ASCII (i.e. §). If clients encode the client secret as per the spec, the character set is well defined as UTF-8. If they don't encode it, it isn't.

I'd personally say the reasoning for the encoding is clear. Making it known to implementers that they have to take care here is the key problem. :-(

panva commented 6 months ago

This does not help IT departments that require "special characters" in generated passwords (please don't ask), which are non-ASCII (i.e. §). If clients encode the client secret as per the spec, the character set is well defined as UTF-8. If they don't encode it, it isn't.

Non-ASCII characters are outside of the ABNF definition for client_id and client_secret in the first place.

I'd personally say the reasoning for the encoding is clear. Making it known to implementers that they have to take care here is the key problem. :-(

I agree with this.


Putting my server developer hat on. I can deal with plenty of non-conformities coming from the clients on the server side.

Putting my client developer hat on. I cannot deal with the different server implementations when they're not flexible. I would like to default to client_secret_post instead of client_secret_basic because of this issue but the fact that it's, a) a MAY support and b) as per spec, NOT RECOMMENDED, stops me from doing so. If that wasn't the case I'd be happy with the current character set and encoding rules, I could implement them properly in the client software and know that by default POST params are used that work with likely every server implementation.

jogu commented 6 months ago

Sadly I couldn't join the meeting last night because of timezones, but I watched the recording at least.

A couple of observations:

The Openbanking use cases that were mentioned that use urls as client ids don't use client secrets. I believe most federation use cases don't allow client secrets either. But I do agree with the conclusion that there's (unfortunately) not much we can/should do to restrict the client_id character set in general.

I believe the problem only applies to client_secret_basic. I'm not sure how many of the problem authorization servers also support client_secret_post, but it seems google at least does.

I agree with Aaron's suggestion of making the reality clearer, and Filip's suggestion of adding examples that cover this case.

I think we could also encourage people to use client_secret_post instead of basic. (Although OAuth2.1 already recommends using asymmetric client authentication so already recommends against all forms of client secrets...)

jogu commented 6 months ago

@panva (I didn't see your comment before posting mine)

I would like to default to client_secret_post instead of client_secret_basic because of this issue but the fact that it's, as per spec, NOT RECOMMENDED, stops me from doing so.

I don't think I was aware of that - where abouts is that recommendation?

panva commented 6 months ago

I don't think I was aware of that - where abouts is that recommendation?

@jogu

OAuth 2.0 section 2.3.1 OAuth 2.1 Draft 10 section 2.4.1

jogu commented 6 months ago

Thanks Filip. That's a pain. I guess my suggestion is useless then.

aaronpk commented 4 months ago

As discussed in the May 14 interim:

Resolved to: