Open btimby opened 7 years ago
Thanks for this issue report!
If I pass an auth kwarg intended for the request() method, it is eradicated here.
It should not be: the (auth is None)
clause should prevent entering that block if a custom auth handler was provided.
As for issue 2, this was raised in #218 and has not since been resolved. I think we should fix it.
In general, though, I would be happy to merge a PR that resolves your problems 2, 3, and 4. :smile:
I missed #218 yes, the three providers I reference do auth via body not basic auth.
New propsed change is to add refresh_kwargs
to __init__()
. If refresh_kwargs
is present, basic auth will be skipped, giving caller control over auth method for refresh tokens. We can also fold the refresh_token_url into this new dict. This fixes 2, 3 & 4. So my case is handled like so:
self.oauthsession = OAuth2Session(..., refresh_kwargs={'url': refresh_url,
'client_id': client_id, 'client_secret': client_secret'})
...
return self.oauthsession.request(method, url, headers=headers, **kwargs)
Yeah, I think that is reasonable. That, or we'd want to use a separate function to register those: a function you'd call after init. Either would be appropriate.
There is actually already auto_refresh_kwargs
accepted by __init__()
. But auth is generated anyway.
Not tested (yet):
https://github.com/btimby/requests-oauthlib/commit/cf3ff0a570c19768dc66de1a19f80e24976abce2
Will make PR after I test.
And actually, in reference to Issue 1 above, there is still a problem with the auth
kwarg. If I pass it to customize my authentication for token refresh and the tokens is NOT expired, then the kwarg is not popped, and is passed to request()
. On the other hand, if the token is expired, the kwarg is popped and NOT passed to request()
.
The API around this is really complicated, since refresh_token() args can come from kwargs to refresh()
or auto_refresh_kwargs
. OR auth is auto-generated in certain cases.
To simplify this would be a breaking change, it would require relying solely on auto_refresh_kwargs, so the caller has explicit control over the authentication. I think it is probably what is wanted since providers are free to implement their own methods...
Thoughts?
The spec leaves authentication method up to the implementer.
6. Refreshing an Access Token
If the authorization server issued a refresh token to the client, the
client makes a refresh request to the token endpoint by adding the
following parameters using the "application/x-www-form-urlencoded"
format per Appendix B with a character encoding of UTF-8 in the HTTP
request entity-body:
grant_type
REQUIRED. Value MUST be set to "refresh_token".
refresh_token
REQUIRED. The refresh token issued to the client.
scope
OPTIONAL. The scope of the access request as described by
Section 3.3. The requested scope MUST NOT include any scope
not originally granted by the resource owner, and if omitted is
treated as equal to the scope originally granted by the
resource owner.
Because refresh tokens are typically long-lasting credentials used to
request additional access tokens, the refresh token is bound to the
client to which it was issued. If the client type is confidential or
the client was issued client credentials (or assigned other
authentication requirements), the client MUST authenticate with the
authorization server as described in Section 3.2.1.
For example, the client makes the following HTTP request using
transport-layer security (with extra line breaks for display purposes
only):
POST /token HTTP/1.1
Host: server.example.com
Authorization: Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW
Content-Type: application/x-www-form-urlencoded
grant_type=refresh_token&refresh_token=tGzv3JOkF0XG5Qx2TlKWIA
The authorization server MUST:
o require client authentication for confidential clients or for any
client that was issued client credentials (or with other
authentication requirements),
o authenticate the client if client authentication is included and
ensure that the refresh token was issued to the authenticated
client, and
o validate the refresh token.
If valid and authorized, the authorization server issues an access
token as described in Section 5.1. If the request failed
verification or is invalid, the authorization server returns an error
response as described in Section 5.2.
The authorization server MAY issue a new refresh token, in which case
the client MUST discard the old refresh token and replace it with the
new refresh token. The authorization server MAY revoke the old
refresh token after issuing a new refresh token to the client. If a
new refresh token is issued, the refresh token scope MUST be
identical to that of the refresh token included by the client in the
request.
While the example shows HTTP basic authentication, there is a reference to section 3.2.1, which covers authentication for the token endpoint as a whole. It states there MUST be authentication for confidential clients, but does not specifically lay out the scheme required.
3.2.1. Client Authentication
Confidential clients or other clients issued client credentials MUST
authenticate with the authorization server as described in
Section 2.3 when making requests to the token endpoint. Client
authentication is used for:
o Enforcing the binding of refresh tokens and authorization codes to
the client they were issued to. Client authentication is critical
when an authorization code is transmitted to the redirection
endpoint over an insecure channel or when the redirection URI has
not been registered in full.
o Recovering from a compromised client by disabling the client or
changing its credentials, thus preventing an attacker from abusing
stolen refresh tokens. Changing a single set of client
credentials is significantly faster than revoking an entire set of
refresh tokens.
o Implementing authentication management best practices, which
require periodic credential rotation. Rotation of an entire set
of refresh tokens can be challenging, while rotation of a single
set of client credentials is significantly easier.
A client MAY use the "client_id" request parameter to identify itself
when sending requests to the token endpoint. In the
"authorization_code" "grant_type" request to the token endpoint, an
unauthenticated client MUST send its "client_id" to prevent itself
from inadvertently accepting a code intended for a client with a
different "client_id". This protects the client from substitution of
the authentication code. (It provides no additional security for the
protected resource.)
Some grant types for example "Resource Owner Password Grant" would obviously use HTTP basic auth, others would not. There are OAuth2 profiles for using JWT, SAML etc. for performing authentication.
So it is my understanding that there is nothing special about HTTP basic authentication, in fact, I don't use it at all. Letting the caller control this completely seems like the right API. The caller can include the auth
key to auto_refresh_kwarg
if that is the intended method, or include other options otherwise.
Here is a breaking change that makes the API explicit. Caller can call refresh_token()
and pass their arguments explicitly (as always). Caller also explicitly controls same arguments when refresh_token()
is called automatically.
https://github.com/btimby/requests-oauthlib/commit/f22426b9318504317f768e272c57da2382bcf1a0
What do you think of this?
This ambiguity is part of why OAuth2 is a well-loathed specification: it essentially doesn't mandate anything, but is littered with MAYs that grant an astonishing amount of freedom to implementers to do whatever the hell they want. This makes it very difficult to write a single library that has meaningfully helpful automation across a number of OAuth2 providers.
That said, the general direction of this change seems reasonable, though I'll want to see a PR before I can really commit to saying that it's the right way to go. :smile:
I understand. This is why I propose just letting the caller decide. Although my experience so far is only with a handful of providers, my code to deal with them is fairly complex.
It seems from the current issues that refreshing tokens is a pain point for a bunch of users, and giving them explicit control would solve those problems. What I don't know however is how many folks would be affected if this "auto authentication" were to go away...
I will finalize and test my changes and submit a PR.
And by the way, I agree, I wish the spec were more explicit.
If you want to have secure application, then never keep your secrets in the memory due to the heart bleed attacks. Hence, the OAuth2Session
should not keep the client_secret
in the memory. The solution are:
get_client_secret
function as the __init__
parameter, so the session would call get_client_secret()
to get the secret.client_secret
parameter which may be function or string. And the doc would strongly discourage to pass the string directly. Then it's up to the user to decide if she wants (needs) secure or insecure version.@xmedeko Heartbleed is not a sensible thread model in this case. In particular, if get_client_secret()
returns the client secret as a byte string, then that byte string is subject to garbage collection. Python provides no low-level control of memory to securely erase byte strings, so that data may remain in memory indefinitely after use.
More generally, the only way to truly avoid heartbleed style attacks is to keep secret material outside of the process in question altogether. For OAuth2, given that the client_secret
is sent to the remote peer, the only way to do that is to have the TLS terminating process entirely outside of the process that actually builds the HTTP request. Essentially, none of the Python HTTP client libraries in use today are suitable against such a threat model.
The best defense against Heartbleed is to upgrade your OpenSSL.
@Lukasa thanks for the clarification. I didn't realize the problem with the Python memory management.
Yeah, it's a real problem. It's why, ultimately, most "managed-memory languages" (see: Python, Go, Ruby, etc.) are bad choices for writing any code that handles keying material in cryptography. Those almost always need to be written in languages that do allow low-level control of memory, so that keys can be securely evacuated when they aren't needed.
Possibly related -- there are numerous issues related to auto refresh auth on this tracker, it seems -- i'm also trying to get Basic auth passed through when auto refresh is called.
It seems both custom 'headers' and 'auth' are eaten from auto_refresh_kwargs by the 'prepare_refresh_body' (refresh_token line 285) , so they don't get through to the actual post. (refresh_token line 297). In fact, if i set 'headers' in auto_refresh_kwargs, the custom headers end up in the data section of the request body..
Just tried out PR# 265 and everything worked perfectly - auth went right through and worked - so, thumbs up!
Recently playing with our internal OAuth provider and inspecting the logs I noticed the following message: Prepared refresh token request body grant_type=refresh_token&scope=0-0-0-0-0&refresh_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX &allow_redirects=True
. The last allow_redirects=True
pair is included because requests.sessions.Session#get()
sets it on by default, and, as it was discussed above, all extra **kwargs
of the request()
method are passed to refresh_token()
. As a result, basically any keyword argument of get()
, post()
, etc., like params
or headers
, besides those explicitly declared in refresh_token()
is included in the body of auto refresh request. I agree that having a dedicated a parameter that allows to specify which extra arguments should be passed to refresh_token()
would be a proper general-case solution. But for the time being, at least not including arbitrary parameters of request methods in the body of auto refresh seems like a reasonable workaround (#277).
Just wanted to chime in with a :+1: for issue 3 in the OP. The variable name, auto_refresh_kwargs
, implies that these kwargs are the refresh kwargs, not extra kwargs. We were recently burned by this unexpected- and missing in the docs- behavior when the API we were authenticating against broke backward compatibility and started throwing errors for unknown parameters (i.e., the ones intended for the data API endpoint that are inexplicably passed through to the OAuth2 token endpoint during auto-refresh).
I think I must be missing something here... why are they passed through at all? (Is anyone aware of a valid use case?) It seems like this is the wrong behavior for any API where the data endpoints you're attempting to access have parameters that the OAuth2 token endpoint doesn't (which is nearly every API endpoint I've used...)
First of all, thank you for this wonderful library. It works great and saved me tons of time.
I am having issues with one aspect: auto token refresh. I have identified 3 distinct problems, which I will describe below and for which I will suggest solutions. I will also reference this issue from multiple other related issues. I am happy to provide a PR once we can agree upon a suitable solution.
I derived my solution from the following, as my issue is a variant of the same issue:
https://github.com/requests/requests-oauthlib/issues/260
All of the problems start here:
https://github.com/requests/requests-oauthlib/blob/master/requests_oauthlib/oauth2_session.py#L338
Issue 1
If I pass an
auth
kwarg intended for therequest()
method, it is eradicated here. I am not sure if this is an issue in practice.Issue 2
Box, Google and Onedrive do not expect the
client_id
andclient_secret
to be provided via Basic authentication alaAuthorization
header. They instead expect these POSTed in the body. There is no way to customize this behavior. Also, I don't know what vendors expect what method here, I only know the 3 above as that is what I am working with.Issue 3
Any kwargs I pass which are intended for the
request()
method end up being passed torefresh_token()
and are used when accessing the token endpoint. For example, if my originalrequest()
uploads a file, that file will be transferred to the token endpoint when attempting to refresh the token.Issue 4
client_id
andclient_secret
are not normally passed torequest()
but are necessary for token refresh. Passing them is not a problem except for unintended consequences of them being passed along tosuper().request()
Proposed solution.
I think all of the above can be solved by making two changes.
Pass the
client_id
andclient_secret
to__init__()
where they are stored as attributes (ofOAuth2Session
or underlyingWebApplicationClient
. In factclient_id
is already handled this way, butfetch_token()
,refresh_token()
etc. do no utilize it and expect it passed again as a kwarg. This addresses issue 4.Add a
refresh_kwargs
argument that accepts a dict, and provides kwargs passed along totoken_refresh()
thus keeping them separate from the regular kwargs intended forrequest()
. This solves issues 1 & 3.Place the
client_id
andclient_secret
into the POST body by default, all three of the providers I have tried expect this instead of basic auth. Callers can userefresh_kwargs
to provide anauth
kwarg if necessary (their provider expects this instead). Optionally, ifauth
is passed, theclient_id
andclient_secret
could be omitted from the POST body (it is unlikely that a provider expects both). This solves issue 3.I present my current workaround, which works perfectly for my providers.