requests / requests-oauthlib

OAuthlib support for Python-Requests!
https://requests-oauthlib.readthedocs.org/
ISC License
1.73k stars 423 forks source link

Problems with Token refreshing. #264

Open btimby opened 7 years ago

btimby commented 7 years ago

First of all, thank you for this wonderful library. It works great and saved me tons of time.

I am having issues with one aspect: auto token refresh. I have identified 3 distinct problems, which I will describe below and for which I will suggest solutions. I will also reference this issue from multiple other related issues. I am happy to provide a PR once we can agree upon a suitable solution.

I derived my solution from the following, as my issue is a variant of the same issue:

https://github.com/requests/requests-oauthlib/issues/260

All of the problems start here:

https://github.com/requests/requests-oauthlib/blob/master/requests_oauthlib/oauth2_session.py#L338

    auth = kwargs.pop('auth', None)
    if client_id and client_secret and (auth is None):
        log.debug('Encoding client_id "%s" with client_secret as Basic auth credentials.', client_id)
        auth = requests.auth.HTTPBasicAuth(client_id, client_secret)
    token = self.refresh_token(
        self.auto_refresh_url, auth=auth, **kwargs)

Issue 1

If I pass an auth kwarg intended for the request() method, it is eradicated here. I am not sure if this is an issue in practice.

Issue 2

Box, Google and Onedrive do not expect the client_id and client_secret to be provided via Basic authentication ala Authorization header. They instead expect these POSTed in the body. There is no way to customize this behavior. Also, I don't know what vendors expect what method here, I only know the 3 above as that is what I am working with.

Issue 3

Any kwargs I pass which are intended for the request() method end up being passed to refresh_token() and are used when accessing the token endpoint. For example, if my original request() uploads a file, that file will be transferred to the token endpoint when attempting to refresh the token.

Issue 4

client_id and client_secret are not normally passed to request() but are necessary for token refresh. Passing them is not a problem except for unintended consequences of them being passed along to super().request()

Proposed solution.

I think all of the above can be solved by making two changes.

  1. Pass the client_id and client_secret to __init__() where they are stored as attributes (of OAuth2Session or underlying WebApplicationClient. In fact client_id is already handled this way, but fetch_token(), refresh_token() etc. do no utilize it and expect it passed again as a kwarg. This addresses issue 4.

  2. Add a refresh_kwargs argument that accepts a dict, and provides kwargs passed along to token_refresh() thus keeping them separate from the regular kwargs intended for request(). This solves issues 1 & 3.

  3. Place the client_id and client_secret into the POST body by default, all three of the providers I have tried expect this instead of basic auth. Callers can use refresh_kwargs to provide an auth kwarg if necessary (their provider expects this instead). Optionally, if auth is passed, the client_id and client_secret could be omitted from the POST body (it is unlikely that a provider expects both). This solves issue 3.

I present my current workaround, which works perfectly for my providers.


    def __init__(self):
        # Omit token refresh args, we do this manually
        self.oauthsession = OAuth2Session(token=token, **kwargs)

    def request(self, method, url, chunk, headers={}, **kwargs):
        """
        Perform HTTP request for OAuth.
        """
        while True:
            try:
                return self.oauthsession.request(method, url, headers=headers,
                                                 **kwargs)
            except TokenExpiredError:
                # Do our own, since requests_oauthlib is broken.
                token = self.oauthsession.refresh_token(
                    self.REFRESH_TOKEN_URL,
                    refresh_token=self.oauth_access.refresh_token,
                    client_id=self.provider.client_id,
                    client_secret=self.provider.client_secret)
                self._refresh_token_callback(token)
                continue
Lukasa commented 7 years ago

Thanks for this issue report!

If I pass an auth kwarg intended for the request() method, it is eradicated here.

It should not be: the (auth is None) clause should prevent entering that block if a custom auth handler was provided.

As for issue 2, this was raised in #218 and has not since been resolved. I think we should fix it.

In general, though, I would be happy to merge a PR that resolves your problems 2, 3, and 4. :smile:

btimby commented 7 years ago

I missed #218 yes, the three providers I reference do auth via body not basic auth.

New propsed change is to add refresh_kwargs to __init__(). If refresh_kwargs is present, basic auth will be skipped, giving caller control over auth method for refresh tokens. We can also fold the refresh_token_url into this new dict. This fixes 2, 3 & 4. So my case is handled like so:

    self.oauthsession = OAuth2Session(..., refresh_kwargs={'url': refresh_url,
        'client_id': client_id, 'client_secret': client_secret'})
    ...
    return self.oauthsession.request(method, url, headers=headers, **kwargs)
Lukasa commented 7 years ago

Yeah, I think that is reasonable. That, or we'd want to use a separate function to register those: a function you'd call after init. Either would be appropriate.

btimby commented 7 years ago

There is actually already auto_refresh_kwargs accepted by __init__(). But auth is generated anyway.

btimby commented 7 years ago

Not tested (yet):

https://github.com/btimby/requests-oauthlib/commit/cf3ff0a570c19768dc66de1a19f80e24976abce2

Will make PR after I test.

btimby commented 7 years ago

And actually, in reference to Issue 1 above, there is still a problem with the auth kwarg. If I pass it to customize my authentication for token refresh and the tokens is NOT expired, then the kwarg is not popped, and is passed to request(). On the other hand, if the token is expired, the kwarg is popped and NOT passed to request().

The API around this is really complicated, since refresh_token() args can come from kwargs to refresh() or auto_refresh_kwargs. OR auth is auto-generated in certain cases.

To simplify this would be a breaking change, it would require relying solely on auto_refresh_kwargs, so the caller has explicit control over the authentication. I think it is probably what is wanted since providers are free to implement their own methods...

Thoughts?

btimby commented 7 years ago

The spec leaves authentication method up to the implementer.

6.  Refreshing an Access Token

   If the authorization server issued a refresh token to the client, the
   client makes a refresh request to the token endpoint by adding the
   following parameters using the "application/x-www-form-urlencoded"
   format per Appendix B with a character encoding of UTF-8 in the HTTP
   request entity-body:

   grant_type
         REQUIRED.  Value MUST be set to "refresh_token".

   refresh_token
         REQUIRED.  The refresh token issued to the client.

   scope
         OPTIONAL.  The scope of the access request as described by
         Section 3.3.  The requested scope MUST NOT include any scope
         not originally granted by the resource owner, and if omitted is
         treated as equal to the scope originally granted by the
         resource owner.

   Because refresh tokens are typically long-lasting credentials used to
   request additional access tokens, the refresh token is bound to the
   client to which it was issued.  If the client type is confidential or
   the client was issued client credentials (or assigned other
   authentication requirements), the client MUST authenticate with the
   authorization server as described in Section 3.2.1.

   For example, the client makes the following HTTP request using
   transport-layer security (with extra line breaks for display purposes
   only):

     POST /token HTTP/1.1
     Host: server.example.com
     Authorization: Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW
     Content-Type: application/x-www-form-urlencoded

     grant_type=refresh_token&refresh_token=tGzv3JOkF0XG5Qx2TlKWIA

   The authorization server MUST:

   o  require client authentication for confidential clients or for any
      client that was issued client credentials (or with other
      authentication requirements),

   o  authenticate the client if client authentication is included and
      ensure that the refresh token was issued to the authenticated
      client, and

   o  validate the refresh token.

   If valid and authorized, the authorization server issues an access
   token as described in Section 5.1.  If the request failed
   verification or is invalid, the authorization server returns an error
   response as described in Section 5.2.

   The authorization server MAY issue a new refresh token, in which case
   the client MUST discard the old refresh token and replace it with the
   new refresh token.  The authorization server MAY revoke the old
   refresh token after issuing a new refresh token to the client.  If a
   new refresh token is issued, the refresh token scope MUST be
   identical to that of the refresh token included by the client in the
   request.

While the example shows HTTP basic authentication, there is a reference to section 3.2.1, which covers authentication for the token endpoint as a whole. It states there MUST be authentication for confidential clients, but does not specifically lay out the scheme required.

3.2.1.  Client Authentication

   Confidential clients or other clients issued client credentials MUST
   authenticate with the authorization server as described in
   Section 2.3 when making requests to the token endpoint.  Client
   authentication is used for:

   o  Enforcing the binding of refresh tokens and authorization codes to
      the client they were issued to.  Client authentication is critical
      when an authorization code is transmitted to the redirection
      endpoint over an insecure channel or when the redirection URI has
      not been registered in full.

   o  Recovering from a compromised client by disabling the client or
      changing its credentials, thus preventing an attacker from abusing
      stolen refresh tokens.  Changing a single set of client
      credentials is significantly faster than revoking an entire set of
      refresh tokens.

   o  Implementing authentication management best practices, which
      require periodic credential rotation.  Rotation of an entire set
      of refresh tokens can be challenging, while rotation of a single
      set of client credentials is significantly easier.

   A client MAY use the "client_id" request parameter to identify itself
   when sending requests to the token endpoint.  In the
   "authorization_code" "grant_type" request to the token endpoint, an
   unauthenticated client MUST send its "client_id" to prevent itself
   from inadvertently accepting a code intended for a client with a
   different "client_id".  This protects the client from substitution of
   the authentication code.  (It provides no additional security for the
   protected resource.)

Some grant types for example "Resource Owner Password Grant" would obviously use HTTP basic auth, others would not. There are OAuth2 profiles for using JWT, SAML etc. for performing authentication.

So it is my understanding that there is nothing special about HTTP basic authentication, in fact, I don't use it at all. Letting the caller control this completely seems like the right API. The caller can include the auth key to auto_refresh_kwarg if that is the intended method, or include other options otherwise.

btimby commented 7 years ago

Here is a breaking change that makes the API explicit. Caller can call refresh_token() and pass their arguments explicitly (as always). Caller also explicitly controls same arguments when refresh_token() is called automatically.

https://github.com/btimby/requests-oauthlib/commit/f22426b9318504317f768e272c57da2382bcf1a0

What do you think of this?

Lukasa commented 7 years ago

This ambiguity is part of why OAuth2 is a well-loathed specification: it essentially doesn't mandate anything, but is littered with MAYs that grant an astonishing amount of freedom to implementers to do whatever the hell they want. This makes it very difficult to write a single library that has meaningfully helpful automation across a number of OAuth2 providers.

That said, the general direction of this change seems reasonable, though I'll want to see a PR before I can really commit to saying that it's the right way to go. :smile:

btimby commented 7 years ago

I understand. This is why I propose just letting the caller decide. Although my experience so far is only with a handful of providers, my code to deal with them is fairly complex.

It seems from the current issues that refreshing tokens is a pain point for a bunch of users, and giving them explicit control would solve those problems. What I don't know however is how many folks would be affected if this "auto authentication" were to go away...

I will finalize and test my changes and submit a PR.

btimby commented 7 years ago

And by the way, I agree, I wish the spec were more explicit.

btimby commented 7 years ago

PR opened: https://github.com/requests/requests-oauthlib/pull/265

xmedeko commented 7 years ago

If you want to have secure application, then never keep your secrets in the memory due to the heart bleed attacks. Hence, the OAuth2Session should not keep the client_secret in the memory. The solution are:

  1. Secure only: To pass the get_client_secret function as the __init__ parameter, so the session would call get_client_secret() to get the secret.
  2. Secure and insecure: To have a client_secret parameter which may be function or string. And the doc would strongly discourage to pass the string directly. Then it's up to the user to decide if she wants (needs) secure or insecure version.
Lukasa commented 7 years ago

@xmedeko Heartbleed is not a sensible thread model in this case. In particular, if get_client_secret() returns the client secret as a byte string, then that byte string is subject to garbage collection. Python provides no low-level control of memory to securely erase byte strings, so that data may remain in memory indefinitely after use.

More generally, the only way to truly avoid heartbleed style attacks is to keep secret material outside of the process in question altogether. For OAuth2, given that the client_secret is sent to the remote peer, the only way to do that is to have the TLS terminating process entirely outside of the process that actually builds the HTTP request. Essentially, none of the Python HTTP client libraries in use today are suitable against such a threat model.

The best defense against Heartbleed is to upgrade your OpenSSL.

xmedeko commented 7 years ago

@Lukasa thanks for the clarification. I didn't realize the problem with the Python memory management.

Lukasa commented 7 years ago

Yeah, it's a real problem. It's why, ultimately, most "managed-memory languages" (see: Python, Go, Ruby, etc.) are bad choices for writing any code that handles keying material in cryptography. Those almost always need to be written in languages that do allow low-level control of memory, so that keys can be securely evacuated when they aren't needed.

benh57 commented 7 years ago

Possibly related -- there are numerous issues related to auto refresh auth on this tracker, it seems -- i'm also trying to get Basic auth passed through when auto refresh is called.

It seems both custom 'headers' and 'auth' are eaten from auto_refresh_kwargs by the 'prepare_refresh_body' (refresh_token line 285) , so they don't get through to the actual post. (refresh_token line 297). In fact, if i set 'headers' in auto_refresh_kwargs, the custom headers end up in the data section of the request body..

Just tried out PR# 265 and everything worked perfectly - auth went right through and worked - so, thumbs up!

east825 commented 7 years ago

Recently playing with our internal OAuth provider and inspecting the logs I noticed the following message: Prepared refresh token request body grant_type=refresh_token&scope=0-0-0-0-0&refresh_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX &allow_redirects=True. The last allow_redirects=True pair is included because requests.sessions.Session#get() sets it on by default, and, as it was discussed above, all extra **kwargs of the request() method are passed to refresh_token(). As a result, basically any keyword argument of get(), post(), etc., like params or headers, besides those explicitly declared in refresh_token() is included in the body of auto refresh request. I agree that having a dedicated a parameter that allows to specify which extra arguments should be passed to refresh_token() would be a proper general-case solution. But for the time being, at least not including arbitrary parameters of request methods in the body of auto refresh seems like a reasonable workaround (#277).

justin-f-perez commented 2 years ago

Just wanted to chime in with a :+1: for issue 3 in the OP. The variable name, auto_refresh_kwargs, implies that these kwargs are the refresh kwargs, not extra kwargs. We were recently burned by this unexpected- and missing in the docs- behavior when the API we were authenticating against broke backward compatibility and started throwing errors for unknown parameters (i.e., the ones intended for the data API endpoint that are inexplicably passed through to the OAuth2 token endpoint during auto-refresh).

I think I must be missing something here... why are they passed through at all? (Is anyone aware of a valid use case?) It seems like this is the wrong behavior for any API where the data endpoints you're attempting to access have parameters that the OAuth2 token endpoint doesn't (which is nearly every API endpoint I've used...)