googleapis / google-auth-library-python

Google Auth Python Library
https://googleapis.dev/python/google-auth/latest/
Apache License 2.0

Generated Access Tokens are flagged as expired prematurely #1449

Closed: sanjain2004 closed this issue 7 months ago

sanjain2004 commented 7 months ago

Environment details

Steps to reproduce

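1. In my Python code, I use the following to generate an access token and its expiry:

```python
from google import auth
from google.auth.transport import requests as g_auth_req

credentials, _ = auth.default()  # Application Default Credentials
auth_req = g_auth_req.Request()
credentials.refresh(auth_req)  # populates credentials.token and credentials.expiry
```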

I send credentials.token and credentials.expiry to my Java backend for testing permissions. I have ensured that the expiry time used to construct the Java AccessToken object is correct.
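A minimal sketch of the hand-off described above, assuming the Java side expects the expiry as epoch milliseconds. The helper `build_token_payload` and its field names are hypothetical, not part of google-auth. As far as I know, google-auth reports `credentials.expiry` as a naive datetime in UTC, so the sketch treats naive values as UTC rather than local time:

```python
from datetime import datetime, timedelta, timezone

# Reference point for the epoch-millisecond conversion.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def build_token_payload(token: str, expiry: datetime) -> dict:
    """Package an access token and its expiry for a downstream service.

    Hypothetical helper: the payload keys are illustrative only.
    """
    if expiry.tzinfo is None:
        # Assume naive datetimes are UTC (how google-auth reports expiry).
        expiry = expiry.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta yields exact whole milliseconds.
    expiry_ms = (expiry - EPOCH) // timedelta(milliseconds=1)
    return {"access_token": token, "expiry_epoch_ms": expiry_ms}
```

In the Cloud Function this would be called as `build_token_payload(credentials.token, credentials.expiry)`; on the Java side, `new AccessToken(token, new Date(expiryEpochMs))` rebuilds the token with the same expiry.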

About 4-5 minutes before the actual expiry time, the testPermissions call to the backend always gives me this error: "java.lang.IllegalStateException: OAuth2Credentials instance does not support refreshing the access token. An instance with a new access token should be used, or a derived type that supports refreshing."

We assume that the token is considered expired. Upon regeneration of the token in the Python Cloud Function, we continue to get the same expired token with the same expiry (presumably because the token generator thinks the token has not expired). A new token is generated about 3.5 to 4 minutes before the actual expiry time.

So, there seems to be a premature expiration of token.

Google Support seems to think there is an issue in generating the token.

Thanks for any help!

clundin25 commented 7 months ago

If I understand correctly, the flow is as follows:

```mermaid
graph TD;
    ADC-- Create Credentials -->python
    python-- Pass token and expiry -->java
    java-- Test Permissions with the token created by python --> Iam["Iam Endpoint"]
```

I believe Cloud Functions will cache tokens until ~4 minutes before expiration, and this is expected.

So, the issue is that the IAM service starts rejecting your request 4-5 minutes before the ACTUAL expiration of the Python token?

sanjain2004 commented 7 months ago

Thanks Carl for the quick reply.

The flow you described is correct. However, I am not following your statement: "I believe Cloud Functions will cache tokens until ~4 minutes before expiration, and this is expected." Let's say the expiry is at 11:30:00. Here are my observations:

At approximately 11:25:00, the IAM endpoint that tests permissions starts rejecting the token. Despite retries, Python does not generate a new token until around 11:26:00, give or take a few seconds (even when it is explicitly asked to).

Can you tell me if this is expected? And if so, where is it documented?

Thanks

clundin25 commented 7 months ago

> However, I am not following the

The MDS server that the client is retrieving tokens from will return the same token until it is 4 minutes from expiration, at which point it will be refreshed and a new token will appear.
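To make that caching behavior concrete, here is a toy model of a token source that behaves as described: it keeps returning the cached token until fewer than `refresh_threshold` seconds remain, then mints a new one. `CachingTokenSource` and its 240-second default are assumptions drawn from the 4-minute figure above; this is not the metadata server's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # seconds on an arbitrary clock


class CachingTokenSource:
    """Hypothetical model of a metadata-server-style token cache."""

    def __init__(self, mint: Callable[[float], Token], refresh_threshold: float = 240.0):
        self._mint = mint
        self._threshold = refresh_threshold
        self._cached: Optional[Token] = None

    def get(self, now: float) -> Token:
        # Serve the cached token until its remaining lifetime drops below the threshold.
        if self._cached is None or self._cached.expires_at - now < self._threshold:
            self._cached = self._mint(now)
        return self._cached
```

With a one-hour token issued at 10:30:00, `get()` keeps returning the same value until roughly 11:26:00 and only then returns a new one, which matches the timeline reported above.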

I think it is odd that the IAM endpoint rejected your request. This snippet worked for me:

```shell
ACCESS_TOKEN=$(gcloud auth print-access-token) && sleep 55m && \
  curl -X POST \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://cloudresourcemanager.googleapis.com/v1/projects/carl-debug-project:testIamPermissions"
```

I will try again with 57m and 61m, but this is on a GCE environment, so I will likely have to test this on a Cloud Functions environment.
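For reference, the same `testIamPermissions` call can be built from Python with only the standard library. This sketch assumes `request.json` holds a body of the form `{"permissions": [...]}`; the helper name and the example permission are illustrative. It only constructs the request, so it can be inspected without network access:

```python
import json
import urllib.request
from typing import Iterable


def build_test_permissions_request(
    project_id: str, access_token: str, permissions: Iterable[str]
) -> urllib.request.Request:
    """Build (but do not send) a v1 projects.testIamPermissions request."""
    url = (
        "https://cloudresourcemanager.googleapis.com/v1/projects/"
        f"{project_id}:testIamPermissions"
    )
    body = json.dumps({"permissions": list(permissions)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Same bearer token the Java service would receive.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` while holding a valid token should return the subset of the requested permissions that the caller actually has.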

sanjain2004 commented 7 months ago

What is the MDS server? As noted in my original post, this is the error I get from the IAM endpoint: "java.lang.IllegalStateException: OAuth2Credentials instance does not support refreshing the access token. An instance with a new access token should be used, or a derived type that supports refreshing."

Can you point me to documentation about the 4-minute part?

Thanks

clundin25 commented 7 months ago

"MDS" refers to the metadata server that is hosted on the GCP environment. This is not publicly documented behavior.
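For context, the metadata server exposes the default service account's token at a well-known endpoint that only answers requests carrying the `Metadata-Flavor: Google` header, and it replies with `access_token`, `expires_in` (seconds), and `token_type`. A minimal stdlib sketch, with helper names of my own choosing, that builds that request and converts the reply into an absolute expiry:

```python
import json
import urllib.request
from typing import Tuple

TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_mds_token_request() -> urllib.request.Request:
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def parse_mds_token_response(raw: bytes, now: float) -> Tuple[str, float]:
    """Return (access_token, absolute expiry in seconds) from the JSON reply."""
    data = json.loads(raw)
    return data["access_token"], now + data["expires_in"]
```

Run inside GCE, Cloud Run, or Cloud Functions, `urllib.request.urlopen(build_mds_token_request())` should keep returning the same token on repeated calls until the server refreshes it near expiry.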

clundin25 commented 7 months ago

Can you expand on what the Java code is doing? The expired token error I see from IAM does not match what you have shared, so maybe there is something going on in the Java code?

sanjain2004 commented 7 months ago

I got the code from https://cloud.google.com/iam/docs/testing-permissions#iam-testing-permissions-java, and I can send you excerpts if this is insufficient.

sanjain2004 commented 7 months ago

Hi Carl, were you able to find anything on this? Thanks for your help.

clundin25 commented 7 months ago

I will be able to work on this further the week of the 15th, since it requires setting up a new environment. I do not have bandwidth until that time.

sanjain2004 commented 7 months ago

Thanks. If you can point me in some other direction, I can try.

clundin25 commented 7 months ago

Perhaps you can see whether a token produced by https://github.com/googleapis/google-auth-library-java works better? That takes Python out of the equation.

sanjain2004 commented 7 months ago

Unfortunately, that is not easy with the architecture we have. That's because the Java piece is a different app. So, I cannot generate a token in this app with a Service Account of a different app.

clundin25 commented 7 months ago

Hmm that seems problematic. Can you explain how your app works a little more?

sanjain2004 commented 7 months ago

Sure. We have a Java app (Spring Boot) that provides a persistence service. We have multiple Cloud Functions (in Python) that run under their own service accounts and use the Java app for persistence. These service accounts may or may not have read/write permissions. So, we generate an access token in the Cloud Function and pass it to the Java app, which checks the permissions using the access token and acts accordingly.

clundin25 commented 7 months ago

Okay, and is this a recent regression?

sanjain2004 commented 7 months ago

No, it's a new implementation.

I found this "Access tokens expire after a short period of time. The metadata server caches access tokens until they have 5 minutes of remaining time before they expire. You can request new tokens as frequently as you like, but your applications must have a valid access token for their API calls to succeed." here: https://cloud.google.com/compute/docs/access/authenticate-workloads#:~:text=Access%20tokens%20expire%20after%20a,their%20API%20calls%20to%20succeed.

Here is a theory: the Python API uses the MDS to generate the token. Per the documentation above, the token is treated as expired 5 minutes before its expiry time, so I start getting errors. But when a request for a new token is made, one of two things might be going on: (1) the Python code is reading from an MDS that has not yet expired the token, or (2) as you mentioned, the threshold is 4 minutes, so the token has not yet expired for the Python API and it serves the same token for another minute. Everything resolves after that.

I am curious about the 4 minutes you mentioned before. If there is some documentation on the Python side about that, it might be useful.

clundin25 commented 7 months ago

Hi @sanjain2004,

I believe the issue you are facing is that the Java object you are constructing does not support refresh. When the Java code attempts to refresh, it results in an illegal state and runtime exception.

You probably need extra logic to create a new object, as an access token created from the raw token and expiration lacks details needed to perform a refresh.

See the Java code here; I imagine if you set a breakpoint there, you will hit it 5 minutes before the token expires.

You can see that the Java code attempts to refresh the token 5 minutes before expiration due to this default.
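A language-neutral sketch of the pattern suggested above, written in Python for illustration: a token-only credential cannot refresh itself, so once its token is inside the client's refresh margin the caller has to obtain a fresh token and build a new credential from it. `AccessTokenValue`, `RefreshNotSupported`, and the `request_new` callback (standing in for asking the Cloud Function for a new token) are hypothetical; the 300-second default mirrors the 5-minute Java default discussed here, and treating the boundary itself as expired is an assumption.

```python
from typing import Callable, NamedTuple


class AccessTokenValue(NamedTuple):
    value: str
    expires_at: float  # epoch seconds


class RefreshNotSupported(RuntimeError):
    """Stand-in for the IllegalStateException from a token-only credential."""


def check_token(token: AccessTokenValue, now: float, margin: float = 300.0) -> AccessTokenValue:
    # Inside the margin a refreshable credential would refresh; a token-only one cannot.
    if token.expires_at - now <= margin:
        raise RefreshNotSupported("token is inside the refresh margin; supply a new token")
    return token


def token_for_request(
    token: AccessTokenValue,
    now: float,
    request_new: Callable[[], AccessTokenValue],
    margin: float = 300.0,
) -> AccessTokenValue:
    try:
        return check_token(token, now, margin)
    except RefreshNotSupported:
        # Build from a freshly issued token instead of refreshing in place.
        return check_token(request_new(), now, margin)
```

If `request_new` hands back the same stale token (for example, because the token source is still caching it), the second check fails too, which is the one-minute gap described in this thread.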

sanjain2004 commented 7 months ago

Hi Carl, even if it hits that and results in an error (5 minutes before expiration) because it cannot refresh, we handle it. Then the Python code is supposed to generate a new token and everything should be fine. However, the inability of the Python code to generate a new token is also a blocking factor. As mentioned before, the Python code returns the same token for at least another minute.

This means the access token expiration check in Java somehow happens 5 minutes before expiry, but the Python side does not generate a new token until ~4 minutes before expiration, causing this one-minute window of exceptions. Why the discrepancy between Python and Java?

clundin25 commented 7 months ago

I don't think this is a very common use case, so it is likely a novel issue.

Can you set a shorter expiration margin, say 3 minutes, in the Java builder here?

We try to keep the auth libraries in sync, but the issue here is that the Java default should be updated to a window of 3 minutes and 45 seconds, but this change has not yet happened.
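The discrepancy can be worked through with the timestamps from earlier in the thread (expiry at 11:30:00). Assuming the client treats a token as expired `client_margin` before expiry and the token source only issues a new token `server_refresh` before expiry, any positive difference between the two is a window with no usable token. The `timeline` helper is illustrative:

```python
from datetime import datetime, timedelta


def timeline(expiry: datetime, client_margin: timedelta, server_refresh: timedelta):
    """Return (rejected_from, new_token_from, gap) for a single token."""
    rejected_from = expiry - client_margin    # client stops accepting the token
    new_token_from = expiry - server_refresh  # token source starts issuing a new one
    gap = max(timedelta(0), new_token_from - rejected_from)
    return rejected_from, new_token_from, gap
```

With a 5-minute client margin and a 4-minute server refresh, the token is rejected from 11:25:00 but a new one only appears at 11:26:00, a one-minute gap; a 3m45s margin closes it because the client gives up on the token only after a replacement is already available.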

sanjain2004 commented 7 months ago

Thanks for the pointer. I will try that.

Just so I understand the last statement: is it correct to say that the Java and Python libraries are out of sync in how they check token expiration? The Java library uses 5 minutes, while Python uses 3m 45s (as you wrote above). And I assume that timing is not publicly documented?

clundin25 commented 7 months ago

Yes, the Java refresh window should be at most 4 minutes, because serverless runtimes (Cloud Run, etc.) cache tokens until 4 minutes before expiration.

I've opened https://github.com/googleapis/google-auth-library-java/pull/1352 to make the adjustment.

sanjain2004 commented 7 months ago

So, I was trying what you said. This is my current code:

```java
// accessToken: an AccessToken built from the token and expiry sent by the Python Cloud Function
GoogleCredentials creds = GoogleCredentials.newBuilder().setAccessToken(accessToken).build();
GoogleCredentials credential = creds.createScoped(Collections.singleton(IamScopes.CLOUD_PLATFORM));
return new CloudResourceManager.Builder(GoogleNetHttpTransport.newTrustedTransport(),
        GsonFactory.getDefaultInstance(), new HttpCredentialsAdapter(credential))
        .setApplicationName("bigtable-endpoints-service") // Any name will work
        .build();
```

I cannot chain .setExpirationMargin() into the GoogleCredentials builder, since setExpirationMargin returns an OAuth2Credentials builder rather than a GoogleCredentials builder.

sanjain2004 commented 7 months ago

To understand the flow now, let's say the Python code generates a token that expires in 30 minutes. With the timing changes you are making, the IllegalStateException will be thrown in Java 3m 45s before expiration (due to the refresh check). At that point, the Python code will generate a new token. Question: will the Python code be able to successfully generate a new token 3m 45s before expiry within 1 or 2 tries?

OR

Will the Java code that is testing permissions get an error 3 minutes before expiration (due to the expiration check)?

Thanks for the quick turnaround.

clundin25 commented 7 months ago

The refresh is not guaranteed to succeed on its first try, and it has retries by default. I would recommend accounting for this scenario in your code, as this is a brittle interaction.
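One way to account for this on the Python side is to retry with backoff until the token source returns a token different from the one that was rejected. A minimal sketch; `fetch_fresh_token`, its defaults, and the injectable `sleep` are assumptions, not part of google-auth:

```python
import time
from typing import Callable


def fetch_fresh_token(
    fetch: Callable[[], str],
    stale_token: str,
    attempts: int = 5,
    delay_s: float = 2.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it returns a token other than `stale_token`."""
    wait = delay_s
    for attempt in range(attempts):
        token = fetch()
        if token != stale_token:
            return token
        if attempt < attempts - 1:
            # Exponential backoff between attempts.
            sleep(wait)
            wait *= backoff
    raise TimeoutError(f"no new token after {attempts} attempts")
```

In the Cloud Function, `fetch` could call `credentials.refresh(auth_req)` and return `credentials.token`. With the defaults the total wait is 2 + 4 + 8 + 16 = 30 seconds, so if the stale window can last about a minute, `attempts` or `delay_s` would need to be larger.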

sanjain2004 commented 7 months ago

We have retries built into the Python code to try to generate the new access token. With your new timings, hopefully it will not take 30-60 seconds before a new one is generated.

clundin25 commented 7 months ago

Great! I will close this issue but feel free to re-open or create a new issue if you see any further issues. Thanks!

sanjain2004 commented 7 months ago

Hi Carl,

Currently, this is my entry in pom.xml:

```xml
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-cloudresourcemanager</artifactId>
    <version>v3-rev20230806-2.0.0</version>
</dependency>
```

This brings in version 1.16.0 of the google-auth-library-oauth2-http library.

How can I get the latest version, 1.23.0 (the one you are merging)? Which version of google-api-services-cloudresourcemanager would I have to move to?