googleapis / google-auth-library-python

Google Auth Python Library
https://googleapis.dev/python/google-auth/latest/
Apache License 2.0

Permission denied while getting drive credentials: ADC with impersonation #1204

Open adamcunnington-mlg opened 1 year ago

adamcunnington-mlg commented 1 year ago

I consider myself pretty familiar with the various google auth flows available via the python SDK - and how this interacts with gcloud-generated credentials.

We are using the bq SDK in the typical way; client = bigquery.Client() and we make use of ADC so our code is interoperable between dev and prod. Our code interacts with external tables that are sourced from sheets on google drive. We know that we need to provide the necessary scopes (and of course, permission to the underlying sheets).

The following works fine for a user identity with the necessary permissions: gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform

However, the following does not: gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform --impersonate-service-account=hand-of-god@mlg-apollo-data-prod.iam.gserviceaccount.com

We receive google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.

I can replicate the same issue with my user credential if either of the following is true: 1) I don't pass the Google Drive scopes. 2) I don't have access to the underlying file.

The service account that I am impersonating definitely has access to the file, and I can see the BigQuery job failure with a nondescript error message (a feature request has been raised for this with the BigQuery REST API team). My suspicion is that when impersonating a service account, the scopes (that are presumably buried in the credential) are not passed through or correctly read by the SDK (only when the ADC was generated using SA impersonation). Maybe a similar issue is happening with my above note when the project cannot be inferred from the environment.

See below screenshot proof of correct permissions being in place: image

Very grateful for some direction here...

adamcunnington-mlg commented 1 year ago

For the next poor soul that encounters this, I have concluded that the scopes are indeed ignored by the Python SDK (this might not be isolated to just here) for ADC credentials generated using service account impersonation.

Interestingly, the same issue does NOT happen when using googleapiclient (google-api-python-client), so I think that library does something smarter than google-cloud-core does.

This can be worked around in various ways by explicitly setting the project and scopes within the code but this makes for a brittle implementation that is not interoperable with different credential types and environments.

I found the best way to work around this is by passing the lesser-known client_options object (https://googleapis.dev/python/google-api-core/latest/client_options.html#google.api_core.client_options.ClientOptions), which supports explicit scopes.

An alternative is to create an ADC object explicitly with scopes; e.g. google.auth.default(scopes=...)
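The two workarounds above can be sketched roughly as follows. This is a minimal illustration, not a verified fix for every credential type: the function names are made up for this example, and a later comment in this thread reports that the google.auth.default(scopes=...) route did not take effect in their setup.

```python
"""Two ways to request the Drive scope explicitly (illustrative sketch)."""

DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
SCOPES = [DRIVE_SCOPE, CLOUD_PLATFORM_SCOPE]


def client_via_client_options():
    """Let the client load ADC itself, but with explicit scopes."""
    # Imported lazily so this module loads even where the Google
    # libraries are not installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import bigquery

    return bigquery.Client(client_options=ClientOptions(scopes=SCOPES))


def client_via_explicit_adc():
    """Load ADC with explicit scopes and pass the credentials to the client."""
    import google.auth
    from google.cloud import bigquery

    credentials, project_id = google.auth.default(scopes=SCOPES)
    return bigquery.Client(credentials=credentials, project=project_id)
```

Either function returns a bigquery.Client; the first keeps credential discovery inside the client, which is why it stays closer to the "plain ADC" setup described at the top of the issue.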

adamcunnington-mlg commented 1 year ago

I've raised a case with Google Cloud support to confirm this bug

tswast commented 1 year ago

Thanks for raising this issue. I see you have already discovered the client_options and credentials via google.auth.default workarounds.

Some related code for further investigation. We set default scopes here:

https://github.com/googleapis/python-bigquery/blob/40e4da78bb690ff4c94832321377bb1590e2eeaf/google/cloud/bigquery/client.py#L210-L213

These scopes are used here:

https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L169

Potentially there's a difference between an impersonated service account and user credentials, where the former can be scoped down? I recall that user credentials aren't really affected by the scopes parameter after they're created.

tswast commented 1 year ago

Looking at https://github.com/googleapis/google-auth-library-python/blob/a83af399fe98764ee851997bf3078ec45a9b51c9/google/auth/credentials.py#L327 I think perhaps we should be setting default_scopes here https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L181 instead of scopes.

tswast commented 1 year ago

On second thought, this may not be a bug. I think no drive scope is the correct default, so clients that need these scopes should be passing it in via the client_options.

Perhaps we reclassify this as a documentation issue to update the code sample at https://cloud.google.com/bigquery/docs/external-data-drive#python now that client_options are available?

adamcunnington-mlg commented 1 year ago

Thanks so much for looking at this, but I don't quite agree it's a docs issue. The key point here is that the scopes are correctly extracted from ADC when ADC is of type authorized user but NOT when they are of type impersonated_service_account. I think this requires a fix in google.auth.

tswast commented 1 year ago

> The key point here is that the scopes are correctly extracted from ADC when ADC is of type authorized user but NOT when they are of type impersonated_service_account

I suppose there's a subtlety here. We don't want to downscope credentials that already have cloud-platform or bigquery scopes. The only reason we're not doing that for authorized user is that downscoping isn't supported in google-auth. If it were supported, we wouldn't want to be downscoping in that case, either.
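To make the subtlety concrete, here is a toy model of the behaviour described in this comment. It is deliberately not google-auth code: ToyCredentials, the rescopable flag, and apply_client_scopes are all hypothetical names, and the only assumption encoded is the one stated above, namely that client-supplied scopes replace the scopes of credentials that can be re-scoped and are ignored for those that cannot.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple

DRIVE = "https://www.googleapis.com/auth/drive"
CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
BIGQUERY = "https://www.googleapis.com/auth/bigquery"


@dataclass(frozen=True)
class ToyCredentials:
    """Hypothetical stand-in for a credential object (not a google-auth class)."""
    kind: str
    scopes: Optional[Tuple[str, ...]]
    rescopable: bool


def apply_client_scopes(creds: ToyCredentials, client_scopes: Sequence[str]) -> ToyCredentials:
    """Model a client library passing its own scopes to the credentials."""
    if creds.rescopable:
        # Re-scopable credentials take whatever the client asks for,
        # which silently drops any wider scopes requested at login time.
        return replace(creds, scopes=tuple(client_scopes))
    # Credentials that cannot be re-scoped keep their original scopes.
    return creds


user = ToyCredentials("authorized_user", (DRIVE, CLOUD_PLATFORM), rescopable=False)
impersonated = ToyCredentials("impersonated_service_account", (DRIVE, CLOUD_PLATFORM), rescopable=True)

client_scopes = [BIGQUERY, CLOUD_PLATFORM]  # scopes chosen by the client library
user_after = apply_client_scopes(user, client_scopes)
impersonated_after = apply_client_scopes(impersonated, client_scopes)

print(user_after.kind, DRIVE in user_after.scopes)
print(impersonated_after.kind, DRIVE in impersonated_after.scopes)
```

In this model the user credential keeps the Drive scope while the impersonated one loses it, which mirrors the symptom reported in the issue.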

adamcunnington-mlg commented 1 year ago

But as far as google.auth is concerned, an authorised user or an authorised user that is impersonating a service account, is the same category of thing. It's still an authorised user credential, and when I'm generating ADC, I'm providing explicit scopes, which in this case are wider than what is coming from python-bigquery (cloud platform PLUS drive scopes) but bigquery is irrelevant in the discussion here - this issue should probably be ported to google.auth repo. It's not BQ specific at all.

rafael-guevara-ONE commented 1 year ago

Hello, I have the same issue. When I try to run a query against a table with an external data source (a Google Sheet in my case), I get the error message "Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials."

I shared the Google Sheet with the service account, but the issue is still not resolved.

image

Does anybody know how to fix this issue?

krampepampe commented 1 year ago

Hi @rafael-guevara-ONE, I have exactly the same issue.

adamcunnington-mlg commented 1 year ago

@rafael-guevara-ONE @krampepampe I doubt you are having the same issue.

How are you authenticating? You are probably missing the google drive scope. That is not what this issue is about.

rafael-guevara-ONE commented 1 year ago

Hello @adamcunnington-mlg, I really appreciate your answer, thanks.

I added the scopes in my function (in my case I am using Dialogflow with a Cloud Function in Node.js) and now it is working:

image

@krampepampe, try adding the scope to your BigQuery service. See the links below: https://stackoverflow.com/questions/68064592/bigquery-permission-denied-while-getting-drive-credentials-unable-to-resolve https://stackoverflow.com/questions/60903258/bigquery-nodejs-library-error-while-accessing-external-source-in-google-drive

adamcunnington-mlg commented 1 year ago

Ok, this is muddying the water of this issue. Thanks for that!

@tswast please can you advise following my previous response?

tswast commented 1 year ago

> But as far as google.auth is concerned, an authorised user or an authorised user that is impersonating a service account, is the same category of thing. It's still an authorised user credential, and when I'm generating ADC, I'm providing explicit scopes, which in this case are wider than what is coming from python-bigquery (cloud platform PLUS drive scopes) but bigquery is irrelevant in the discussion here - this issue should probably be ported to google.auth repo. It's not BQ specific at all.

Oh, I agree that it's a subtlety that shouldn't exist. I'm not 100% sure how google-auth would detect that it shouldn't try to downscope, but it's probably worth moving over to that repo.

Alternatively, the issue may be in google-cloud-core, because it isn't using the default_scopes argument for scopes that come from the client class definition. https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L181

One thing that will make this more difficult with respect to bigquery is that cloud-platform is a superset of the permissions in the bigquery scope. I've filed https://github.com/googleapis/python-bigquery/issues/1444 to standardize the scopes to avoid potential confusion.

adamcunnington-mlg commented 1 year ago

@tswast thanks for the response. Also, I raised a support case in Google Cloud - I'm not sure if you ended up inputting there, but just to bring this all together, here is the response I got there - which honestly, I'm a little dubious about. It centres around there not being a "scope" argument in the ADC JSON, but there never is. Presumably the scope information is baked into the refresh token or something.

All the same, here it is: I got an update from the Product Team regarding your issue. When you use the gcloud command [1], it adds the scopes to the source credential instead of the impersonated credentials, and the dumped ADC file doesn't have a scope field, so this info is lost when users load from the ADC file. Hence you're facing this issue. I would like to inform you that we won't support the scope + impersonate-service-account flags at the moment.

I also discussed with Eng team for possibility of supporting scope with impersonated service account and they replied as below:

Adding scopes + impersonate-service-account support in gcloud for this command is not in the current road map. This is a major effort so we don't think this will happen in a short time (we need to add scopes to the ADC file, but any ADC file change has a big impact: we not only need to update gcloud, but also auth libraries in every supported language).

However, the Product Team informed us that they will add a warning message for command [1] saying scopes will be ignored but there is no promised ETA at the moment.
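One way to see what the support response describes is to look at the ADC file directly. The sketch below uses only the standard library; the helper names are made up for this example, and the default path is an assumption based on where gcloud normally writes the file (GOOGLE_APPLICATION_CREDENTIALS, if set, points elsewhere).

```python
import json
import os
from pathlib import Path


def adc_path() -> Path:
    """Best-guess location of the ADC file (assumed gcloud defaults)."""
    override = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if override:
        return Path(override)
    if os.name == "nt":
        return Path(os.environ.get("APPDATA", "")) / "gcloud" / "application_default_credentials.json"
    return Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


def describe_adc(path: Path) -> dict:
    """Summarise the fields relevant to this issue without printing secrets."""
    info = json.loads(path.read_text())
    return {
        # e.g. "authorized_user" or "impersonated_service_account"
        "type": info.get("type"),
        # True if any top-level key looks like a scope field
        "has_scope_field": any(key.startswith("scope") for key in info),
        # Impersonated ADC wraps the user's own credential
        "has_source_credentials": "source_credentials" in info,
    }


if __name__ == "__main__":
    path = adc_path()
    if path.exists():
        print(describe_adc(path))
    else:
        print(f"No ADC file found at {path}")
```

Running it after each of the two gcloud commands from the original report should show whether the file type changes and whether any scope information is present at the top level.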

mau21mau commented 1 year ago

Trying to use google.auth.default(scopes=['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/cloud-platform']) doesn't work for me. For some reason, the requested scopes are ignored and the scopes on the credentials are set to None.

EricSeastrand commented 3 months ago

For anyone else facing this, here's the exact code that worked for me:

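from google.cloud import bigquery

# Request the Drive scope alongside cloud-platform so BigQuery can read Sheets-backed tables.
client = bigquery.Client(client_options={
    "scopes": ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/cloud-platform']
})
# `sql` is your own query string that references the Drive-backed external table.
results = client.query_and_wait(sql)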

While I understand the nuance here (and how this might not be a "bug" per se), it certainly can be unexpected behavior that costs man-hours. I think it's a deeper issue with how GCP handles permissions on the server side, and it can't really be addressed in the language SDKs (other than with one-off hacks as described here).

My rationale: the authentication between BQ and GDrive happens on the server side. It should be up to the server side to look up the service account's permissions, see that it can access a GSheet-backed table, know that it's allowed to talk to GDrive, and make the connection. The way it works now, the service account making the API call into the BigQuery API needs to somehow "know" that a table is GSheet-backed, so that it can include the right access scopes. That feels problematic to me. It's a leaky abstraction at best.

But it sounds like GCP isn't interested in addressing it. I guess I get it: that's a really big change with huge implications. So for now at least we have Google Search to help the next poor dev find this GH issue and get past this odd behavior.