iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

DVC+Google Cloud: Could not automatically determine credentials (GOOGLE_APPLICATION_CREDENTIALS not set) #3005

Closed michaelitvin closed 3 years ago

michaelitvin commented 4 years ago

Trying to run dvc pull with a Google Cloud remote, got this error message:

ERROR: unexpected error - Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

gsutil ls gs://my_bucket worked fine.

Running gcloud auth login didn't help, but gcloud beta auth application-default login solved the problem.

$ echo ${GOOGLE_APPLICATION_CREDENTIALS}

$ dvc --version
0.77.3

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic

DVC was installed using sudo pip install dvc[all].

efiop commented 4 years ago

Discord context: https://discordapp.com/channels/485586884165107732/485596304961962003/659335075422535691

Need to take a look at gsutil, for some reason it doesn't require gcloud beta auth application-default login to work. Ideally we should behave the same.
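
For readers hitting the same error, the difference seems to come down to where each tool looks for credentials. gcloud auth login stores credentials for the gcloud and gsutil tools, while client libraries such as google-auth (the library that appears in the warning output further down this thread) follow Application Default Credentials: the GOOGLE_APPLICATION_CREDENTIALS variable first, then the file written by gcloud auth application-default login. Below is a minimal Python sketch of that lookup. It assumes the Linux/macOS default location of the well-known file and skips the metadata-server step used on Google Cloud VMs. The function name find_adc_source is made up for illustration and is not part of DVC or google-auth.

```python
import os
from pathlib import Path


def find_adc_source(env: dict, home: Path) -> str:
    """Report which Application Default Credentials source would be tried first.

    Simplified sketch: checks only the environment variable and the
    well-known file in its default Linux/macOS location.
    """
    # 1. An explicit key file path always takes priority.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"env:{explicit}"

    # 2. The file created by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"file:{well_known}"

    # 3. Nothing found locally; this is the situation that produces
    #    "Could not automatically determine credentials".
    return "none"


if __name__ == "__main__":
    print(find_adc_source(dict(os.environ), Path.home()))
```

If this reports "none" while gsutil ls still works, that matches the situation described above: gsutil is using gcloud's own credential store, which the default lookup does not read.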

vade commented 4 years ago

Is there an official way to have DVC use a service account set up for it? Apologies, I'm new to both DVC and gcloud nuances.

I've set up a service account, have the local (private) key file on disk, and can run gcloud auth activate-service-account with my account name and key file, and verify that my service account is listed in gcloud auth list.

How do I get DVC to respect / use that?

vade commented 4 years ago

gcloud config set account <service account here> may be the key?

vade commented 4 years ago

Sorry to spam this thread

I've verified that my service account is active via gcloud auth list, and I see an * next to my service account name.

                       Credentialed Accounts
ACTIVE  ACCOUNT
*       dvc-service-account@xxx.iam.gserviceaccount.com
        vade@xxx

Running dvc push gets me:

$ dvc push
  0% Querying cache in gs://cinemanet-dataset/DVC|          |0.00/70.6k [00:00<?,     ?file/s]
/usr/local/lib/dvc/google/auth/_default.py:69: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
(the same warning is printed two more times)

vade commented 4 years ago

I think this resolved it for me

export GOOGLE_APPLICATION_CREDENTIALS="/Path/to/my/keyfile.json" where this is the JSON key file generated for the google service account.
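
To double-check which kind of credentials a JSON file holds before pointing GOOGLE_APPLICATION_CREDENTIALS at it, a small Python sketch like the following may help. It relies on the "type" field that Google credential files carry: "service_account" for a service account key file and "authorized_user" for user credentials created by gcloud. The helper name credential_type is hypothetical, and the path in the usage line is the placeholder from the comment above.

```python
import json
from pathlib import Path


def credential_type(path) -> str:
    """Return the "type" field of a Google credentials JSON file.

    Falls back to "unknown" when the file is valid JSON but has no such
    field (or is not a JSON object at all).
    """
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        return "unknown"
    return data.get("type", "unknown")


if __name__ == "__main__":
    import sys

    # Usage: python check_key.py /Path/to/my/keyfile.json
    if len(sys.argv) == 2:
        print(credential_type(sys.argv[1]))
```

A result of "authorized_user" would be consistent with the "end user credentials" warning shown earlier, whereas a service account key file should report "service_account".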

Hopefully this monologue is helpful to someone!

shcheklein commented 4 years ago

@vade thanks! It would be great to update the docs. Let me know if you'd like to make a PR for that; I can help.

vade commented 4 years ago

I'm down to help with that. Docs are so key for a project's success and making users' lives easier. Let me confirm with a colleague that this solution is working (they have a touch more experience than I do with DVC). If you don't hear from me in a day or two, please reply - it's not you, it's me! 😂🤣

rsomani95 commented 4 years ago

export GOOGLE_APPLICATION_CREDENTIALS="/Path/to/my/keyfile.json" where this is the JSON key file generated for the google service account.

(@vade's colleague here) I can confirm this solution works.

shcheklein commented 4 years ago

Thanks, guys! It would be great to edit this file https://github.com/iterative/dvc.org/blob/master/public/static/docs/command-reference/remote/modify.md, the Click for Google Cloud Storage section.

vade commented 4 years ago

https://github.com/iterative/dvc.org/pull/1030

shcheklein commented 4 years ago

@vade thanks! Just a minor question in the PR for us to better understand the change. If you can share more info that would help. And a minor typo.

vade commented 4 years ago

Totally, I responded with more info and a question of my own in the PR. LMK!

shcheklein commented 4 years ago

@vade I tried to "play" with this a little bit more. Could you please try to use

dvc remote modify storage credentialpath /Path/to/my/keyfile.json

where keyfile.json is the credentials file for a service account that has proper access?

That alone worked for me, though the env variable should be fine as well.

I'll review the docs PR and probably simplify a bit (to put links to the Google official auth docs mostly instead of replicating auth flow on our end).

shcheklein commented 4 years ago

@michaelitvin btw, do you remember how gsutil was installed? As part of the SDK or with pip? Were you running it on your local machine or in the cloud?

vade commented 4 years ago

Hey - gsutil is, weirdly, installed with the SDK. It typically runs locally on the machine that is a DVC client (i.e., pushing to or pulling from the GCloud remote).

shcheklein commented 4 years ago

@vade did dvc remote modify storage credentialpath /Path/to/my/keyfile.json work for you, btw? have you had time to try?

jorgeorpinel commented 4 years ago

Hi! Another user reported a similar issue recently (DVC 1.6ish), with some additional problems and suggestions to improve the UI and/or docs. See https://discord.com/channels/485586884165107732/485596304961962003/752939146430906419

Quick summary:

if I remove credentialpath from the config, successfully run gcloud init, and run dvc push, I'm back at Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS (but that requires a service account setup). My intent when I started was to use GS + DVC with my user account credentials, not a service account.

BTW, existing documentation about this is spread mainly across https://dvc.org/doc/user-guide/setup-google-drive-remote#using-service-accounts, https://dvc.org/doc/command-reference/remote/add#supported-storage-types, and https://dvc.org/doc/command-reference/remote/modify#available-parameters-per-storage-type.

jorgeorpinel commented 4 years ago

And an additional question from the same user:

succeeded with a service account approach, but only after I set the role for it to "Owner". Feels like a sledgehammer approach. What are the minimum required role(s) for a GS bucket service account?

(Should this be a separate question issue?) Cc @mvshmakov maybe remembers 🙂

drorata commented 4 years ago

My 2 cents: For someone used to S3 remotes, the GCP track is much more painful. It seems like using a GCP bucket is very different from using one on S3.

fedorov commented 4 years ago

I had the error below when I tried to push to a newly created GS remote, while regular gsutil commands worked fine. gcloud auth login didn't fix the problem, but gcloud beta auth application-default login did.

(screenshot of the error message)

Huge thanks to @pmrowla for helping me figure this out!

mvshmakov commented 4 years ago

And an additional question from the same user:

succeeded with a service account approach, but only after I set the role for it to "Owner". Feels like a sledgehammer approach. What are the minimum required role(s) for a GS bucket service account?

(Should this be a separate question issue?) Cc @mvshmakov maybe remembers 🙂

Feels like @Suor can help with that. Sorry for such a late response, I've missed the notification.

isidentical commented 3 years ago

I think this might be the normal behavior. If you'd like the auth you have done to be the default, you could use gcloud auth login --update-adc.

isidentical commented 3 years ago

With #5500, you are also able to supply your personal login info via credentialpath. For example:

dvc remote modify origin credentialpath ~/.config/gcloud/legacy_credentials/{your google account}/adc.json

isidentical commented 3 years ago

Now that #5500 is merged, the default behavior is to use the default credentials, which can be set with gcloud auth login --update-adc. Alternatively, you can specify credentialpath.