meztez / bigrquerystorage

R Client for BigQuery Storage API
Apache License 2.0

`bqs_auth()` failing when `bigrquery::bq_has_token()` is TRUE #73

Closed jake-coleman32 closed 1 week ago

jake-coleman32 commented 2 weeks ago

When I try to run bigrquerystorage::bqs_table_download() directly after running bigrquery::bq_project_query(), I get the following warning:

Could not copy JSON property: UNKNOWN:Property client_secret not found in JSON object. 
Invalid input for refresh token credentials creation

which is triggered by bigrquerystorage::bqs_auth(). I then immediately get the error

Error: gRPC method CreateReadSession error -> request failed: the user does not have 'bigquery.readsessions.create' permission for [project]

I believe what's happening is that when bigrquery::bq_has_token() is TRUE, then bqs_auth() creates a refresh token from the existing bigrquery token; however, it seems that the information is not being passed correctly for me. Specifically, .authcred[["client"]] appears to be NULL for me, so then of course client_secret and client_id passed into bqs_client() are also NULL. I believe this results in bqs_refresh_token_credentials() being called with a faulty refresh token, resulting in the warning. However, because the client pointer is still created, it doesn't error until it actually tries to use the token in bqs_ipc_stream().

If I call bigrquery::bq_deauth() before calling bigrquerystorage::bqs_table_download(), then during bqs_auth() the boolean bigrquery::bq_has_token() is FALSE (duh), so both the refresh token and the access token are empty strings. I believe this results in bqs_client() calling bqs_google_credentials(), which then calls grpc::GoogleDefaultCredentials() and everything is right as rain. No warning in bqs_auth(), no error in bqs_ipc_stream().

Do you know why asNamespace("bigrquery")[[".auth"]][["cred"]][["client"]] might be NULL? Failing that, could bqs_auth() check for that case when setting the refresh token?
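
To make the requested check concrete, here is a minimal sketch in base R. It is not the package's implementation: `refresh_token_fields()` is a hypothetical helper, and the field names on the credential (`client$id`, `client$secret`, `credentials$refresh_token`) are assumptions based on the description above. It only reports whether a cached credential carries everything a refresh-token credential would need.

```r
# Hypothetical helper, not part of bigrquerystorage or bigrquery.
# `cred` stands in for the cached credential; its field names are assumed.
refresh_token_fields <- function(cred) {
  fields <- list(
    type          = "authorized_user",
    client_id     = cred[["client"]][["id"]],
    client_secret = cred[["client"]][["secret"]],
    refresh_token = cred[["credentials"]][["refresh_token"]]
  )
  # A field counts as missing when it is NULL or an empty string.
  is_missing <- vapply(
    fields,
    function(x) is.null(x) || !nzchar(x),
    logical(1)
  )
  missing <- names(fields)[is_missing]
  list(usable = length(missing) == 0, missing = missing, fields = fields)
}

# Mock credentials for illustration only.
cred_without_client <- list(credentials = list(refresh_token = "rt"))
cred_complete <- list(
  client = list(id = "id-123", secret = "s3cret"),
  credentials = list(refresh_token = "rt")
)

check_bad <- refresh_token_fields(cred_without_client)
check_good <- refresh_token_fields(cred_complete)
```

With a guard like this, a caller could pass an empty refresh token when `usable` is FALSE, which by the reasoning above would leave the default-credentials path in charge instead of producing a malformed refresh token.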

meztez commented 2 weeks ago

Which credentials retrieval method do you use and how long between the call to bq_project_query and bqs_table_download? Something I can reproduce would help greatly in locating where to apply a fix.

Otherwise, in the meantime, you can squeeze between the calls:

bigrquerystorage:::.global$client$ptr <- bigrquerystorage:::bqs_client(
  client_info = bigrquerystorage:::bqs_ua(),
  service_configuration = system.file(
    "bqs_config/bigquerystorage_grpc_service_config.json",
    package = "bigrquerystorage",
    mustWork = TRUE
  ),
  refresh_token = "",
  access_token = "",
  root_certificate = Sys.getenv("GRPC_DEFAULT_SSL_ROOTS_FILE_PATH")
)

jake-coleman32 commented 2 weeks ago

Thanks for the quick reply! I believe locally I'm using user credentials stored in ~/.config/gcloud/credentials.db and ~/.config/gcloud/access_tokens.db. However, I also have the environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to a JSON file with my credentials (so ADC should work for me). Does that answer your question on credentials retrieval method?

No time between calls to bq_project_query and bqs_table_download.

And thank you for the suggestion! I've also found that calling bigrquery::bq_deauth() between the two calls works, since it avoids the block of code in bqs_auth() that sets the refresh token (which I see mirrors your suggestion of a direct call to bqs_client() that passes an empty string as the refresh token argument).
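
For anyone landing here later, the bq_deauth() workaround could be sketched as below. `download_after_deauth()` is a hypothetical wrapper name, `project` and `sql` are placeholders supplied by the caller, and the sketch assumes the table returned by bq_project_query() can be passed straight to bqs_table_download(), as in the original report.

```r
# Sketch of the workaround only; the wrapper name is hypothetical.
download_after_deauth <- function(project, sql) {
  # Run the query with the usual bigrquery authentication.
  tb <- bigrquery::bq_project_query(project, sql)

  # Drop the cached bigrquery token so that bq_has_token() is FALSE and
  # bqs_auth() does not try to build a refresh token from it.
  bigrquery::bq_deauth()

  # The storage client is then expected to fall back to default credentials.
  bigrquerystorage::bqs_table_download(tb, parent = project)
}
```

One trade-off: after bq_deauth(), later bigrquery calls in the same session have to authenticate again.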

meztez commented 2 weeks ago

I should be able to figure it out with this.

meztez commented 1 week ago

@jake-coleman32 is your GOOGLE_APPLICATION_CREDENTIALS a service account?

What does it look like?

{
  "type": "service_account",
  "project_id": "--omitted--",
  "private_key_id": "--omitted--",
  "private_key": "-----BEGIN PRIVATE KEY-------omitted--\n-----END PRIVATE KEY-----\n",
  "client_email": "bigrquerystorage-actions@--omitted--.iam.gserviceaccount.com",
  "client_id": "--omitted--",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/bigrquerystorage-actions%40--omitted--.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}

https://gargle.r-lib.org/articles/non-interactive-auth.html#provide-a-service-account-token-directly
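
A quick way to answer the credential-type question without pasting the file is to read only its "type" field. The sketch below uses base R and a regular expression rather than a JSON parser to stay dependency-free; `adc_credential_type()` is a hypothetical helper, and it assumes the file contains a single top-level "type" entry like the example above.

```r
# Hypothetical helper: report the "type" field of the credentials file that
# GOOGLE_APPLICATION_CREDENTIALS points to, without printing any secrets.
adc_credential_type <- function(path = Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS")) {
  if (!nzchar(path) || !file.exists(path)) {
    return(NA_character_)
  }
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Capture the value of the first "type": "..." pair in the file.
  pattern <- "\"type\"[[:space:]]*:[[:space:]]*\"([^\"]*)\""
  m <- regmatches(txt, regexec(pattern, txt))[[1]]
  if (length(m) < 2) NA_character_ else m[[2]]
}

# Illustration with a throwaway file; the contents are made up.
tmp <- tempfile(fileext = ".json")
writeLines('{ "type": "authorized_user", "client_id": "example" }', tmp)
detected_type <- adc_credential_type(tmp)
unlink(tmp)
```

Calling `adc_credential_type()` with no arguments reads the path from the environment variable and returns NA when the variable is unset or the file is missing.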

jake-coleman32 commented 1 week ago

My type is "authorized_user" - however, it seems your fix results in the client_id and client_secret fields being populated in refresh_token. Thank you!