r-lib / gargle

Infrastructure for calling Google APIs from R, including auth
https://gargle.r-lib.org/

Using non-default service account on GCP #281

Open jennybc opened 10 months ago

jennybc commented 10 months ago

I'm opening this issue to record an internal discussion prompted by a query from a Posit customer. I think the answer might benefit others, or our future selves, so I'm putting it here. The question is framed in terms of BigQuery, so I'll answer that way, but it's really a gargle question, which is why I'm filing it here instead of in bigrquery.

Customer has Posit Workbench and Connect running on a GCP cluster.

bigrquery::bq_auth() automatically picks up the default service account. As a result, BigQuery tables that the default service account is authorized to access are automatically accessible.

However, the customer wants users or deployed products to be able to access tables that are available to other service accounts.

For security reasons, the customer does not allow service account credentials to be generated as a .json key file, so that is not an option for introducing these non-default service accounts.

Is it possible to start Workbench sessions or associate Connect content with different GCP service accounts?


I don't have a complete answer to this, but I have some specific observations that can form the basis for a solution. So I'll take this in a few pieces.

The default service account. What they are currently seeing is the result of a successful auth via gargle::credentials_gce(), which is one of the credential fetchers tried by gargle::token_fetch(). All of this happens behind the scenes, inside bigrquery's bq_auth().

https://gargle.r-lib.org/reference/credentials_gce.html

Specifically, pay attention to the service_account argument:

credentials_gce(
  scopes = "https://www.googleapis.com/auth/cloud-platform",
  service_account = "default",
  ...
)

But you can specify a different value for service_account, provided the necessary setup has already been done elsewhere, ahead of time.

You can see the available service accounts with gargle::gce_instance_service_accounts():

gargle::gce_instance_service_accounts()
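Before requesting a token for a non-default account, it may help to confirm that the account is actually available on the instance. Here is a minimal sketch of a hypothetical helper, check_service_account() (not part of gargle), that compares the requested account against a character vector of available identifiers. It assumes you have already extracted those identifiers (for example, account emails and the "default" alias) from the output of gargle::gce_instance_service_accounts(); check what that function actually returns in your environment, since the exact structure is an assumption here.

```r
# Hypothetical helper, not part of gargle: fail early, with an informative
# message, if the requested service account is not among those available.
# `available` is assumed to be a character vector of account identifiers
# (e.g. emails and "default") that you extracted from
# gargle::gce_instance_service_accounts().
check_service_account <- function(requested, available) {
  stopifnot(is.character(requested), length(requested) == 1)
  if (!requested %in% available) {
    stop(
      "Service account '", requested, "' is not available on this instance.\n",
      "Available: ", paste(available, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(requested)
}

# Usage, with made-up identifiers:
available <- c("default", "1234-compute@developer.gserviceaccount.com")
check_service_account("default", available)
```

Returning the requested value invisibly means the check can sit inline, e.g. as the service_account argument of a later call.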

I have only done this myself during gargle development work, using the googleComputeEngineR package to get some GCE experience. In that setting, you make other service accounts available to workloads by specifying them when you create the VM. (I believe the service account can also be changed later, but only while the VM is stopped.) The R code for that looks something like this:

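library(googleComputeEngineR)

# Assumes auth, project, and zone are already configured for googleComputeEngineR.
# Create a VM from the "rstudio" template and attach a service account,
# along with the scopes that workloads on the VM may request.
vm <- gce_vm(
  template = "rstudio",
  name = "trustful-bull",
  username = "USERNAME",
  password = "PASSWORD",
  predefined_type = "e2-standard-4",
  serviceAccounts = list(
    list(
      email = "1234-compute@developer.gserviceaccount.com",
      scopes = c(
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive"
      )
    )
  )
)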

So whoever provisions the relevant GCP cluster has to attach the relevant service accounts. An individual R user or deployed product can't influence or "fix" this; it has to be done by an admin.

Back to the R users on Workbench or Connect ...

bigrquery::bq_auth() doesn't (yet) have a way to pass the service_account argument through gargle::token_fetch() to gargle::credentials_gce() (https://github.com/r-lib/gargle/issues/249). So the user's code must call credentials_gce() directly and pass the resulting token to bq_auth(). I don't think I've done this myself and I'm not in a position to test it right now, but here is some untested code for that:

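# service_account should be one of the accounts listed by
# gargle::gce_instance_service_accounts(), e.g. its email address
token <- gargle::credentials_gce(
  scopes = c(
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform"
  ),
  service_account = "some_non_default_service_account"
)
bigrquery::bq_auth(token = token)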
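One thing worth guarding against: credentials_gce() can return NULL instead of erroring when it can't obtain a token (that's how token_fetch() knows to move on to the next fetcher). NULL is also the default value of bq_auth()'s token argument, so I suspect bq_auth(token = NULL) would quietly fall back to the usual discovery process and pick up the default service account again. Here's a sketch of a hypothetical wrapper, gce_token_or_stop() (not part of gargle or bigrquery), that turns that NULL into an explicit error. The fetch argument is an assumption added only so the function can be exercised without a GCE instance; normally you'd leave it at its default.

```r
# Hypothetical wrapper, not part of gargle or bigrquery.
# credentials_gce() can return NULL instead of a token; passing NULL on to
# bigrquery::bq_auth(token = ) would likely trigger the normal discovery
# flow, so make that failure explicit instead.
gce_token_or_stop <- function(service_account,
                              scopes = "https://www.googleapis.com/auth/cloud-platform",
                              fetch = gargle::credentials_gce) {
  token <- fetch(scopes = scopes, service_account = service_account)
  if (is.null(token)) {
    stop(
      "Could not get a GCE token for service account '", service_account, "'. ",
      "Is it attached to this instance, with suitable scopes?",
      call. = FALSE
    )
  }
  token
}

# Intended usage on a GCE instance (untested):
# token <- gce_token_or_stop(
#   "some_non_default_service_account",
#   scopes = c(
#     "https://www.googleapis.com/auth/bigquery",
#     "https://www.googleapis.com/auth/cloud-platform"
#   )
# )
# bigrquery::bq_auth(token = token)
```

Because R evaluates default arguments lazily, gargle is only needed when fetch is left at its default.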

Here's some code for checking "who am I?", to confirm that you're authenticated as the account you intended:

bigrquery::bq_user()            # the account bigrquery is currently using
gargle::token_userinfo(token)
gargle::token_email(token)
gargle::token_tokeninfo(token)
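To make that check fail loudly in scripts or deployed content, you could compare the token's email with the one you expect. Here's a minimal sketch using a hypothetical helper, assert_token_email() (not part of gargle). It assumes gargle::token_email() can report an email for this kind of token; the email_of argument exists only so the helper can be tried out without real credentials.

```r
# Hypothetical helper, not part of gargle: stop unless the token belongs to
# the expected account. `email_of` defaults to gargle::token_email().
assert_token_email <- function(token, expected, email_of = gargle::token_email) {
  actual <- email_of(token)
  if (!identical(actual, expected)) {
    stop(
      "Authenticated as '", actual, "' but expected '", expected, "'.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Intended usage (untested), after creating `token` as above; the email here
# is a made-up placeholder:
# assert_token_email(token, "SERVICE_ACCOUNT_EMAIL@PROJECT.iam.gserviceaccount.com")
```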