googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.72k stars 1.27k forks source link

all: provide project ID detection #1294

Closed jeanbza closed 3 years ago

jeanbza commented 5 years ago

We should provide something like a google.DefaultProjectID method that attempts to discover project ID from ADC. This should probably live in x/oauth2/google (much of the logic for ADC lives there). This should be accompanied by documentation in clients that require project ID showing them how to use this.

Reference internal bug: 123588169

jba commented 5 years ago

Not sure that's the right place. Is it possible to get the project ID from compute metadata even if the creds are explicitly supplied (e.g. a JSON file)? If so, and if that's something you want to support, then the function isn't really about creds. Maybe somewhere under google.golang.org/api?

jeanbza commented 5 years ago

That's a good point. google.golang.org/api makes sense. :+1:

zaddok commented 5 years ago

How about a DefaultLocationId as well?

(The tasks api needs it as well as the project id.)

broady commented 5 years ago

I don't think this can/should be implemented as a standalone function.

For example, if the user has provided a path to a service account file (using WithCredentialsFile) and that credentials file includes the project ID, that project ID should be used.

If this is implemented as a standalone function, it should also include the same API options as client creation.

I would lean towards being a sentinel value and the actual detection happening after API options have been processed.

While chatting with @jadekler today, I also came up with an idea to mitigate the issue of ambiguous project IDs (e.g., metadata server, environment variable, and credentials file with different IDs). The obvious solution is to introduce an ordering, similar to default credentials. However, connecting project IDs to default credentials is still dangerous and can still lead to unexpected data being mistakenly sent to the wrong project.

Since this is a convenience function, and there are many edge cases, I think it makes most sense to only allow detection when there is no ambiguity on detected project ID. That is, all mechanisms for detecting project ID should match. If they don't match (for example, using a service account credential from a different project within a Cloud Function), then an error is returned - the user must explicitly set their project ID.

A list of places to look for project IDs (non-exhaustive, please comment if there are more):

broady commented 5 years ago

Oh, to clear up any confusion, detection via metadata already exists:

https://godoc.org/cloud.google.com/go/compute/metadata#ProjectID

If you're looking for simple detection and you know you want the project ID from the metadata server (e.g., you're running on GAE, GCF, GKE, GCE), use this.

broady commented 5 years ago

Spoke more with @theacodes yesterday. We don't need to detect via environment variables. That simplifies things to "always use the project ID that's adjacent to the credentials being used". All commonly used credentials include a project ID (service account key file, metadata server).

The low hanging fruit is adding the ProjectID to the Credentials struct when we use credentials from metadata server. oops, this is already done.

Then we need to figure out the best way to make it easy to use when constructing a client.

Some ideas, using a credentials file option:

datastore.NewClient(ctx, "", option.DefaultProjectID(), option.WithCredentialsFile("/path/to/creds.json"))
import "google.golang.org/api/transport"

credOpt := option.WithCredentialsFile("/path/to/creds.json")
cred, _ := transport.Creds(ctx, credOpt)
datastore.NewClient(ctx, cred.ProjectID, credOpt)
datastore.NewClient(ctx, datastore.DetectProjectID, option.WithCredentialsFile("/path/to/creds.json"))

log.Print(datastore.DetectProjectID == "<sentinel-value-detect-project-id-not-a-real-project-id>") // true
poy commented 5 years ago

So while constructing a client currently, the docs seem to suggest that leaving the project ID empty does imply that the library will try to infer (sorta) the project ID:

If the project ID is empty, it is derived from the DATASTORE_PROJECT_ID environment variable (https://godoc.org/cloud.google.com/go/datastore#NewClient)

So could this be expanded to instead of just looking at DATASTORE_PROJECT_ID, it inspects the credentials?

I don't know that the option option.DefaultProjectID() is really necessary as thats what the empty project ID implied. We could add a constant datastore.DetectProjectID, but I believe that constant should just be an empty string to match current behavior.

broady commented 5 years ago

@poy ya, but only for datastore, to support the emulator.

I have reservations about using empty string as "please detect", since empty environment variables evaluate to empty string, so you could image someone setting the wrong env var, and then ending up with detection.

Seems like a bit of an edge case, though. I don't agree it's explicit, but maybe there's nothing very wrong with using it to signify detection.

poy commented 5 years ago

@broady The docs seem to indicate that DATASTORE_EMULATOR_HOST is used for the emulator.

I agree though that an empty environment variable might lead to unexpected behavior.

That being said, if we can read the project from the creds, does it make sense to ever have the developer provide the project ID. Put differently, can you get creds from project A to work with project B?

broady commented 5 years ago

The docs seem to indicate that DATASTORE_EMULATOR_HOST is used for the emulator.

DATASTORE_PROJECT_ID, too. Try running gcloud beta emulators datastore start and gcloud beta emulators datastore get-env

Put differently, can you get creds from project A to work with project B?

Yes.

poy commented 5 years ago

So it looks like the command is gcloud beta emulators datastore env-init (for anyone who stumbles into this later) You're right though, you have to have both DATASTORE_PROJECT_ID and DATASTORE_EMULATOR_HOST for the emulator. The docs didn't lead me down that path...

Haha oof. I guess I shouldn't be surprised that you can do that with creds and projects.

evanj commented 5 years ago

FWIW, Java provides ServiceOption.getDefaultProjectId and Python auto-detects the project ID when creating service clients. The order of these seems to be slightly different, but generally is:

broady commented 5 years ago

@evanj we're not going to do env vars like GOOGLE_CLOUD_PROJECT (complicates things too much, in terms of intent, as you see above) or gcloud SDK (requires shelling out)

poy commented 5 years ago

We have added a sentinel value for the Datastore client: 25417a16406ad4ee18579ddfb574f2774336ea1c

codyoss commented 4 years ago

List of Clients that should be updated:

jeanbza commented 4 years ago

FYI we will only provide this to the non-generated clients: bigquery, bigtable, datsatore, firestore, logging, pubsub, spanner, storage.

Also FYI these outstanding CLs:

The pubsub and bigquery are ones we could probably pick up and take over the finish line - the others I'm not sure are clients we should add features to.

jeanbza commented 4 years ago

This appears to be done. Closing.

leklund commented 4 years ago

It appears that this was never done for bigquery (or bigtable). Happy to open a new issues and/or pull the code over from Gerrit into a new GH PR.

codyoss commented 4 years ago

@leklund Thanks for the reminder. This was noticed a little while back but it appears I forgot to open the issue back up. We still plan to get this supported.

arikkfir commented 1 year ago

It appears that this is still not done for Pub/Sub (unless I'm missing something)

minherz commented 1 year ago

It would be nice if the behavior is aligned with the implementations in other languages (e.g. Java or Python) where it can be inferred from client settings.