googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.72k stars 1.28k forks source link

feat(bigquery): Configurable table read session project #10924

Open juanli16 opened 3 days ago

juanli16 commented 3 days ago

When creating a BigQuery table RowIterator with StorageReadAPI enabled, the read session is created by default in the table's project. We should be able to overwrite this, so that we can keep the storage of the table data and the permission/cost management of the read session separate.

To accomplish this, TableReadOption with WithClientProject option is added, and if set, it will create the session using the client's project, otherwise, it keeps the default behaviour.

google-cla[bot] commented 3 days ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

alvarowolfx commented 2 days ago

hey @juanli16 thanks for the contribution. Some question on this improvement and your use case ?

Do you have multiple billing project that can be target for different table reads or it's the same project ID that you want to configure the same as the project ID set up on on the BigQuery client ? We make this configuration more global to the ReadClient and use the projectID already on the bigquery.Client.

One concern that I'm having here is changing the signature of the Table.Read method to add client options, we thought on doing that in the past, but we always go back and forth on that idea. I'd rather prefer this for now to be an option on the ReadCient, like I started on PR https://github.com/googleapis/google-cloud-go/pull/10580. I can check to merge PR https://github.com/googleapis/google-cloud-go/pull/10580 and then you turn this improvement into a ReadClient level option, what do you think ?

And can we add integration tests ? Probably exercise reading a public dataset table and creating a session from the main project ID that the client is set up.

juanli16 commented 2 days ago

Do you have multiple billing project that can be target for different table reads or it's the same project ID that you want to configure the same as the project ID set up on on the BigQuery client ?

Yes, in our use case, we have the same billing project that we want to use for reading tables stored in different projects.

I'd rather prefer this for now to be an option on the ReadCient, like I started on PR https://github.com/googleapis/google-cloud-go/pull/10580. I can check to merge PR https://github.com/googleapis/google-cloud-go/pull/10580 and then you turn this improvement into a ReadClient level option, what do you think ?

This makes sense to me, it will make the setting more global, and we won't have to set it per table read.

And certainly, I can try to add integration test for it.

fsaintjacques commented 2 days ago

It's more about having right to read from a table but without billing access to where it lives. In fact, in our use case, the service account does have the permissions to read on the table, but no other permissions including on the project where said table lives.

It's akin to controlling in which project job.insert is submitted, sometimes you simply don't have the choice (think public datasets).