hafenkran / duckdb-bigquery

Integrates DuckDB with Google BigQuery, allowing direct querying and management of BigQuery datasets
MIT License
58 stars 3 forks source link

Add support for DuckDB secret manager as a way to provide BigQuery authentication #8

Open Kayrnt opened 3 months ago

Kayrnt commented 3 months ago

DuckDB Secret manager allows to declare and store secrets to be used in DuckDB or the extension.

In my version of the DuckDB bigquery extension, I've proposed following approach:

CREATE SECRET duckdb_bigquery_secret (
    TYPE bigquery,
    service_account_json '{ "type": "service_account", "project_id": "my-gcp-project", "private_key_id": "xxxx", "private_key": "-----BEGIN PRIVATE KEY-----\nxxx\n-----END PRIVATE KEY-----\n", "client_email": "xxx@some-gcp-project.iam.gserviceaccount.com", "client_id": "xxx", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxx" }'

ATTACH 'my_gcp_bq_storage_project' AS bq (TYPE bigquery, SECRET duckdb_bigquery_secret);
);

Then when the secret is provided to the ATTACH, it would override the lookup to GOOGLE_APPLICATION_CREDENTIALS.

hafenkran commented 3 months ago

Hey, thanks for the suggestion! I think using the DuckDB secret manager to declare and store secrets for the extension is a great idea and should be considered.

However, I'm not entirely sure if we really need to include the service account JSON in the first place. The default credentials mechanism works pretty well in my experience. Instead, I could see this being useful for access tokens (see gcloud auth print-access-token). At least, that's something I sometimes use. But I'm currently not quite sure what the google-cloud-cpp lib supports here.

Kayrnt commented 3 months ago

There are quite a few ways to authenticate and I guess some of them could benefit being declared that way. Here's one use case: you want to mount 2 BigQuery projects that have different GCP projects with each their own service account, you want to uses 2 keys then, how do you do it with the current system?

hafenkran commented 3 months ago

Yep, thats an use-case. I've not questioned that. I always only used one SA per application having permission for multiple projects. Just wanted to point out that it's probably even more valuable for all other types of auth and that those should be considered as well.

Kayrnt commented 3 months ago

Sure, it doesn't have to replace existing system but provide an alternative approach to bring credentials. Does the proposed form looks good to you to move forward?

hafenkran commented 2 months ago

Hey @Kayrnt, I'm not sure right now but do you wanted to work on this or started already investigating on this? During our discussion you said something that you might wanna issue some PRs.

Kayrnt commented 2 months ago

Hey @hafenkran, right! I didn't get the notification on that thumbs up 😅 I didn't start yet but I'll try to look into it as I've been busy with my other projects and tech blog. Do you want to take it or can it wait maybe like 2 weeks that I can get something done?

hafenkran commented 2 months ago

If I had still been reading this, I would have gladly given you the time. That wasn't really time critical. I also started to prepare a blog post about this extension right now ;)