opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Bug: dagster dbt run is failing due to permissions for githubarchive data #1677

ravenac95 closed 2 months ago

ravenac95 commented 2 months ago

What is it?

See: https://dagster.opensource.observer/runs/83191681-1aeb-422a-b708-f70037a325de

ravenac95 commented 2 months ago

This is a fascinating bug, but I think it's related to the fact that githubarchive, like our pull request datasets, is made public by granting access to the special user allAuthenticatedUsers. However, when Dagster's Kubernetes service account makes requests through the GCP workload identity service account, there is no actual "authenticated user" in the logs. I tried this on the oso-pull-requests datasets and got a similar error:

BigQuery error in query operation: Error processing job 'opensource-observer:bqjob_r585db9764766f3ad_00000190330c51ef_1': Access Denied: Table
oso-pull-requests:pr_1675.artifacts_v1: User does not have permission to query table oso-pull-requests:pr_1675.artifacts_v1, or perhaps it does
not exist.

However, when using a service account that has an "email", things seem to work. I think the solution is to impersonate such a service account.
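
For reference, a minimal sketch of what service account impersonation could look like from Python, assuming the google-auth and google-cloud-bigquery libraries are installed. The function name, the service account email, and the project ID passed in are hypothetical placeholders, and this is not necessarily how #1685 implements the fix:

```python
# Sketch only: the helper name and its arguments are illustrative assumptions,
# not values taken from this issue or from #1685.
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"


def impersonated_bigquery_client(target_service_account: str, project: str):
    """Return a BigQuery client whose requests are made as target_service_account."""
    # Imported lazily so this module loads even without the Google libraries.
    import google.auth
    from google.auth import impersonated_credentials
    from google.cloud import bigquery

    # Credentials of the caller, e.g. whatever the pod's environment provides.
    source_credentials, _ = google.auth.default()

    # Short-lived credentials minted for the target service account. The caller
    # needs the Service Account Token Creator role on that account.
    target_credentials = impersonated_credentials.Credentials(
        source_credentials=source_credentials,
        target_principal=target_service_account,
        target_scopes=[BIGQUERY_SCOPE],
        lifetime=3600,  # seconds
    )
    return bigquery.Client(project=project, credentials=target_credentials)
```

A client built this way could then be used to retry the failing query, for example against `oso-pull-requests.pr_1675.artifacts_v1`, to check whether the request is now recognized as coming from an authenticated user.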

ravenac95 commented 2 months ago

Closed with #1685