joshtemple / harlequin-bigquery

A BigQuery adapter for Harlequin, a SQL IDE for the terminal.
MIT License
8 stars 2 forks source link

Catalog listing error - potential fallback approach #12

Open quincysmiith opened 8 months ago

quincysmiith commented 8 months ago

When opening harlequin with BigQuery I am greeted with the following error

403 POST https://bigquery.googleapis.com/bigquery/v2/projects/gcp-wow-food-wlx-digaspt-dev/queries?prettyPrint=false: Access Denied: Table gcp-wow-food-wlx-digaspt-dev:region-US.INFORMATION_SCHEMA.COLUMNS: User does not have permission to query table gcp-wow-food-wlx-digaspt-dev:region-US.INFORMATION_SCHEMA.COLUMNS, or perhaps it does not exist in location US.

If I am reading this error correctly it is saying that I do not have enough (or high enough permissions) to query this table.

Looking through the source code I can see that the query used to get the catalog information is as follows


select
      datasets.schema_name as dataset_id,
      tables.table_name as table_id,
      tables.table_type as table_type,
      columns,column_name,
      columns.data_type as column_type
from `{self.project}.region-{self.location}.INFORMATION_SCHEMA.SCHEMATA` datasets
left join `{self.project}.region-{self.location}.INFORMATION_SCHEMA.TABLES` tables
on datasets.catalog_name = tables.table_catalog
and datasets.schema_name = tables.table_schema
left join `{self.project}.region-{self.location}.INFORMATION_SCHEMA.COLUMNS` columns
using (table_catalog, table_schema, table_name)
order by dataset_id, table_id, column_name

So it is the gcp-wow-food-wlx-digaspt-dev:region-US.INFORMATION_SCHEMA.COLUMNS table that my permissions is not sufficient to access. (confirmed also in BigQuery console with equivalent account)

Background:

The table gcp-wow-food-wlx-digaspt-dev:region-US.INFORMATION_SCHEMA.COLUMNS requires a regional scope in access which I think I do not have. I imagine there are numerous other projects with datasets in the same region that I rightly should not have access to (or be able to see their meta data). Additionally I think this is not an uncommon permission set up for a large corporate in GCP and BigQuery.

Having looked through this page https://cloud.google.com/bigquery/docs/information-schema-tables I believe the same data can be obtained by (iteratively) querying the following table gcp-wow-food-wlx-digaspt-dev.dataset_name.INFORMATION_SCHEMA.COLUMNS which has a "dataset location" scope. (The same would need to be applied to obtaining the data for tables). This is a table that I do have access to and am able to query without issue.

POTENTIAL SOLUTION:

Query the {self.project}.{self.dataset}.INFORMATION_SCHEMA.COLUMNS and {self.project}.{self.dataset}.INFORMATION_SCHEMA.TABLES tables instead of the {self.project}.region-{self.location}.INFORMATION_SCHEMA.COLUMNS and {self.project}.region-{self.location}.INFORMATION_SCHEMA.TABLES tables to generate the data that is used to populate the catalog

Speculation: This would also have the benefit of populating the catalog with all BigQuery tables in a project even if they are in different regions (eg one dataset in US and one dataset in Australia)

tconbeer commented 2 months ago

I ran into this same issue, but I was able to resolve it with IAM. It appears that being a project-level Owner does not also grant you the BigQuery Admin role, which includes the required scopes to query the information schema. Any of these roles (at the project level) will also work: