googleapis / nodejs-bigquery

Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.
https://cloud.google.com/bigquery/
Apache License 2.0
467 stars 211 forks source link

query result metadata problem #1362

Closed chartnado-dave-haas closed 5 months ago

chartnado-dave-haas commented 6 months ago

Hello.

We are developing an application the retrieves results from several SQL vendors, including BigQuery.

In the BigQuery case, for authentication we acquire an OAuth token from Google. There are 2 scopes we can use for the token:

  1. /auth/bigquery
  2. /auth/bigquery.readonly

We would prefer to query using the read only scope, since our software should be able to read result sets but not write.

However, we are experiencing an issue with this client library.

If our software acquires the read only scope, we are only able to run the .query(...) library method. The method executes successfully for a valid query but the column metadata of the result set is not provided.

For example (TypeScript)

let query: Query = {
  query: sql
};

// execute the query
let queryResult: SimpleQueryRowsResponse = await connection.query(query);

let rows: any[] = queryResult[0];
let job: bigquery.IJob = queryResult[1];     <- always undefined

console.log(rows);
console.log(job);

In this case we are expecting the IJob to return the column metadata of the result set, however it always returns undefined.

While it appears to make sense that the IJob would be undefined, since we did not actually create a job object, this results in no metadata. In order to determine the schema we have to resort to introspecting the returned rows. However, there appears to be no other way to obtain metadata from the simple .query(...) library method.

If our software acquires the full scope then we can run the .createQueryJob(...) library method, since we are now authorized to create jobs. In this case the result set metadata is provided, however this requires read/write access to the database and we would prefer read only.

We recommend that the .query(...) library method returns a result that includes both rows and column metadata.

OS: Windows 11 NodeJS Version: 20.12.12 Client Library Version: 7.6.1

alvarowolfx commented 6 months ago

hey @chartnado-dave-haas thanks for the report.

We have to separate here scopes vs permissions as for running queries on BigQuery, you have to have some minimum permission that are listed here: https://cloud.google.com/bigquery/docs/running-queries#required_permissions. It's not strictly related to the scopes that you mentioned here.

Another aspect about queries, is that all queries generate a BigQuery job (unless some recent work to have stateless jobs, that that's still on preview), so I need to investigate here what is going on that when using the SDK .query method, it is not returning metadata related to it.

In the meantime, the approach of creating a job using createQueryJob + getting the results would be the right approach if you need to access the job metadata. But I'll check here if there is a bug or maybe right now we don't support returning the job metadata with the .query method.

chartnado-dave-haas commented 6 months ago

Hello.

To be clear, all we are looking for is result set metadata (column names, data types, ...) returned from the .query() method.

Thanks,

alvarowolfx commented 6 months ago

this is a bug in our library, similar to #1210. This time I'm gonna work on fixing the backing issue on nodejs-paginator library and update accordingly here.