lquerel / gcp-bigquery-client

GCP BigQuery Client (Rust)
Apache License 2.0
97 stars 67 forks source link

make ResultSet more general #97

Closed imor closed 3 weeks ago

imor commented 4 months ago

NOTE: review and merge the following PR first: https://github.com/lquerel/gcp-bigquery-client/pull/96. This is because this PR's branch is cut from the previous PR's branch.

This PR makes ResultSet more general by only holding onto a Vec<TableRow> instead of a QueryResponse. This change makes it possible to create a ResultSet not only from a QueryResponse but also from a GetQueryResultsResponse. Need for this arose when I wanted to create a single function which will first call JobApi::query and if it returns job_complete=false then fall back to polling JobApi::get_query_results to get the results when the query job finishes. Such a function needed the ability to create a ResultSet from the results of either of these APIs. This is the function I wrote that uses the refactored code:

async fn query(&self, query: String) -> Result<ResultSet, BQError> {
        let query_response = self
            .client
            .job()
            .query(&self.project_id, QueryRequest::new(query))
            .await?;

        if query_response.job_complete.unwrap_or(false) {
            info!("job complete. returning result set");
            Ok(ResultSet::new_from_query_response(query_response))
        } else {
            let job_id = query_response
                .job_reference
                .as_ref()
                .expect("missing job reference")
                .job_id
                .as_ref()
                .expect("missing job id");
            loop {
                info!("job incomplete. waiting for 5 seconds");
                sleep(Duration::from_secs(5)).await;
                let result = self
                    .client
                    .job()
                    .get_query_results(&self.project_id, job_id, Default::default())
                    .await?;
                if result.job_complete.unwrap_or(false) {
                    info!("job complete. returning result set");
                    let result_set = ResultSet::new_from_get_query_results_response(result);
                    break Ok(result_set);
                }
            }
        }
    }

Note: This is a breaking API change, so should be included only in a major version bumped release.

lquerel commented 4 months ago

I have integrated your other PRs and published a new version of the crate. However, there are numerous conflicts with this latest PR. Could you adapt this PR to the changes made? Thank you in advance.

Version v0.22.0 available on crates.io.

imor commented 4 months ago

@lquerel Thanks for publishing the new version. I have cleaned up this PR's code and it compiles. Looking at why tests and clippy are failing.

imor commented 4 months ago

Looks like clippy and tests failed because protoc is missing: https://github.com/lquerel/gcp-bigquery-client/actions/runs/9835163714/job/27148284036?pr=97#step:5:406. I'm not sure why it wasn't a problem before.

imor commented 4 months ago

Tests and clippy are fixed in https://github.com/lquerel/gcp-bigquery-client/pull/98. Once that PR is merged, this one should be green.

imor commented 4 months ago

@lquerel since now the protoc deps have been vendored, would you mind taking a look at this PR. The clippy failure has been fixed in PR #105, but the tests are failing due to a bad service account key.

lquerel commented 4 months ago

I will work on this PR this weekend.

lquerel commented 4 months ago

I left a question/request for you (on a previous PR) that I need to address before I can devote time to this specific PR. It’s probably a gross error on my part, but I don’t understand what’s happening.

https://github.com/lquerel/gcp-bigquery-client/pull/95

lquerel commented 3 months ago

I’d like to avoid making this type of breaking change on such common methods if possible. Why not convert the ResultSet structure into a templated enum? This would allow us to have two variants: one for QueryResponse and another for GetQueryResultsResponse, all without causing any breaking changes. It’s possible I’m missing something, but it’s worth discussing before making this kind of change. In between, I published a new version of this crate with the other updates.

imor commented 3 months ago

I published a new version of this crate with the other updates.

Thank for this.

Why not convert the ResultSet structure into a templated enum?

One reason is that for my code example (in the description of the PR) to work it needs access to the QueryResponse object's job_complete field. If I get back a ResultSet I can no longer access job_complete. Another is that the API difference between Job::query and Job::get_query_results: one returns a ResultSet and the other GetQueryResultsResponse, so I felt like making them both return a raw response rather than a ResultSet would make them more consistent. The options I see are the following:

  1. Do not make this change at all, but this means I'd need to duplicate the code in Job::query to access the QueryResponse object.
  2. Keep Job::query (maybe deprecate it) but add another method Job::query_v2 or something which has the new signature.
  3. Make a breaking change but publish it with a semantic version bump which indicates that there are breaking changes.

Which one do you think is the best?

lquerel commented 3 months ago

@imor I’ve taken the time to reflect on this change, and I believe that, in the long term, you are right. It is preferable to make the API more consistent, even if it means introducing a breaking change. Could you please add an update to the CHANGELOG.md in your pull request, including a notification about the breaking change, the rationale behind it, and instructions on how to migrate? Additionally, the main README.md example should be updated, with a notification at the beginning indicating the breaking change and linking to the changelog for details.

Moreover, with my current activities, I am struggling to dedicate enough time to this library and am looking for two co-maintainers to help evolve and maintain it. Would you be interested?

imor commented 3 months ago

@lquerel Apologies for replying late, was quite busy this week. I've updated CHANGELOG.md and README.md files.

Moreover, with my current activities, I am struggling to dedicate enough time to this library and am looking for two co-maintainers to help evolve and maintain it. Would you be interested?

I probably won't be able to spend time developing major new features or refactorings, but I'm happy to help review PRs, or commit other minor fixes etc.

lquerel commented 3 weeks ago

New version 0.24.0 available on crates.io. Thank you for your contribution and sorry for the delay.