lquerel / gcp-bigquery-client

GCP BigQuery Client (Rust)
Apache License 2.0
92 stars 60 forks source link

Tips for reading TableRows #59

Open natthew opened 1 year ago

natthew commented 1 year ago

Hi, thanks for writing and releasing this repository!

I'm using the query_all API to get a bunch of rows, and I'm trying to transform the results from TableRow to some struct.

As #31 mentioned, it seems like all the results are all returned as strings. It's also pretty cumbersome to do this transformation. Consider the parse function in the example below:

struct Example {
    letter: String,
    number: i64,
}

fn parse(row: &TableRow) -> Example {
    Example {
        letter: row
            .columns
            .as_ref()
            .unwrap()
            .get(0)
            .unwrap()
            .value
            .as_ref()
            .unwrap()
            .as_str()
            .unwrap()
            .to_string(),
        number: row
            .columns
            .as_ref()
            .unwrap()
            .get(1)
            .unwrap()
            .value
            .as_ref()
            .unwrap()
            .as_str()
            .unwrap()
            .parse::<i64>()
            .unwrap(),
    }
}

async fn load_examples() -> Result<Vec<Example>, BQError> {
    let client = Client::from_service_account_key_file(BQ_SA_KEY).await?;
    let response = client.job().query_all(
        GCP_PROJECT,
        JobConfigurationQuery {
            query: "SELECT x AS letter, 1 AS number FROM UNNEST(['a', 'b', 'c']) x".to_string(),
            use_legacy_sql: Some(false),
            ..Default::default()
        },
        Some(2),
    );

    tokio::pin!(response);

    let mut examples: Vec<Example> = vec![];
    while let Some(page) = response.next().await {
        match page {
            Ok(rows) => {
                examples.extend(rows.iter().map(parse));
            }
            Err(e) => {
                return Err(e);
            }
        }
    }
    Ok(examples)
}

fn main() {
    let rt = Runtime::new().unwrap();
    let examples = rt.block_on(load_examples()).expect("bigquery error");
    for e in examples {
        println!("letter: {}\tnumber: {}", e.letter, e.number);
    }
}

It's pretty awkward! There are two issues at play:

1) I need to take the number result as a string and then parse an i64 out of it. 2) It's pretty cumbersome to get the actual result values from a TableRow.

I'm sure there exists a good way to, given a TableRow and a TableSchema, construct a struct, but I'm not sure how to do it. And maybe the first issue is not a bug, and just an issue with how I'm reading the data, but I can't find a way to get it to work.

Would it be possible to create an example of how to use this API to generate a clean data structure out of a TableRow?

krystianity commented 1 month ago

Hi @natthew this post is a bit older, but in case anyone else stumbles upon it. Since the TableCells are serde::Value you can merge all columns of a row in a parent Value and and use that to deserlize into a specific type.

Here is the sample code:

pub async fn query<T: serde::de::DeserializeOwned>(&self, sql: &str,
                                                       query_parameters: Vec<QueryParameter>) -> anyhow::Result<Vec<T>> {

        let mut query_request = QueryRequest::new(sql);
        query_request.parameter_mode = Some("NAMED".to_string());
        query_request.use_legacy_sql = false;
        query_request.query_parameters = Some(query_parameters);

        let mut result_set = self.client
            .job()
            .query(&self.project_id, query_request)
            .await?;

        let columns = result_set.column_names();
        let mut rows: Vec<T> = vec!();
        while result_set.next_row() {

            let mut row_values: serde_json::Map<String, Value> = serde_json::Map::new();

            for column in columns.iter() {
                let column_value = result_set.get_json_value_by_name(column)?;
                if let Some(column_value) = column_value {
                    row_values.insert(column.clone(), column_value);
                } else {
                    row_values.insert(column.clone(), Value::Null);
                }
            }

            let row_as_value = Value::Object(row_values);
            let row_as_type: T = serde_json::from_value(row_as_value)?;
            rows.push(row_as_type);
        }

        Ok(rows)
    }

Then use it like so

use serde::{Deserialize};
use serde_aux::prelude::*;

#[derive(Deserialize)]
pub struct SomeBQRow {
    pub a_key: String,
    #[serde(deserialize_with = "deserialize_number_from_string")]
    pub an_int: i64
}

let sql = format!(
           "SELECT * FROM `{}.{}.{}` WHERE ONE_COLUMN = @my_filter LIMIT 10",
            &bigquery_client.project_id, &bigquery_client.dataset_id,
            &bigquery_client.table_id
       );

let mut query_params = vec!();
query_params.push(BigqueryClient::create_string_query_param("my_filter", "some value"));

let results: Vec<SomeBQRow> = bigquery_client.query(sql.as_str(), query_params).await?;