googleapis / python-bigquery-sqlalchemy

SQLAlchemy dialect for BigQuery
MIT License
435 stars 129 forks source link

Support for multi statement select #1047

Open aczire opened 7 months ago

aczire commented 7 months ago

Please find below minimum reproducible code to run a muti statement select. The problem here is, Big Query runs the select statement as multiple statements, and the result contains dataset from the the first dataset only, everything else is discarded.

You can run this in a Big Query console and see the difference.

How can I get the result for all the statements executed.

from google.cloud import bigquery

def run_query(sql_query):
    client = bigquery.Client()
    query_job = client.query(sql_query)

    # Wait for the job to complete
    results = query_job.result()
    print("Query complete!")
    print(f"Results: {results.total_rows}")

    for row in results:
        print(row)

if __name__ == "__main__":

    sql_query = """
        BEGIN
            FOR record IN 
            (
            SELECT num FROM UNNEST(GENERATE_ARRAY(1, 5)) AS num
            )
            DO
            WITH numbers AS (
                SELECT num
                FROM UNNEST(GENERATE_ARRAY(1, 100)) AS num  -- Adjust the range as needed
            )
            SELECT num
            FROM numbers
            LIMIT 10;
            END FOR;
        END
    """

    # Execute the demo query
    run_query(sql_query)

Note: Th equery is just for demonstrative purpose of multi statement execution only and the correctness of its functionality is never intended.

aczire commented 7 months ago

It seems the child statements are child jobs and the result for them are available from the child job result object.

Now the question is how to make it work with python-bigquery-sqlalchemy?

    client = bigquery.Client()
    query_job = client.query(sql_query)

    # Wait for the job to complete
    results = query_job.result()
    print("Query complete!")
    print(f"Results: {results.total_rows}")

    for row in results:
        print(row)

    print("Parent Job ID: ", query_job.job_id)

    all_statements = client.list_jobs(parent_job=query_job.job_id)

    for child_job in all_statements:
        job: QueryJob = child_job
        print(f"{job.job_id}, {job.state}")
        results = query_job.result()
        print(f"Child job results: {results.total_rows}")

        for row in results:
            print(row)