googleapis / python-bigquery

Apache License 2.0
746 stars 306 forks source link

with BigQuery Client - MultiDataframe for Multistatement with python #1962

Closed JVT-42000 closed 4 months ago

JVT-42000 commented 5 months ago

In python script, with the multistatemnte query below how can I get separatly the results for each "SELECT ...".

one dataframe for the first "SELECT name AS sname .." one dataframe for the second "SELECT name AS wname .."

multistatement_queries = """ CREATE TEMP TABLE top_names(name STRING) AS SELECT name FROM bigquery-public-data.usa_names.usa_1910_current WHERE year = 2017 ORDER BY number DESC LIMIT 100;

-- Which names begin with S ? SELECT name AS sname FROM top_names WHERE name like 'S%';

-- Which names begin with W ? SELECT name AS wname FROM top_namesWHERE name like 'W%';

DROP TABLE top_names; """

Linchin commented 4 months ago

You should be able to access the result of each statement through the corresponding child job. Here's the doc with instructions. I will close the issue now, but feel free to leave a comment below if you have any further questions :)

JVT-42000 commented 4 months ago

Thank you for the reply. But I use the BQ client as below. I think it's not the right way to get a "jobs.list()"

Is it possible with the google.cloud.bigquery.client, or I need to use another librarie ?

  from google.cloud.bigquery.client import Client 
  bq_client = Client(PROJECT_ID)
  ...
  query_job = bq_client.query(multistatement_queries)

  l_jobs = bq_client.list_jobs(project=PROJECT_ID)
  for index, job in enumerate(l_jobs):  # The job contains all statement
      print(job.job_id)     
      print(job.result()) 
      job.getQueryResults()  # KO
JVT-42000 commented 4 months ago

I find what was wrong. I needed to call list_jobs with the parent_job parameter parent_job=query_job.job_id

for job in bq_client.list_jobs(parent_job=query_job.job_id):
    print("Job ID: {}, Statement Type: {} state :{}".format(job.job_id, job.statement_type, job.state))
    r_job = job.result()
    print(r_job.to_dataframe())