googleapis / python-bigquery

Apache License 2.0

load_table_from_dataframe in combination with .result() doesn't properly await table creation. #1969

Closed: ckanaar closed this issue 4 months ago.

ckanaar commented 4 months ago

It seems that calling load_table_from_dataframe() and waiting for the load job to finish with its .result() method does not always guarantee that the table has been created, or is ready, by the time subsequent queries use it. On rare occasions, I get a google.api_core.exceptions.NotFound error when querying the newly created table immediately after the load job's .result() call has returned.

Environment details

Steps to reproduce

  1. Create a pandas DataFrame.
  2. Call load_table_from_dataframe with write_disposition=WRITE_TRUNCATE and create_disposition=CREATE_IF_NEEDED set in the bigquery.LoadJobConfig(), and wait for the job to finish using the .result() method.
  3. Immediately afterwards, query the newly created table.

Code example

import pandas as pd
from google.cloud import bigquery

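dataframe = pd.DataFrame(
    {
        'name': ['John', 'Jane', 'Joe'],
        'age': [20, 25, 30],
    }
)

# Placeholders: replace with your own project ID and an existing dataset.
project_id = "your-project-id"
dataset_id = "your_dataset"

client = bigquery.Client(project=project_id)
table_id = f"{project_id}.{dataset_id}.test_table"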
job = client.load_table_from_dataframe(
    dataframe=dataframe,
    destination=table_id,
    job_config=bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED
    )
)
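job.result()  # Blocks until the load job finishes; raises if the job failed.

# Quote the fully qualified table ID with backticks.
query = f"SELECT * FROM `{table_id}`"
query_job = client.query(query)
query_job.result()  # Occasionally raised NotFound (404) for the new table.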

Stack trace

Location: EU
Job ID: <id>

google.api_core.exceptions.NotFound: 404 Not found: Table <table_id>; reason: notFound, message: Not found: <table_id>

Note that this snippet does not reliably reproduce the NotFound error. I only hit it in rare cases, but the behavior does seem inconsistent.
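One possible client-side mitigation, assuming the error is a short-lived visibility delay after the load job completes (this is not confirmed anywhere in this issue), is to retry the follow-up query only when it raises NotFound, backing off between attempts. The sketch below uses only the standard library; `retry_on` and its parameters are hypothetical names, not part of google-cloud-bigquery.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on(
    func: Callable[[], T],
    retry_exceptions: Tuple[Type[BaseException], ...],
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Hypothetical helper: call func() until it succeeds.

    Only exceptions listed in retry_exceptions trigger a retry; anything
    else propagates immediately. The delay doubles after each failed
    attempt (capped at max_delay). If the next sleep would pass the
    timeout, the last retryable exception is re-raised.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            return func()
        except retry_exceptions:
            if time.monotonic() + delay > deadline:
                raise  # Give up and surface the last retryable error.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

Applied to the snippet above, this would be something like `retry_on(lambda: client.query(query).result(), (NotFound,))` with `from google.api_core.exceptions import NotFound`. Each retry submits a new query job. google.api_core also provides `retry.Retry` with `retry.if_exception_type(...)`, which can serve the same purpose.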

ckanaar commented 4 months ago

Closing this since I can't reproduce it.