googleapis / python-bigquery-pandas

Google BigQuery connector for pandas
https://googleapis.dev/python/pandas-gbq/latest/index.html
BSD 3-Clause "New" or "Revised" License
448 stars 121 forks source link

append to table with DATETIME column with generated schema #450

Open tswast opened 2 years ago

tswast commented 2 years ago

Steps to reproduce

  1. Create a table with a DATETIME (not TIMESTAMP) column outside of pandas(-gbq).
  2. Attempt to append a dataframe with datetime data to that table.

Code example

import datetime

import google.cloud.bigquery
import pandas
import pandas_gbq

destination = "swast-scratch.my_dataset.pandas_datetime_20211222"
bqclient = google.cloud.bigquery.Client()
table = google.cloud.bigquery.Table(
    destination,
    schema=[
        google.cloud.bigquery.SchemaField("row_num", "INTEGER"),
        google.cloud.bigquery.SchemaField("some_datetime", "DATETIME"),
    ]
)
table.expires = (
    datetime.datetime.now(tz=datetime.timezone.utc)
    + datetime.timedelta(days=1)
)
bqclient.create_table(table, exists_ok=True)

df = pandas.DataFrame(
    {
        "row_num": [1, 2, 3],
        "some_datetime": [
            datetime.datetime(2021, 12, 22, 10, 11, 12),
            datetime.datetime(2011, 11, 11, 10, 11, 12),
            datetime.datetime(2000, 1, 2, 3, 4, 5),
        ]
    }
)
pandas_gbq.to_gbq(df, destination, if_exists="append")

Stack trace

(dev-3.9) ➜  scratch python 2021/12-pandas-gbq-datetime/upload_datetime.py
Traceback (most recent call last):
  File "/Users/swast/src/scratch/2021/12-pandas-gbq-datetime/upload_datetime.py", line 33, in <module>
    pandas_gbq.to_gbq(df, destination, if_exists="append")
  File "/Users/swast/src/github.com/googleapis/python-bigquery-pandas/pandas_gbq/gbq.py", line 1129, in to_gbq
    raise InvalidSchema(
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
tswast commented 2 years ago

Currently, this failure happens locally due to generated schema != server-side schema. Even without that client-side check, I think this failure could still happen with api_method="load_csv", as it also uses the generated schema if one is not provided.

tswast commented 2 years ago

Actually, there is a workaround. The user can manually specify a schema if they want DATETIME instead of TIMESTAMP.

...
pandas_gbq.to_gbq(
    df,
    destination,
    if_exists="append",
    table_schema=[
        {"name": "row_num", "type": "INTEGER"},
        {"name": "some_datetime", "type": "DATETIME"},
    ]
)

I might change this to a Feature Request, as I believe this is a known way of dealing with the ambiguity between TIMESTAMP and DATETIME.

jlynchMicron commented 10 months ago

+1 for DateTime support instead of just Timestamp. Currently all my DateTime columns are being uploaded as Timestamps with the to_gbq function unless I specify the column is of type 'DATETIME' with the table_schema argument.