jlynchMicron opened 4 months ago
I have also tried uploading the data as dtype STRING in hopes that it would be converted to JSON server-side, but that also results in the following error:
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/gdw-dev-sse/jobs?uploadType=resumable: Provided Schema does not match Table XXXXXXXXXXXXXXXX. Field summery_data has changed type from JSON to STRING
NOTE: Hiding table name
I was able to upload my data with the following code, but JSON support should be added for pandas dataframes.
Code:
data['summery_datetime'] = data['summery_datetime'].astype(str)  # Datetime objects are not JSON serializable; convert to datetime strings.
data = data.to_dict(orient='records')
job = client.load_table_from_json(data, table_obj, job_config=job_config)
print(job.result())
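For anyone else hitting this, a slightly fuller version of the same workaround might look roughly like the sketch below. The client setup, table name, and sample row are placeholders rather than part of the original comment; the schema is taken from the example quoted later in this thread, with summery_data held in the DataFrame as a JSON-formatted string.

import datetime
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
table_obj = client.get_table("my-project.my_dataset.my_table")  # placeholder table name

# Schema from the example quoted below; summery_data is the JSON column.
schema = [
    bigquery.SchemaField("system_id", "STRING"),
    bigquery.SchemaField("summery_type", "STRING"),
    bigquery.SchemaField("summery_datetime", "DATETIME"),
    bigquery.SchemaField("summery_is_good", "BOOL"),
    bigquery.SchemaField("summery_data", "JSON"),
]
job_config = bigquery.LoadJobConfig(schema=schema)

# Placeholder data; in practice this is the existing DataFrame to load.
data = pd.DataFrame({
    "system_id": ["sys-1"],
    "summery_type": ["daily"],
    "summery_datetime": [datetime.datetime(2024, 1, 1, 12, 0)],
    "summery_is_good": [True],
    "summery_data": ['{"count": 3, "ok": true}'],  # JSON-formatted string
})

# Datetime objects are not JSON serializable, so convert them to strings first.
data["summery_datetime"] = data["summery_datetime"].astype(str)
records = data.to_dict(orient="records")

job = client.load_table_from_json(records, table_obj, job_config=job_config)
print(job.result())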
Thanks @jlynchMicron for providing a workaround. I think there are a few problems we'll need to work through, one of which is that the BigQuery backend doesn't support JSON in load jobs from Parquet files: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#type_conversions
Please file a Feature Request at https://cloud.google.com/support/docs/issue-trackers (specifically, "Create new BigQuery issue").
Another possible workaround is to use CSV as the source format. In your example:
schema = [
    bigquery.SchemaField("system_id", "STRING"),
    bigquery.SchemaField("summery_type", "STRING"),
    bigquery.SchemaField("summery_datetime", "DATETIME"),
    bigquery.SchemaField("summery_is_good", "BOOL"),
    bigquery.SchemaField("summery_data", "JSON"),  # Stored in the DataFrame as a JSON-formatted string
]
job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format="CSV",
)
job = client.load_table_from_dataframe(data, table_obj, job_config=job_config)
print(job.result())
For Googlers watching this issue: I have written a design doc (go/bf-json) that proposes a JSONDtype in https://github.com/googleapis/python-db-dtypes-pandas, which would allow us to autodetect when JSON is used in a DataFrame. Before that, though, we could choose the appropriate serialization format based on the provided schema. For example, Parquet must be used for STRUCT/ARRAY columns, but CSV must be used for JSON.
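To make that idea concrete, here is a rough sketch of schema-based format selection. The helper name is made up for illustration and is not part of the library; it only encodes the constraint stated above (Parquet for STRUCT/ARRAY columns, CSV when a JSON column is present).

from google.cloud import bigquery

def pick_source_format(schema):
    """Hypothetical helper: pick a load format from the user-supplied schema."""
    has_json = any(field.field_type == "JSON" for field in schema)
    has_struct_or_array = any(
        field.field_type in ("RECORD", "STRUCT") or field.mode == "REPEATED"
        for field in schema
    )
    if has_json and has_struct_or_array:
        # No single load format supports both today, which is part of the problem.
        raise ValueError("Cannot load JSON and STRUCT/ARRAY columns in one job")
    if has_json:
        return bigquery.SourceFormat.CSV      # CSV load jobs accept JSON columns
    return bigquery.SourceFormat.PARQUET      # Parquet handles STRUCT/ARRAY and the rest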
https://github.com/googleapis/python-db-dtypes-pandas/pull/284 can potentially fulfill this feature request.
Is your feature request related to a problem? Please describe.
python-bigquery does not currently seem to support uploading DataFrames where one of the columns in the destination table has the JSON dtype.
Quick partial code example:
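(The original snippet is not reproduced here; the sketch below is a reconstruction of the kind of call involved, with a placeholder table name and a schema based on the columns discussed earlier, where summery_data is the JSON column.)

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("system_id", "STRING"),
    bigquery.SchemaField("summery_data", "JSON"),
]
job_config = bigquery.LoadJobConfig(schema=schema)

# `data` is a pandas DataFrame; the summery_data column holds JSON-formatted strings.
job = client.load_table_from_dataframe(
    data, "my-project.my_dataset.my_table", job_config=job_config
)
print(job.result())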
Result:
google.api_core.exceptions.BadRequest: 400 Unsupported field type: JSON; reason: invalid, message: Unsupported field type: JSON
Describe the solution you'd like
Implement support for loading data that contains JSON columns to BigQuery.
Additional context
Related issues:
https://github.com/googleapis/python-bigquery-sqlalchemy/issues/399
https://github.com/googleapis/python-bigquery-pandas/issues/698
https://github.com/googleapis/python-bigquery-dataframes/issues/816