The BigQuery.copy() method does not seem to work in a variety of situations. Fixes are made here as I encounter and resolve these issues.
Fixed BigQuery type map
Source types ultimately come from petl.typeset, which calls
type(v).__name__. That call returns only the type name itself, without
the source module, e.g. date and not datetime.date.
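To illustrate the point about bare type names, here is a minimal sketch using only the standard library. The sample values are made up for illustration; the takeaway is that a type map keyed on these names has to use 'date', not 'datetime.date'.

```python
import datetime
from decimal import Decimal

# Illustrative sample values (not taken from the PR).
values = [
    datetime.date(2024, 1, 1),
    datetime.datetime(2024, 1, 1, 12, 0),
    Decimal("1.5"),
    None,
]

# type(v).__name__ yields the bare class name, with no module prefix.
names = [type(v).__name__ for v in values]
print(names)
```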
Prefer non-NoneType types when inferring schema for Table load to BigQuery
If a Parsons Table column has values like [None, None, True, False],
the BigQuery connector will infer that the appropriate type for this
column is NoneType, which it will translate into a STRING type.
This change ensures that, among the types returned by petl.typecheck(),
the first one that isn't 'NoneType' is chosen whenever one is
available.
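The selection rule described above can be sketched roughly as follows. This is not the connector's actual code: pick_type is a hypothetical helper, and the list of type names is built by hand here rather than obtained from petl.

```python
def pick_type(type_names):
    """Return the first type name that isn't 'NoneType'.

    Hypothetical helper for illustration only. Falls back to
    'NoneType' when the column holds nothing but None values.
    """
    for name in type_names:
        if name != "NoneType":
            return name
    return "NoneType"


column = [None, None, True, False]

# Distinct type names in first-seen order (dict preserves insertion order).
seen = list(dict.fromkeys(type(v).__name__ for v in column))

chosen = pick_type(seen)
print(seen, chosen)
```

With this rule, the example column is treated as a boolean column rather than falling through to STRING.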
Fix commented-out line to use job_config passed as argument to BigQuery.copy()
It looks like this line was accidentally commented out.
Parse Python datetime objects for BigQuery as datetime or timestamp
Python datetime objects may represent timestamps or datetimes in
BigQuery, depending on whether they do or do not have a timezone
attached.
Before this change, a Parsons Table that included datetimes with
timezones would fail to load to BigQuery, because BigQuery rejects
datetime strings that carry timezone information when the column is
typed as "datetime".
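A rough sketch of the distinction between the two BigQuery types, assuming the decision is made per value from its timezone information. The function name bigquery_type_for is hypothetical and is not the name used in the connector.

```python
from datetime import datetime, timezone


def bigquery_type_for(value: datetime) -> str:
    """Map a Python datetime to a BigQuery type name (illustrative only)."""
    # A datetime is "aware" when tzinfo is set and yields a UTC offset.
    is_aware = value.tzinfo is not None and value.utcoffset() is not None
    return "TIMESTAMP" if is_aware else "DATETIME"


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

print(bigquery_type_for(naive), bigquery_type_for(aware))
```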
Only generate schema for BigQuery when table does not already exist
Always passing a schema to BigQuery is unnecessary, and creates
opportunities for the provided schema to mismatch the actual schema.
When table already exists in BigQuery, fetch the schema from BigQuery
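The schema-handling logic of the last two commits can be sketched as a single decision. Everything here is hypothetical: resolve_schema and the two callables are stand-ins, and a real implementation would query BigQuery instead of using the stub lambdas shown.

```python
from typing import Callable, Dict, List

Schema = List[Dict[str, str]]


def resolve_schema(
    table_exists: bool,
    fetch_existing: Callable[[], Schema],
    infer_from_data: Callable[[], Schema],
) -> Schema:
    """Pick the schema source based on whether the table exists."""
    if table_exists:
        # Trust the destination table so the load cannot mismatch it.
        return fetch_existing()
    # No table yet: generate a schema from the data being loaded.
    return infer_from_data()


# Stub schema sources standing in for BigQuery and for type inference.
existing = lambda: [{"name": "id", "type": "INTEGER"}]
inferred = lambda: [{"name": "id", "type": "STRING"}]

print(resolve_schema(True, existing, inferred))
print(resolve_schema(False, existing, inferred))
```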