move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/

Miscellaneous fixes to BigQuery connector #959

Closed: austinweisgrau closed this pull request 8 months ago

austinweisgrau commented 9 months ago

The BigQuery.copy() method does not seem to work in a variety of situations. I am making fixes here as I encounter and resolve these issues.

Fix BigQuery type map

Source types ultimately come from petl.typeset, which calls type(v).__name__. That call returns only the bare type name, without the source module: e.g. date, not datetime.date. The type map therefore has to use bare type names as its keys.
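A minimal sketch of why the keys matter, assuming a hypothetical `BQ_TYPE_MAP` dictionary and `bq_type_for` helper (the real mapping in the Parsons connector may use different names and entries). Because `type(v).__name__` yields only the bare class name, a key such as `datetime.date` would never match.

```python
import datetime

# Hypothetical type map keyed on bare type names, which is what
# type(v).__name__ (and therefore petl.typeset) produces.
BQ_TYPE_MAP = {
    "str": "STRING",
    "int": "INTEGER",
    "float": "FLOAT",
    "bool": "BOOLEAN",
    "date": "DATE",
    "datetime": "DATETIME",
    "time": "TIME",
}


def bq_type_for(value):
    """Look up a BigQuery type for a Python value by its bare type name."""
    type_name = type(value).__name__  # e.g. "date", not "datetime.date"
    # Hypothetical fallback: unknown types are loaded as STRING.
    return BQ_TYPE_MAP.get(type_name, "STRING")


print(type(datetime.date(2023, 1, 1)).__name__)  # date
print(bq_type_for(datetime.date(2023, 1, 1)))  # DATE
```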

Prefer a type other than NoneType when inferring the schema for a Table load to BigQuery

If a Parsons Table column has values like [None, None, True, False], the BigQuery connector will infer that the appropriate type for this column is NoneType, which it will translate into a STRING type.

With this change, when the types returned by petl.typecheck() include 'NoneType', the connector chooses the first available type that isn't 'NoneType', if there is one.
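The selection rule can be sketched as follows, assuming the candidate type names arrive as an ordered list of strings. `pick_column_type` is a hypothetical helper for illustration, not a Parsons or petl function.

```python
def pick_column_type(type_names):
    """Return the first type name that isn't 'NoneType'.

    Falls back to 'NoneType' only when every observed value is None
    (or when no types were observed at all).
    """
    for name in type_names:
        if name != "NoneType":
            return name
    return "NoneType"


# A column like [None, None, True, False] yields the ordered, de-duplicated
# type names ['NoneType', 'bool'].
column = [None, None, True, False]
observed = list(dict.fromkeys(type(v).__name__ for v in column))
print(observed)  # ['NoneType', 'bool']
print(pick_column_type(observed))  # bool
```

Without the rule, picking the first observed type would return 'NoneType' here and the column would end up as STRING.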

Fix commented-out line so BigQuery.copy() uses the job_config passed to it as an argument

It looks like this line was accidentally commented out.

Parse Python datetime objects for BigQuery as DATETIME or TIMESTAMP

A Python datetime object may represent either a TIMESTAMP or a DATETIME in BigQuery, depending on whether or not it has a timezone attached.

Before this change, a Parsons Table that included timezone-aware datetimes would fail to load to BigQuery, because BigQuery rejects datetime strings that carry timezone information for the DATETIME data type.
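One way to express the distinction, as a sketch: treat a datetime as timezone-aware when `tzinfo` is set and `utcoffset()` returns a value (the definition of "aware" in Python's `datetime` documentation). Aware values then map to TIMESTAMP and naive values to DATETIME. The function name `bq_datetime_type` is hypothetical.

```python
import datetime


def bq_datetime_type(value: datetime.datetime) -> str:
    """Map a Python datetime to a BigQuery type name.

    Aware datetimes (with a UTC offset) identify an absolute instant, which
    BigQuery models as TIMESTAMP; naive datetimes map to DATETIME, which
    carries no timezone.
    """
    is_aware = value.tzinfo is not None and value.utcoffset() is not None
    return "TIMESTAMP" if is_aware else "DATETIME"


naive = datetime.datetime(2023, 5, 1, 12, 0)
aware = datetime.datetime(2023, 5, 1, 12, 0, tzinfo=datetime.timezone.utc)
print(bq_datetime_type(naive))  # DATETIME
print(bq_datetime_type(aware))  # TIMESTAMP
```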

Only generate schema for BigQuery when table does not already exist

Always passing a schema to BigQuery is unnecessary, and it creates opportunities for the provided schema to mismatch the table's actual schema.

When the table already exists in BigQuery, fetch its schema from BigQuery instead.
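A sketch of that flow, assuming the `google-cloud-bigquery` client, where `client.get_table()` returns a table whose `.schema` holds the existing fields and raises `google.api_core.exceptions.NotFound` for a missing table. `resolve_load_schema`, the `infer_schema` callable, and `FakeClient` are hypothetical stand-ins, not the connector's actual code.

```python
from types import SimpleNamespace

try:
    from google.api_core.exceptions import NotFound
except ImportError:
    # Stand-in so the sketch runs without google-cloud-bigquery installed.
    class NotFound(Exception):
        pass


def resolve_load_schema(client, table_id, infer_schema):
    """Return the schema to use for a load job.

    If the table exists, reuse its schema from BigQuery so the load cannot
    disagree with it; otherwise fall back to inferring one from the data.
    """
    try:
        return client.get_table(table_id).schema
    except NotFound:
        return infer_schema()


class FakeClient:
    """Hypothetical stand-in for bigquery.Client, for demonstration only."""

    def __init__(self, schemas):
        self.schemas = schemas

    def get_table(self, table_id):
        if table_id not in self.schemas:
            raise NotFound(f"Table {table_id} not found")
        return SimpleNamespace(schema=self.schemas[table_id])


client = FakeClient({"my_dataset.existing": ["<existing schema>"]})
existing = resolve_load_schema(client, "my_dataset.existing", lambda: ["<inferred schema>"])
missing = resolve_load_schema(client, "my_dataset.missing", lambda: ["<inferred schema>"])
print(existing)  # ['<existing schema>']
print(missing)  # ['<inferred schema>']
```

Passing inference as a callable means the schema is only generated when the table does not already exist.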

austinweisgrau commented 8 months ago

FYI: all these force pushes are rebases on top of main, done whenever new commits are merged into main.