z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License
27 stars 34 forks source link

Feature: Create a config option to parse timestamps in special formats. #40

Open jcbmllgn opened 1 year ago

jcbmllgn commented 1 year ago

I have a timestamp coming from a source that causes BigQuery's default timestamp parsing to fail with how this package by default parses timestamps, this line of code generates this sql:

select 
    CAST('2023-04-26 14:57:00 -0700' as TIMESTAMP) as timestamp_without_space

Which fails for this reason:

image

The issue is that there's a space between the seconds and timezone offset, here's what comes from my source system2023-04-26 14:57:00 -0700, however, here's what BigQuery's default CAST(.. as TIMESTAMP) function wants 2023-04-26 14:57:00-0700.


I would like to be able to add a config option to my meltano.yml file for specifying how timestamps should be parsed, something like this:

  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_json: ${BIGQUERY_CREDENTIALS_JSON}
      project: some-project
      dataset: some-dataset
      timestamp_format: ''%Y-%m-%d %H:%M:%S %z'

and then the package would parse timestamps like this instead:

select
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S %z', some_special_timestamp_column) AS parsed_timestamp
z3z1ma commented 1 year ago

@jcbmllgn I think this is a neat solution with a low scope. The best practice in many taps is to format the timestamps as ISO but because of the sheer variation, it would be nice to have this capability here. I think something we should think about too is using safe_cast

https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_functions#safe_casting

Using this would let us essentially use coalesce to encapsulate fallback logic. Then we can just statically account for a few more date formats outside of which we say use a mapper / fix the dates from the tap.