z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License
27 stars 34 forks source link

Fields with dashes (`-`) not accepted #74

Closed rubenvereecken closed 4 months ago

rubenvereecken commented 8 months ago

This is a two-part issue. BigQuery doesn't accept dashes, so I needed to transform field names beforehand (didn't want to change it in the source tap).

First off, the actual error, pointing out that the field first-name isn't allowed because of the dash.

2023-12-30T17:16:26.193844Z [info     ]   File "/Users/ruben/tap-brella/.meltano/loaders/target-bigquery/venv/lib/python3.11/site-packages/target_bigquery/proto_gen.py", line 48, in generate_field_v2 cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
2023-12-30T17:16:26.193950Z [info     ]     proto_cls = proto_schema_factory_v2(base.fields, pool) cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
2023-12-30T17:16:26.194054Z [info     ]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
2023-12-30T17:16:26.194156Z [info     ]   File "/Users/ruben/tap-brella/.meltano/loaders/target-bigquery/venv/lib/python3.11/site-packages/target_bigquery/proto_gen.py", line 94, in proto_schema_factory_v2 cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
2023-12-30T17:16:26.194273Z [info     ]     factory.pool.Add(file_proto) cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
2023-12-30T17:16:26.194370Z [info     ] TypeError: Couldn't build proto file into descriptor pool: invalid name: non-alphanumeric character (first-name) cmd_type=loader name=target-bigquery run_id=93638271-ebad-4046-8cd1-8590f87d3af8 state_id=2023-12-30T171624--tap-brella--target-bigquery stdio=stderr
  1. I noticed the translator doesn't include turning hyphen case into snake case. Is this intended? Because I'd be happy to submit a PR. Relevant function (I think).
# Column name transforms are configurable and entirely opt-in.
# This allows users to only use the transforms they need and not
# become dependent on inconfigurable transforms outside their
# realm of control which must be recreated if migrating loaders.
@cache
def transform_column_name(
    name: str,
    quote: bool = False,
    lower: bool = False,
    add_underscore_when_invalid: bool = False,
    snake_case: bool = False,
) -> str:
    old_name = name
    """Transform a column name to a valid BigQuery column name."""
    if snake_case and not lower:
        lower = True
    was_quoted = name.startswith("`") and name.endswith("`")
    name = name.strip("`")
    if snake_case:
        name = re.sub("((?!^)(?<!_)[A-Z][a-z]+|(?<=[a-z0-9])[A-Z])", r"_\1", name)
    if lower:
        name = "{}".format(name).lower()
    if add_underscore_when_invalid:
        if name[0].isdigit():
            name = "_{}".format(name)
    if quote or was_quoted:
        name = "`{}`".format(name)
  1. Oddly, the translator doesn't run for my table. I managed to figure that out with some debug logs... but I couldn't quite figure out why it doesn't run for my table (but it does run for the $sdc* fields). Why doesn't it – am I missing a check or config option?

Relevant bits of my meltano.yml:

  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: ${credentials}
      dataset: ${dataset}
      denormalized: true
      fail_fast: true
      flattening_enabled: false
      location: europe-west1
      project: ${project}
      column_name_transforms:
        snake_case: true
        add_underscore_when_invalid: true
z3z1ma commented 8 months ago

Late reply but feel free to add a transformer there if youd like and ill merge it in

rubenvereecken commented 7 months ago

@z3z1ma I'd be happy to. Just couldn't figure out where exactly. Can you point me to the right function?

rubenvereecken commented 4 months ago

For anyone else with this problem I decided to go the meltano way and implemented a general-purpose sanitization mapper that fits between the tap and target: https://github.com/rubenvereecken/map-sanitize