meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0

bug: `table-key-properties` in `metadata` extra does not populate schema `key_properties` for `tap-bigquery` #2660

Open ReubenFrankel opened 1 week ago

ReubenFrankel commented 1 week ago

Singer SDK Version

0.39.1

Is this a regression?

Python Version

3.9

Bug scope

Taps (catalog, state, etc.)

Operating System

Ubuntu 22.04 LTS

Description

I've been trying to use table-key-properties to configure key properties for a BigQuery stream:

plugins:
  extractors:
  - name: tap-bigquery
    variant: meltanolabs
    pip_url: git+https://github.com/Matatika/tap-bigquery.git
    metadata:
      '*-events_*':
        table-key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id

In this case, key_properties is empty, so no primary keys are configured when loading to a database target:

Generated catalog

```json
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true
          }
        },
        ...
      ],
      "selected": true,
      "table_key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ]
    }
  ]
}
```
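To spot this mismatch in a larger catalog, a small check along these lines may help. It is only a diagnostic sketch, not part of the SDK or Meltano: the helper names `root_metadata` and `streams_with_ignored_keys` are hypothetical, and the sample catalog is a trimmed copy of the one above with the elided parts left out.

```python
def root_metadata(stream: dict) -> dict:
    """Return the stream-level metadata (the entry with an empty breadcrumb)."""
    for entry in stream.get("metadata", []):
        if entry.get("breadcrumb") == []:
            return entry.get("metadata", {})
    return {}


def streams_with_ignored_keys(catalog: dict) -> list:
    """List (stream ID, keys) pairs where `table-key-properties` is set in
    the metadata but the top-level `key_properties` is empty."""
    ignored = []
    for stream in catalog.get("streams", []):
        declared = root_metadata(stream).get("table-key-properties") or []
        if declared and not stream.get("key_properties"):
            ignored.append((stream["tap_stream_id"], declared))
    return ignored


# Trimmed copy of the generated catalog (schema and other metadata omitted).
catalog = {
    "streams": [
        {
            "tap_stream_id": "analytics_376142012-events_20240825",
            "key_properties": [],
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "table-key-properties": [
                            "event_timestamp",
                            "event_name",
                            "event_bundle_sequence_id",
                        ],
                    },
                },
            ],
        },
    ],
}

for stream_id, keys in streams_with_ignored_keys(catalog):
    print(f"{stream_id}: table-key-properties {keys} not in key_properties")
```

For a real catalog, the same function could be run on the result of `json.load` applied to a saved catalog file.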

Primary keys are configured, however, when using key-properties or key_properties instead (neither is documented here):

    metadata:
      '*-events_*':
        key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id

Generated catalog

```json
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}
```

    metadata:
      '*-events_*':
        key_properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id

Generated catalog

```json
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key_properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}
```

Why do key-properties and key_properties work when table-key-properties does not? What is the correct way to do this? I assume table-key-properties is the intended approach, so I don't know if I'm going to run into some undefined behaviour down the line.



Code

No response

Link to Slack/Linen

https://meltano.slack.com/archives/C069CQNHDNF/p1725976166433339

ReubenFrankel commented 1 week ago

Worth noting that you can also set key properties via stream maps:

    config:
      stream_maps:
        '*-events_*':
          __key_properties__:
          - event_timestamp
          - event_name
          - event_bundle_sequence_id
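
Both the metadata rules and the stream map above are keyed on the wildcard pattern `*-events_*`. As a quick local check of which stream IDs such a pattern would cover, here is a minimal sketch that assumes shell-style glob semantics. That matching behaviour is an assumption and is not confirmed anywhere in this thread. The helper `matching_streams` and the second and third stream IDs are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "*-events_*"


def matching_streams(stream_ids, pattern=PATTERN):
    """Return the stream IDs that match a shell-style glob pattern.

    fnmatchcase is used so the comparison is case-sensitive on every OS.
    """
    return [stream_id for stream_id in stream_ids if fnmatchcase(stream_id, pattern)]


stream_ids = [
    "analytics_376142012-events_20240825",           # from the catalog above
    "analytics_376142012-events_intraday_20240825",  # hypothetical
    "analytics_376142012-users",                     # hypothetical
]

for stream_id in matching_streams(stream_ids):
    print(stream_id)
```

Under that assumption, the pattern covers every date-suffixed events table, so the key properties would apply to each of them.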