z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License

How to customize `key_properties` for table clustering #104

Open xiangshiyin opened 2 weeks ago

xiangshiyin commented 2 weeks ago

The documentation for the `cluster_on_key_properties` config option says:

Determines whether to cluster on the key properties from the tap. Defaults to false. When false, clustering will be based on _sdc_batched_at instead.

The code confirms that `key_properties` is used both to define the clustering key set and as the primary key in the merge operation.
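
For illustration, here is a minimal sketch of how one list of key properties could drive both the clustering clause and the merge condition. This is not the target's actual code: the function names and the `target`/`source` aliases are hypothetical, and only the general shape of BigQuery's `CLUSTER BY` and `MERGE ... ON` clauses is assumed.

```python
from typing import Sequence


def cluster_by_clause(key_properties: Sequence[str]) -> str:
    """Build a CLUSTER BY clause from the key properties (hypothetical helper).

    BigQuery allows at most four clustering columns, so extra keys are dropped.
    The column order is preserved, which is why the tap's key order matters.
    """
    fields = list(key_properties)[:4]
    return "CLUSTER BY " + ", ".join(f"`{field}`" for field in fields)


def merge_on_clause(key_properties: Sequence[str]) -> str:
    """Build the ON condition of a MERGE statement (hypothetical helper)."""
    return " AND ".join(
        f"target.`{key}` = source.`{key}`" for key in key_properties
    )


keys = ["id", "region"]
print(cluster_by_clause(keys))
print(merge_on_clause(keys))
```

The merge condition does not depend on key order, but the clustering clause does, so the same list serves two purposes with different sensitivities.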

Does anyone know how the default `key_properties` value is determined, and whether it can be customized (is that even possible)? Thanks!

xiangshiyin commented 2 weeks ago

After some further digging, we believe the `key_properties` here are the primary keys defined in the stream. With the current clustering key configuration in the sink, the BigQuery table clustering is determined directly by the order of the columns in the primary key defined in the incoming stream. It would be good to have a configurable parameter on the sink so we have more flexibility.
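
To make the ordering point concrete, here is a rough sketch. In the Singer spec, a tap's SCHEMA message carries a `key_properties` list, and that list's order is what would flow through to clustering. The `clustering_fields` helper and its `override` argument are hypothetical and only illustrate the kind of sink-level parameter suggested here; nothing like it is confirmed to exist in target-bigquery today. The stream and column names are made up.

```python
import json
from typing import List, Optional, Sequence

# A Singer SCHEMA message as a tap might emit it (illustrative values).
raw_message = """
{
  "type": "SCHEMA",
  "stream": "orders",
  "schema": {
    "properties": {
      "order_id": {"type": "integer"},
      "customer_id": {"type": "integer"}
    }
  },
  "key_properties": ["order_id", "customer_id"]
}
"""
message = json.loads(raw_message)


def clustering_fields(
    key_properties: Sequence[str],
    override: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick clustering columns (hypothetical helper).

    Without an override, the tap's key order is used as-is. With one,
    the sink config decides the columns and their order instead.
    """
    return list(override) if override else list(key_properties)


# Default: clustering follows the primary key order from the stream.
print(clustering_fields(message["key_properties"]))

# Hypothetical override: cluster on customer_id first.
print(clustering_fields(message["key_properties"], ["customer_id", "order_id"]))
```

With an option like this, the merge key could still come from the stream while the clustering order is tuned for query patterns.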

cc. @epapineau