snowplow / dbt-snowplow-utils

Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.
Other
13 stars 6 forks source link

Add context/sde de-duping macro #123

Closed rlh1994 closed 1 year ago

rlh1994 commented 1 year ago

Description & motivation

As part of the work to get a more flexible approach, and a more consistent approach across warehouses, we would like to allow users to add their own custom contexts to the events this run table for redshift and postgres. We first need to move all our contexts to this table, and currently the new de-dupe logic makes thisa quite a lot of effort and will involve writing many (optional) subqueries.

The macro added in this PR allows for a more consistent and scalable approach in the future, with all columns from a context/sde pulled through in the query, excluding the schema information columns (due to expected name clashes) and the id and timestamp columns which are aliased with the schema name to avoid column name clashes.

The macro will just be available in this verison and later versions of web/mobile can make use of it.

There is an open question of would you ever want the schema columns included, if so they can be prefixed in the same way, I'm not too sure if people would need this? These column names are a little longer is my only worry that we might hit the column name limit for postgres/redshift (59 and 115 respectively I think).

Tests and docs added for the macro.

Checklist