snowplow / dbt-snowplow-utils

Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.
Other
13 stars 6 forks source link

get_sde_or_context: Doesn't have a database option #164

Closed jethron closed 8 months ago

jethron commented 8 months ago

Describe the bug

In the Snowplow data models, you can configure snowplow__database to specify a database that differs from your target. If you/the models use the Postgres/Redshift get_sde_or_context macro, that database isn't passed through and respected, and the Relation created assumes the target database instead.

Steps to reproduce

Expected results

Macros adhere to the configuration and access tables from the configured other database

Actual results

Screenshots and log output

Which database are you using dbt with?

The output of dbt --version:

1.7.0

The operating system you're using: Windows

The output of python --version: 3.x

Are you interested in contributing towards the fix?

rlh1994 commented 8 months ago

Thanks @jethron, this should be easy enough to enable in the macro itself (just adding a database option and passing it to the relation call), it'll take a while to propagate through to our other packages though as we'll have to make an update and release for all calls of it.

How common do you think it is to have your atomic data in one database and want to write your models into another?

jethron commented 8 months ago

it'll take a while to propagate through to our other packages though as we'll have to make an update and release for all calls of it.

Yeah, hence not just making a PR. :sweat_smile:

How common do you think it is to have your atomic data in one database and want to write your models into another?

It's taken this long to be an issue for a customer, so probably not very common? This customer is using them as logical separation, but I think that's because historically they had a dev pipeline going to this second database and the *_dev name of it made it logical to use for dbt development, too. In prod it will likely end up that snowplow__database will === the target database.

rlh1994 commented 8 months ago

Thanks, I've got a ticket next sprint hopefully to add this - should at least be able to do utils and unified and then just roll the others out over a longer period.

If it is critical, you could always overwrite the macro with your own version in the meantime that considers that variable if it exists. Should pretty much be a flat copy and just swap out this line

https://github.com/snowplow/dbt-snowplow-utils/blob/9e63d0f38ea5c21fa3d2ef3073dd27f0bb887d5a/macros/utils/get_sde_or_context.sql#L20

{% set relation = api.Relation.create(database = var('snowplow__database', target.database), schema = schema, identifier = identifier) %}
rlh1994 commented 8 months ago

I actually forgot that this is now mostly handled directly in the base macors now, so except for models where we explicitly use this (I think we do in ecom, maybe nothing else?) this change would actually be immediately available in all packages.