mjirv / dbt-datamocktool

A dbt package for unit testing your SQL analytics models
Apache License 2.0
162 stars 20 forks source link

Allow for overriding of current_timestamp to a static value #48

Open mkkuba opened 2 years ago

mkkuba commented 2 years ago

Describe the feature

Allow for overriding of {{ dbt_utils.current_timestamp() }} to a static value

Describe alternatives you've considered

I have not been able to think of straight forward alternatives to easily do this.

This could also be useful for overriding of dbt functions in general (e.g. {{ var('my_variable') }})

Who will this benefit?

This is useful for testing dbt transformations that are dependent on the current timestamp

Are you interested in contributing this feature?

No

mjirv commented 2 years ago

@mkkuba thanks for submitting this. Good idea!

The issue you're facing is that you have a column using current_timestamp that you can't mock because it changes on every run, right?

Here's one idea: have you tried using the compare_columns property? There's documentation near the bottom of the README. It might get messy if you have a ton of columns though.

mkkuba commented 2 years ago

@mkkuba thanks for submitting this. Good idea!

The issue you're facing is that you have a column using current_timestamp that you can't mock because it changes on every run, right?

Here's one idea: have you tried using the compare_columns property? There's documentation near the bottom of the README. It might get messy if you have a ton of columns though.

@mjirv thank you for the quick response. The issue is that our transform itself uses the value of current_timestamp (and not just a column that we can ignore). In our particular case it's something like

FROM dim_users
CROSS JOIN dim_dates
WHERE dim_dates.day <= current_timestamp

(joining a date dimension table up to a certain timestamp)

Another potential use case might be having an event stream but only wanting the last hour of events (and the test would confirm you weren't selecting more than the past hour of events)

mjirv commented 2 years ago

Got it, that makes sense. We can definitely work on this. Thanks!

yauhen-sobaleu commented 2 years ago

You can also substitute your non-deterministic function calls with hardcoded values that you will use in your testing environment (for example, some kind of CI env)

WITH

dates AS (

    SELECT dd AS group_date
    {%- if target.name == 'ci' %} -- used in unit testing
    FROM GENERATE_SERIES('2021-02-13'::DATE, '2021-02-13'::DATE, INTERVAL '1 days') AS dd
    {%- else %} -- used in production
    FROM GENERATE_SERIES(CURRENT_DATE - 1, CURRENT_DATE - 1, INTERVAL '1 days') AS dd
    {%- endif %}

),