snowplow / dbt-snowplow-utils

Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.

Add support for new materialization to enable real-time modeling #136

Open leonard-henriquez opened 1 year ago

leonard-henriquez commented 1 year ago

Is your feature request related to a problem? Please describe.

With the Snowplow dbt packages as they stand, getting recent events requires executing dbt run to process new data. The package offers the "incremental" materialization so that each run processes only new events rather than every event. However, this approach still makes it hard to get fresh data with low latency (<1 minute).

Let's take an example:

So, my data is only available at 08:47 am. These delays are very hard to reduce because we can't realistically run dbt jobs every second, and each dbt job takes a few minutes to complete.

Describe the solution you'd like

We could take advantage of the "lambda view" pattern and introduce a new materialization option built on materialized views and, for Snowflake, dynamic tables.
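To make the "lambda view" idea concrete, here is a minimal sketch, not the package's implementation. It assumes an in-memory SQLite database standing in for the warehouse, and every table, view, and function name (raw_events, page_views_logic, page_views_batch, page_views_lambda, run_batch) is hypothetical. The view returns the rows already materialized by the last scheduled run, plus rows transformed on the fly for events that arrived after that run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (
        event_id TEXT PRIMARY KEY, collector_tstamp TEXT, page_url TEXT
    );

    -- Shared transformation logic, reused by the batch load and the live part.
    CREATE VIEW page_views_logic AS
    SELECT event_id, collector_tstamp,
           replace(page_url, 'https://example.com', '') AS page_path
    FROM raw_events;

    -- Historical part: a physical table refreshed by the scheduled job.
    CREATE TABLE page_views_batch (
        event_id TEXT PRIMARY KEY, collector_tstamp TEXT, page_path TEXT
    );

    -- Lambda view: batch rows plus freshly transformed rows that are
    -- newer than the latest timestamp already in the batch table.
    CREATE VIEW page_views_lambda AS
    SELECT event_id, collector_tstamp, page_path FROM page_views_batch
    UNION ALL
    SELECT event_id, collector_tstamp, page_path FROM page_views_logic
    WHERE collector_tstamp >
          (SELECT coalesce(max(collector_tstamp), '') FROM page_views_batch);
""")


def run_batch(conn):
    """Stand-in for a scheduled incremental run (hypothetical helper)."""
    (mark,) = conn.execute(
        "SELECT coalesce(max(collector_tstamp), '') FROM page_views_batch"
    ).fetchone()
    conn.execute(
        "INSERT INTO page_views_batch "
        "SELECT event_id, collector_tstamp, page_path FROM page_views_logic "
        "WHERE collector_tstamp > ?",
        (mark,),
    )
    conn.commit()


# Two events exist when the scheduled run happens.
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    ("e1", "2024-01-01T08:00:00", "https://example.com/home"),
    ("e2", "2024-01-01T08:05:00", "https://example.com/pricing"),
])
run_batch(conn)

# A third event arrives after the run and before the next one.
conn.execute("INSERT INTO raw_events VALUES (?, ?, ?)",
             ("e3", "2024-01-01T08:06:00", "https://example.com/docs"))

batch_ids = [r[0] for r in conn.execute(
    "SELECT event_id FROM page_views_batch ORDER BY event_id")]
lambda_rows = conn.execute(
    "SELECT event_id, page_path FROM page_views_lambda "
    "ORDER BY collector_tstamp").fetchall()
print(batch_ids)
print(lambda_rows)
```

In this sketch the batch table only knows about the first two events, while the lambda view also exposes the third without waiting for the next run. The strict timestamp comparison is a simplification: events sharing the exact high-water-mark timestamp but arriving late would be missed, so a real implementation would likely need a lookback window and deduplication.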

Describe alternatives you've considered

Running dbt more frequently, but that is costly.

Are you interested in contributing towards this feature?

I am willing to help, but I am new to dbt. I tried to modify the materialization but didn't succeed in making it work. However, I've found interesting resources that can help:

miike commented 1 year ago

For anyone landing here - here is a thread on the issue: https://discourse.snowplow.io/t/data-modeling-in-real-time/8978

This is something we will likely look into, but it's not currently on the immediate roadmap.