snowplow / dbt-snowplow-utils

Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.

Add condition to filter out late loading events #145

Closed rlh1994 closed 12 months ago

rlh1994 commented 12 months ago

Description & motivation

_It is possible, although rare, for a late-loading event to be included in events_this_run on some runs and excluded on later runs. This can happen when load_tstamp is not used as the session_timestamp, the event is the first event of its session, and it loads after a long delay (greater than the lookback window). In that case the start_tstamp we hold for the session may be later than the true start_tstamp of the session (the one based on the late-loading event). Because we currently filter events_this_run only on the session identifier and the minimum start overall (not per session), whether this event is picked up depends on the run._

_To make sure this event is deterministically excluded from events_this_run, I have added a filter that keeps only events whose timestamp is greater than or equal to the start of their session as recorded in the lifecycle table._
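
As a rough illustration of the filter described above, here is a minimal Python sketch rather than the package's actual SQL. The table contents, the `filter_events_this_run` function, and the `run_lower_limit` and `per_session` parameters are all hypothetical names invented for this example; only the idea (drop events earlier than the session start held in the lifecycle table) comes from the description.

```python
from datetime import datetime

# Hypothetical stand-in for the lifecycle table: session id -> recorded session start.
lifecycle_start = {
    "s1": datetime(2023, 1, 2, 10, 0),
    "s2": datetime(2023, 1, 2, 9, 0),
}

# Hypothetical events: (event_id, session_id, session_timestamp).
events = [
    ("e0", "s1", datetime(2023, 1, 1, 8, 0)),   # late-loading first event, earlier than the recorded start
    ("e1", "s1", datetime(2023, 1, 2, 10, 0)),
    ("e2", "s1", datetime(2023, 1, 2, 10, 5)),
    ("e3", "s2", datetime(2023, 1, 2, 9, 30)),
]


def filter_events_this_run(events, lifecycle_start, run_lower_limit, per_session=True):
    """Return the ids of events that would be kept for this run.

    run_lower_limit models the existing overall (not per-session) minimum start.
    per_session=True additionally applies the per-session start filter.
    """
    kept = []
    for event_id, session_id, ts in events:
        start = lifecycle_start.get(session_id)
        if start is None:
            continue  # session is not part of this run
        if ts < run_lower_limit:
            continue  # existing filter: overall minimum start
        if per_session and ts < start:
            continue  # added filter: event precedes its session's recorded start
        kept.append(event_id)
    return kept


wide_limit = datetime(2023, 1, 1, 0, 0)
narrow_limit = datetime(2023, 1, 2, 9, 0)

# Without the per-session filter, e0 appears or disappears depending on the run's limit.
print(filter_events_this_run(events, lifecycle_start, wide_limit, per_session=False))
print(filter_events_this_run(events, lifecycle_start, narrow_limit, per_session=False))

# With it, both runs keep the same events.
print(filter_events_this_run(events, lifecycle_start, wide_limit))
print(filter_events_this_run(events, lifecycle_start, narrow_limit))
```

In this toy data the late event `e0` is kept only under the wide limit when the per-session check is off, which mirrors the run-to-run inconsistency described above; with the check on, it is dropped in both cases.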

_If the user uses load_tstamp for session_timestamp this is not relevant and the issue would not occur anyway._

The above was the original plan for this PR; it now also:

Checklist