Closed agnessnowplow closed 2 months ago
Putting this back into draft, as keeping this as a default could add potentially significant cost over time. Let's investigate alternative approaches. Given how rarely the issue occurs, handling it as and when it arises may be the better option: reprocessing events with a larger lookback window should, in theory, unblock affected users (e.g. a one-off run with a 3-day lookback window via snowplow__lookback_window_hours).
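For reference, such a one-off reprocessing run could be done by temporarily raising the lookback window variable in dbt_project.yml. The variable name comes from the package; the 72-hour value is illustrative, matching the 3 days mentioned above:

```yaml
# dbt_project.yml -- temporary override for a one-off backfill run.
# 72 hours (3 days) is illustrative; tune to how late your events arrive,
# then revert to the default after the run completes.
vars:
  snowplow__lookback_window_hours: 72
```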
Description
BigQuery users that have partitioning on derived_tstamp (snowplow__derived_tstamp_partitioned: true) need an additional filtering buffer on the lower_limit when the base_events_this_run table is created. When events are sent late (e.g. when dvce_created_tstamp and dvce_sent_tstamp differ significantly), the minimum and maximum limits of a given run can prevent some of the earlier-sent events in a session from being reprocessed as a whole in a later run, as they should be, causing all sorts of data issues.
What type of PR is this? (check all applicable)
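To illustrate the filtering being discussed: a sketch of a buffered derived_tstamp partition filter inside the base_events_this_run selection. The variable names (snowplow__derived_tstamp_partitioned, snowplow__lookback_window_hours) come from the package, but the source relation and the exact filter shape here are illustrative, not the package's actual implementation:

```sql
-- Illustrative sketch only: widen the derived_tstamp lower bound by a buffer
-- so late-sent events from the start of a session still fall inside the
-- partition filter and the session can be reprocessed as a whole.
select *
from {{ ref('snowplow_base_events') }}  -- hypothetical source relation
where collector_tstamp >= {{ lower_limit }}
  and collector_tstamp <= {{ upper_limit }}
  {% if var('snowplow__derived_tstamp_partitioned', false) %}
  -- subtract the lookback window from lower_limit for partition pruning
  and derived_tstamp >= {{ dbt.dateadd('hour', -var('snowplow__lookback_window_hours', 6), lower_limit) }}
  and derived_tstamp <= {{ upper_limit }}
  {% endif %}
```

The key point is the dateadd on the lower bound: without that buffer, an event whose derived_tstamp precedes lower_limit is pruned out even though its session is otherwise being reprocessed.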
Related Tickets & Documents
Checklist
Added tests?
Added to documentation?
[optional] Are there any post-deployment tasks we need to perform?
[optional] What gif best describes this PR or how it makes you feel?