A fully incremental model, that transforms raw web & mobile event data generated by the Snowplow JavaScript & mobile trackers into a series of derived tables of varying levels of aggregation.
Other
13
stars
6
forks
source link
Add aggregation fields to views, sessions, users #23
The next step for the most common request and use case for custom models is aggregations of (base) events this run records for that view/session. This adds the same risk we did with passthrough fields, which it is easy to make mistakes with this from users and we have no control over the SQL they write to ensure it's correct and error free. This approach at least ensures that we only allow certain aggregation types and the grouping id which means we can be sure there aren't any 1:many joins. Users, as always, has to re-aggregate from sessions the same as passthrough fields.
I am still of the opinion that this isn't that much easier than just adding a custom model and joining to e.g. views this run, especially as to do anything like count events of a specific type you have to write a case when statement, but if it helps a few people it isn't much work to keep in the package.
Example var:
snowplow__view_aggregations: [{"type": "sum", "field": "case when event_name = 'button_click' then 1 else 0 end", "alias": "num_ button_clicks" ]
Description
The next step for the most common request and use case for custom models is aggregations of (base) events this run records for that view/session. This adds the same risk we did with passthrough fields, which it is easy to make mistakes with this from users and we have no control over the SQL they write to ensure it's correct and error free. This approach at least ensures that we only allow certain aggregation types and the grouping id which means we can be sure there aren't any 1:many joins. Users, as always, has to re-aggregate from sessions the same as passthrough fields.
I am still of the opinion that this isn't that much easier than just adding a custom model and joining to e.g. views this run, especially as to do anything like count events of a specific type you have to write a
case when
statement, but if it helps a few people it isn't much work to keep in the package.Example var:
snowplow__view_aggregations: [{"type": "sum", "field": "case when event_name = 'button_click' then 1 else 0 end", "alias": "num_ button_clicks" ]
What type of PR is this? (check all applicable)
Related Tickets & Documents
https://snplow.atlassian.net/browse/PE-5992
Checklist
Added tests?
Added to documentation?
[optional] Are there any post-deployment tasks we need to perform?
[optional] What gif best describes this PR or how it makes you feel?