This PR:
- Adds custom query logic to the dataset macro so that aggregations for the aggregate_all_ever relationship are computed more efficiently. The new logic bypasses the join to the primary activity, which is possible because the aggregated columns have no time dependency on the primary activity. Instead, it aggregates the joined activity by the customer column (entity_uuid in the generated SQL below), then joins the result back to the primary activity CTE's customer column in the final CTE that combines all the CTEs.
- Updates the docs to warn developers against specifying additional join logic that uses the {{ primary() }} macro in aggregate_all_ever activities, because the aggregation CTE no longer joins to the primary activity.
Generated SQL for dataset__aggregate_all_ever_1:
with filter_activity_stream_using_primary_activity as (
    select
        stream.activity_id as activity_id,
        stream.entity_uuid as entity_uuid,
        stream.ts as ts,
        stream.revenue_impact as revenue_impact,
        stream.activity as activity,
        stream.activity_occurrence as activity_occurrence,
        stream.activity_repeated_at as activity_repeated_at
    from "dbt"."main"."input__aggregate_all_ever" as stream
    where stream.activity = 'signed up'
        and (true)
),
append_and_aggregate__1__aggregate_all_ever as (
    select
        -- special case for aggregate_all_ever relationship to avoid exploding join
        entity_uuid,
        count(appended.activity_id) as aggregate_all_ever_visit_page_activity_id
    from "dbt"."main"."input__aggregate_all_ever" as appended
    where appended.activity = 'visit page'
    group by entity_uuid
),
rejoin_aggregated_activities as (
    select
        stream.activity_id,
        stream.entity_uuid,
        stream.ts,
        stream.revenue_impact,
        append_and_aggregate__1__aggregate_all_ever.aggregate_all_ever_visit_page_activity_id
    from filter_activity_stream_using_primary_activity as stream
    left join append_and_aggregate__1__aggregate_all_ever
        on append_and_aggregate__1__aggregate_all_ever.entity_uuid = stream.entity_uuid
)
select * from rejoin_aggregated_activities
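For reviewers who want to sanity-check the approach, here is a minimal sketch that compares the two query shapes on a toy activity stream using Python's built-in sqlite3 module. The table name `stream`, the sample rows, the column alias `visit_count`, and the "join first, then aggregate" query are all assumptions made for illustration; they are not the package's actual generated SQL from before this change.

```python
import sqlite3

# Toy activity stream (hypothetical data, not from the project's test fixtures).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table stream (activity_id text, entity_uuid text, ts text, activity text);
insert into stream values
  ('a1', 'u1', '2023-01-01', 'signed up'),
  ('a2', 'u1', '2023-01-02', 'visit page'),
  ('a3', 'u1', '2023-01-03', 'visit page'),
  ('a4', 'u2', '2023-01-01', 'signed up'),
  ('a5', 'u3', '2023-01-05', 'visit page');
""")

# Assumed "join first, then aggregate" shape: every primary row is joined to
# every matching appended row for the same customer before counting.
join_then_aggregate = """
select p.activity_id, p.entity_uuid, count(a.activity_id) as visit_count
from stream as p
left join stream as a
  on a.entity_uuid = p.entity_uuid and a.activity = 'visit page'
where p.activity = 'signed up'
group by p.activity_id, p.entity_uuid
order by p.activity_id
"""

# Shape used in this PR: aggregate the appended activity per customer first,
# then join the one-row-per-customer result back to the primary activity.
aggregate_then_join = """
with primary_activity as (
  select activity_id, entity_uuid from stream where activity = 'signed up'
),
aggregated as (
  select entity_uuid, count(activity_id) as visit_count
  from stream
  where activity = 'visit page'
  group by entity_uuid
)
select p.activity_id, p.entity_uuid, aggregated.visit_count
from primary_activity as p
left join aggregated on aggregated.entity_uuid = p.entity_uuid
order by p.activity_id
"""

rows_join = conn.execute(join_then_aggregate).fetchall()
rows_pre = conn.execute(aggregate_then_join).fetchall()
print(rows_join)
print(rows_pre)
```

In this sketch both queries agree for customers that have the appended activity. For a customer with no 'visit page' rows, the join-first query counts 0 while the pre-aggregated query yields NULL from the left join, so downstream consumers may want a coalesce if they depend on a zero.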
Resolves #36