tnightengale / dbt-activity-schema

A dbt-Core package for generating models from an activity stream.

GNU General Public License v3.0

Add Additional Aggregation Functions #29

Open bcodell opened 1 year ago

bcodell commented 1 year ago

Description

It's technically outside of the Activity Schema spec, but it would be nice if the package shipped with additional aggregation functions to use with the aggregate relationships. Aggregations include:

average
median
listagg (available in Narrator)
listagg distinct
count distinct
boolean sum - convert boolean features to integers and sum them
not null - returns true if the feature has at least one non-null value, else false. Useful for sparse features.
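
To make the intended semantics of a few of these concrete, here is a rough sketch using Python's built-in `sqlite3` module. It is not the package's implementation: the `activity` table, its columns, and the sample rows are made up for illustration, and SQLite stores booleans as integers, so a warehouse with a real boolean type would likely need an explicit cast. Average, median, and plain listagg are left out for brevity.

```python
import sqlite3

# Hypothetical activity rows: (customer, feature, flag). Not real package data.
rows = [
    ("a", "x", 1),
    ("a", "y", 0),
    ("a", "x", 1),
    ("b", None, None),  # sparse: customer "b" has no feature values at all
]

conn = sqlite3.connect(":memory:")
conn.execute("create table activity (customer text, feature text, flag integer)")
conn.executemany("insert into activity values (?, ?, ?)", rows)

query = """
select
    customer,
    count(distinct feature)               as count_distinct,
    sum(case when flag then 1 else 0 end) as boolean_sum,
    max(feature is not null)              as not_null,
    group_concat(distinct feature)        as listagg_distinct
from activity
group by customer
order by customer
"""

# Map each customer to its tuple of aggregated values.
results = {row[0]: row[1:] for row in conn.execute(query)}
for customer, values in results.items():
    print(customer, values)
```

Note that `not_null` comes back as 0 for customer `b`, which is the "useful for sparse features" case, and that the order of items inside the list aggregation is not guaranteed unless the engine is told how to order them.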

Dependencies

#25 - the dbt project needs to know the data types for features
#28 - new aggregations need to be registered in the Aggregation Registry

Implementation

Add a macro for each aggregation, in a file named _aggfunc_name.sql (e.g. _average.sql).
Implement each one using the caller() implementation pattern (see example).
Register each aggregation in the Aggregation Registry.
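
The "one macro per aggregation, then register it" idea could be sketched roughly like this. This is a Python stand-in only, since the real Aggregation Registry lives in the package's dbt macros and is not shown in this issue. Every name here (`AGGREGATIONS`, `register`, `macro_filename`, and the rendered SQL) is hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: aggregation name -> function that renders a SQL expression.
AGGREGATIONS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that records a renderer under the given aggregation name."""
    def decorator(render: Callable[[str], str]) -> Callable[[str], str]:
        AGGREGATIONS[name] = render
        return render
    return decorator


def macro_filename(name: str) -> str:
    """File name following the _aggfunc_name.sql convention from the issue."""
    return f"_{name}.sql"


@register("average")
def average(column: str) -> str:
    return f"avg({column})"


@register("count_distinct")
def count_distinct(column: str) -> str:
    return f"count(distinct {column})"


@register("boolean_sum")
def boolean_sum(column: str) -> str:
    # Convert the boolean to an integer, then sum.
    return f"sum(case when {column} then 1 else 0 end)"


for name, render in AGGREGATIONS.items():
    print(macro_filename(name), "->", render("feature"))
```

Keeping the lookup in one place means an unregistered aggregation name fails fast at lookup time, which is presumably part of the motivation for the registry dependency.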

Checklist for each of the aggregations to implement:

[ ] average
[ ] median
[ ] listagg
[ ] listagg distinct
[ ] count distinct
[ ] boolean sum
[ ] not null

Open Questions

Are these reasonable to implement, even though they aren't included in the Activity Schema spec?
Do the aggregations specified make sense?
What other aggregations should be added?