twitter / algebird

Abstract Algebra for Scala
https://twitter.github.io/algebird
Apache License 2.0
2.29k stars 345 forks source link

A monoid for sliding time window aggregations #626

Open dasch opened 7 years ago

dasch commented 7 years ago

I've started using Algebird in conjunction with Google Dataflow, which works great! My main data structure is a deeply nested case class that is implicitly a monoid – it contains a bunch of stuff, mainly time-sorted Max values (such that I only keep the most recent version of some value).

My needs are a bit specific, though, and the built-in time window aggregation semantics are causing some complications. I'd like to ask if you have any experience implementing windowed aggregations using monoids, or whether you'd be interested in accepting such an implementation were I to write one.

My current idea is loosely based on TopK – let's just call it SlidingWindow. It keeps k "slots", each containing an instance of another monoid and a timestamp (the timestamp will be normalized according to the period of the time window, e.g. start-of-day). When adding two of these together, each slot that maps to the same timestamp will be added together, but only the k most recent slots will be retained. There would be a method that returned the "sum" of all the slots, representing, say, the total count of some event during the last k days/minutes/whatever.

Does this design make sense? I've tried looking, but I have seen little mention of time as a factor here, although it's critical in many streaming applications.

Sorry for the bother if this is the wrong place to ask.

SemanticBeeng commented 5 years ago

@johnynek mentions something related here "scale.bythebay.io: Oscar Boykin, How to Elm-ify Your ML" https://youtu.be/k1jhDcruiLE?t=2142 but would like to see some code. :-)

sritchie commented 5 years ago

We've got a pretty nice approximate sliding window implementation here: http://twitter.github.io/algebird/datatypes/approx/exponential_histogram.html

This might help!

sritchie commented 5 years ago

cc @SemanticBeeng