microsoft / Trill

Trill is a single-node query processor for temporal or streaming data.
MIT License

Aggregate Streams Over 24 Hours #126

Open JonathanKeav opened 4 years ago

JonathanKeav commented 4 years ago

Hi, I am hoping to use Trill to analyze temporal data. What I need is an hourly aggregate over the last 24 hours of data. The data arrives as a batch of events every 15 minutes. About 95% of the events in a batch have a start time within the last 15 minutes; the remaining ~5% are older. Events are mostly in order (increasing in time), with some disorder.

The application produces a running picture (reports and dashboards) of the last 24 hours, updated after every batch arrives, i.e. every 15 minutes. I could run the entire batch through a stream and punctuate at the end of each batch, but ~5% of the events in each batch belong to the previous 15-minute interval. Occasionally events lag by 30 or 45 minutes. This is rare, but it does happen, and I need to capture these late events in the aggregations for reporting.

Is it possible to achieve the above scenario with Trill?

peterfreiling commented 4 years ago

I'm not sure whether this would be acceptable, but you could set a DisorderPolicy of Adjust, which moves the ~5% of disordered events that arrive after their time window has lapsed into the current time window. Alternatively, you could drop them altogether (DisorderPolicy.Drop). Unfortunately, Trill does not currently support out-of-order processing, i.e. emitting a result and then correcting it later when more data arrives for an already-lapsed time window.
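To make the trade-off concrete, here is a minimal plain-Python sketch of how the two policies would affect hourly counts. It is a simplified model, not Trill's API or implementation: the function name `hourly_counts`, the `"adjust"`/`"drop"` strings, and the rule "re-stamp a late event to the largest timestamp seen so far" are all assumptions made for illustration.

```python
from collections import Counter

HOUR = 3600  # seconds per hourly bucket


def hourly_counts(events, policy):
    """Count events per hour under a simplified disorder policy.

    events: timestamps in seconds, listed in arrival order.
    policy: "adjust" re-stamps a late event to the high-water mark;
            "drop" discards it. (Hypothetical names, not Trill's API.)
    """
    counts = Counter()
    high_water = None  # largest timestamp seen so far
    for ts in events:
        if high_water is not None and ts < high_water:
            if policy == "drop":
                continue          # late event is lost entirely
            ts = high_water       # late event is counted in the current hour
        else:
            high_water = ts
        counts[ts // HOUR] += 1
    return dict(counts)


# Two events in hour 0, one in hour 1, then a late event that belongs to hour 0.
arrivals = [100, 200, 3700, 300]

adjusted = hourly_counts(arrivals, "adjust")
dropped = hourly_counts(arrivals, "drop")
in_order = hourly_counts(sorted(arrivals), "adjust")

print("adjust:  ", adjusted)
print("drop:    ", dropped)
print("in order:", in_order)
```

In this model the late event is either counted in the wrong hour (adjust) or not counted at all (drop). Neither matches the counts obtained when the same events arrive in order, which is the correction behavior the original question asks for.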