Open ddebrunner opened 8 years ago
Doesn't feel like it's worth the addition... though maybe that's just because I haven't had to live with it as it is :-)
I imagine some uses would need to know that nothing was captured during the last N sec window and react as appropriate for the app (e.g., log, publish, generate a special tuple).
Seems clear/natural to expect the batcher
is called every N sec with whatever tuples arrived during that time, hence the list may be empty.
Seems simple enough for uses that don't care about that case, if it doesn't fallout naturally from their processing, to just make the check. e.g.,
TStream s = s.last(10, SECONDS).batch(
tuples -> { if (tuples.isEmpty()) return null;
... process the batch
};
In a common case where "process the batch" involves a math3 analytic (e.g., SUM), maybe we define those analytics to return null if the input list is empty thereby eliminating having to write an explicit check?
I somewhat agree with @dlaboss , but while it's simple to have the isEmpty() it doesn't help reduce cpu cost of potentially many schedules of a task that ends up doing nothing.
Maybe it's a YAGNI, and maybe it's more than just a boolean, but potentially a mode, e.g.:
Maybe also it's a property of the window, so it applies to any window operation. If it's empty then do nothing for any window operation. This might also be easier to add in the future without breaking compatibility or having to have multiple overloads of batch()
.
I agree that implementing this as an empty partition policy is easy to add without breaking compatibility, but I think of handling empty windows as an attribute of the aggregate function rather than of the window.
Assume I want to support multiple aggregate functions triggered by the same window. To do that without extending Quarks, I have to create identical windows for each aggregate function. If this becomes unacceptable (e.g. high memory consumption if windows are large), and the alternative is to update the current window implementation to support a list of aggregates per window, then a per-window policy would then apply to all the aggregates.
@vdogaru ?> To do that without extending Quarks, I have to create identical windows for each aggregate function.
Are you saying that is the case now, or if the empty partition policy was per window?
I am saying that adding an empty partition policy per window would force all the aggregates triggered by that window to share the window's policy.
Right, but that doesn't force each aggregation to have its own window. Only ones with different policies.
that doesn't force each aggregation to have its own window
Currently, each time TWindow.aggregate is called, it creates a new Aggregate<T,U,K> oplet which contains its own Window. So in that sense, each aggregation does have its own window.
@wmarshall484 A mere implementation detail, from the api they use a single window.
The implementation could be changed to have a single oplet for a window.
Another situation that may not matter for Quarks being on the edge, but when there are thousands of partitions processing empty partitions can add up.
The current batch implementation will process empty windows when the window is the last N seconds.
I wonder if this is the desired or expected behaviour. I see batching as breaking a stream into batches of tuples, thus processing an empty one doesn't seem to make much sense.
Batch could have an option to allow processing of empty windows, defaulting to false, or just not support it.
Throughts?