Aggregations currently produce two records whenever an aggregated value changes: a negative for the old value, and a positive for the new one. This causes unfortunate extra computation downstream in the data-flow. For example, a following join, while it must still only do one lookup for the join key for those two rows, it has to allocate and produce two rows instead of one, which is costly.
We should probably introduce a new Record variant along the lines of
The values in diff, when they are Some, indicate that the given value should be added to the current value (in row; the old negative) to produce the (new) positive. This would allow the aggregation to produce a single Delta row for each output, and the join to produce a single Delta row for each Delta row, etc. Since diff is a Vec, the values could even pass through multiple operators that update diffs without turning into multiple rows (does this happen in reality though? maybe (usize, DataType) is sufficient?).
Aggregations currently produce two records whenever an aggregated value changes: a negative for the old value, and a positive for the new one. This causes unfortunate extra computation downstream in the data-flow. For example, a following
join
, while it must still only do one lookup for the join key for those two rows, it has to allocate and produce two rows instead of one, which is costly.We should probably introduce a new
Record
variant along the lines ofThe values in
diff
, when they areSome
, indicate that the given value should be added to the current value (inrow
; the old negative) to produce the (new) positive. This would allow the aggregation to produce a singleDelta
row for each output, and the join to produce a singleDelta
row for eachDelta
row, etc. Sincediff
is aVec
, the values could even pass through multiple operators that update diffs without turning into multiple rows (does this happen in reality though? maybe(usize, DataType)
is sufficient?).