mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0
4.98k stars 242 forks source link

Send deltas instead of negative + positives when single column changes #102

Open jonhoo opened 5 years ago

jonhoo commented 5 years ago

Aggregations currently produce two records whenever an aggregated value changes: a negative for the old value, and a positive for the new one. This causes unfortunate extra computation downstream in the data-flow. For example, a following join, while it must still only do one lookup for the join key for those two rows, it has to allocate and produce two rows instead of one, which is costly.

We should probably introduce a new Record variant along the lines of

struct Delta {
    row: Vec<DataType>,
    diff: Vec<Option<DataType>>,
}

The values in diff, when they are Some, indicate that the given value should be added to the current value (in row; the old negative) to produce the (new) positive. This would allow the aggregation to produce a single Delta row for each output, and the join to produce a single Delta row for each Delta row, etc. Since diff is a Vec, the values could even pass through multiple operators that update diffs without turning into multiple rows (does this happen in reality though? maybe (usize, DataType) is sufficient?).