st1page opened this issue 1 year ago
It seems we can construct a workload with no repeated keys to infer the maximum potential improvement in the micro-benchmark, and another workload where every row has the same key to infer the maximum potential regression 🤔
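For illustration, here is a minimal sketch of the two extreme key distributions (plain Rust with made-up helper names, not RisingWave's chunk or key types):

```rust
use std::collections::HashSet;

/// Every row gets a distinct key: the current mask-per-group path degenerates
/// into one `apply_batch` call per row, so this should bound the maximum
/// potential improvement of batching.
fn all_distinct_keys(chunk_size: usize) -> Vec<i64> {
    (0..chunk_size as i64).collect()
}

/// Every row shares one key: the current path already makes a single
/// `apply_batch` call per chunk, so any extra buffering overhead shows up
/// here as the maximum potential regression.
fn all_same_key(chunk_size: usize) -> Vec<i64> {
    vec![42; chunk_size]
}

fn main() {
    let improvement_case = all_distinct_keys(1024);
    let regression_case = all_same_key(1024);
    let improvement_groups: HashSet<i64> = improvement_case.iter().copied().collect();
    let regression_groups: HashSet<i64> = regression_case.iter().copied().collect();
    println!(
        "distinct groups: {} vs {}",
        improvement_groups.len(),
        regression_groups.len()
    );
}
```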
> It seems we can construct a workload with no repeated keys to infer the maximum potential improvement in the micro-benchmark
I think that could be very harsh for RisingWave :hot_face: if the number of distinct groups is very high, the miss rates of our LRU cache and other caches could be high too.
Before this, we must do https://github.com/risingwavelabs/risingwave/issues/9301 and make sure that rebuilding the chunk does not introduce too much overhead.
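As a rough illustration of what "rebuilding the chunk" involves (a sketch with illustrative types, not RisingWave's `StreamChunk` API), compaction is one pass over the rows plus an allocation for the new dense buffer:

```rust
/// Copy only the visible rows into a new dense buffer. Illustrative types,
/// not RisingWave's chunk compaction; the point is that the cost is a full
/// scan of the chunk plus the allocation of the rebuilt buffer.
fn compact_rows<T: Clone>(rows: &[T], visibility: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(visibility.iter())
        .filter_map(|(row, vis)| if *vis { Some(row.clone()) } else { None })
        .collect()
}

fn main() {
    let rows = vec![10, 20, 30, 40];
    let visibility = vec![true, false, true, false];
    assert_eq!(compact_rows(&rows, &visibility), vec![10, 30]);
}
```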
Currently, `StreamHashAgg` constructs a mask on the visibility bitmap to process each group's data:

https://github.com/risingwavelabs/risingwave/blob/8052dec77bc71a0a5895553dcb3022a494d915ad/src/stream/src/executor/hash_agg.rs#L423

This hurts performance, especially when a chunk contains a large number of distinct groups: it regresses our vectorized processing even though we implement the `apply_batch` interface on the aggregators. For example, if the chunk size is 1024 and there are 128 distinct group keys in it, we still need to call the virtual function (`Box<dyn>`) 128 times. Also, some previous analyses showed that the aggregators work better with compacted chunks: https://github.com/risingwavelabs/risingwave/issues/9148, https://github.com/risingwavelabs/risingwave/issues/9150.

We can delay `apply_batch` until `get_output()` is called. In other words, we can batch the same group's data across chunks within the same epoch.

Pros:
- `apply_batch` receives input with higher selectivity, and the number of `apply_batch` calls is reduced.

Cons:
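A minimal sketch of the batching idea under these assumptions; the names (`Aggregator`, `SumAgg`, `BatchedHashAgg`, `apply_chunk`, `flush_and_get_outputs`) are made up for illustration and are not the actual `StreamHashAgg` or aggregator APIs:

```rust
use std::collections::HashMap;

/// Stand-in for a dynamically dispatched aggregator; illustrative only.
trait Aggregator {
    /// Apply a whole batch of input values for one group at once.
    fn apply_batch(&mut self, values: &[i64]);
    fn get_output(&self) -> i64;
}

#[derive(Default)]
struct SumAgg {
    sum: i64,
}

impl Aggregator for SumAgg {
    fn apply_batch(&mut self, values: &[i64]) {
        self.sum += values.iter().sum::<i64>();
    }
    fn get_output(&self) -> i64 {
        self.sum
    }
}

/// Buffers each group's input across the chunks of one epoch and defers the
/// (virtual) `apply_batch` call until output time, so it is issued once per
/// group per epoch instead of once per group per chunk.
#[derive(Default)]
struct BatchedHashAgg {
    buffers: HashMap<i64, Vec<i64>>,
    states: HashMap<i64, Box<dyn Aggregator>>,
}

impl BatchedHashAgg {
    /// Called for every input chunk: rows are only buffered per group key;
    /// no aggregator (virtual) call happens here.
    fn apply_chunk(&mut self, keys: &[i64], values: &[i64]) {
        for (key, value) in keys.iter().zip(values) {
            self.buffers.entry(*key).or_default().push(*value);
        }
    }

    /// Called once per epoch (e.g. on the barrier): flush every group's
    /// buffered rows with a single `apply_batch` call, then emit the outputs.
    fn flush_and_get_outputs(&mut self) -> Vec<(i64, i64)> {
        for (key, buffered) in self.buffers.drain() {
            let state = self
                .states
                .entry(key)
                .or_insert_with(|| Box::new(SumAgg::default()) as Box<dyn Aggregator>);
            state.apply_batch(&buffered);
        }
        self.states
            .iter()
            .map(|(key, state)| (*key, state.get_output()))
            .collect()
    }
}

fn main() {
    let mut agg = BatchedHashAgg::default();
    // Two chunks arriving in the same epoch; group 1 appears in both.
    agg.apply_chunk(&[1, 2, 1], &[10, 20, 30]);
    agg.apply_chunk(&[1, 3], &[5, 7]);
    let mut outputs = agg.flush_and_get_outputs();
    outputs.sort();
    // Group 1 saw values 10, 30 and 5, but only one `apply_batch` call.
    assert_eq!(outputs, vec![(1, 45), (2, 20), (3, 7)]);
}
```

The buffered rows of each group effectively form a fully compacted, high-selectivity batch, and the number of dynamic-dispatch calls drops from one per group per chunk to one per group per epoch.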