risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

test estimate size #13850

Open st1page opened 9 months ago

st1page commented 9 months ago

Currently, the estimated size is used for metrics, but soon some logic, such as backpressure and the agg's flush, will depend on the estimated size of our structures. This issue proposes a way to test whether the estimated size is accurate. The idea is simple: create millions of objects, record the memory usage before and after creating them, and then compare the change in memory with the estimated size.

  1. For primitive types, the estimate can be exact, and the test ensures the format is not broken silently.
  2. Some structures use more complex logic to maintain their memory size, and the test can ensure those implementations are correct. I think we have already found some issues here, such as the agg's dirty state: https://github.com/risingwavelabs/risingwave/blob/a8aa905ef26ef8c00f55a7c5b8ce76e6a0f7b72d/src/expr/core/src/window_function/state/aggregate.rs#L39 https://github.com/risingwavelabs/risingwave/issues/13060#issuecomment-1842443584
  3. For HashMap or BTreeMap, the test gives us a way to quantify and measure their amplification.
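A minimal sketch of the measurement idea, assuming a counting wrapper around the system allocator is acceptable in the test binary. Every name here (`CountingAlloc`, `LIVE_BYTES`, `HeapSizeEstimate`, `allocated_by`, `relative_error`, `check_vec_of_strings`, `hashmap_amplification`) is hypothetical, and `HeapSizeEstimate` is only a local stand-in for the project's own size-estimation trait, not its real API.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashMap;
use std::mem::size_of;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper that tracks requested bytes still live.
struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical stand-in for the size-estimation trait under test.
trait HeapSizeEstimate {
    fn estimated_heap_size(&self) -> usize;
}

impl HeapSizeEstimate for String {
    fn estimated_heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSizeEstimate> HeapSizeEstimate for Vec<T> {
    fn estimated_heap_size(&self) -> usize {
        // Own buffer plus whatever each element owns on the heap.
        self.capacity() * size_of::<T>()
            + self.iter().map(|x| x.estimated_heap_size()).sum::<usize>()
    }
}

/// Runs `build` and returns the value with the growth in live heap bytes.
/// Allocations made by other threads in the meantime are counted too.
fn allocated_by<T>(build: impl FnOnce() -> T) -> (T, usize) {
    let before = LIVE_BYTES.load(Ordering::Relaxed);
    let value = build();
    let after = LIVE_BYTES.load(Ordering::Relaxed);
    (value, after.saturating_sub(before))
}

/// Relative gap between measured and estimated bytes.
fn relative_error(actual: usize, estimated: usize) -> f64 {
    (actual as f64 - estimated as f64).abs() / actual.max(1) as f64
}

/// Returns (measured bytes, estimated bytes) for `n` short strings.
fn check_vec_of_strings(n: usize) -> (usize, usize) {
    let (v, actual) =
        allocated_by(|| (0..n).map(|_| "x".repeat(16)).collect::<Vec<String>>());
    (actual, v.estimated_heap_size())
}

/// Measured bytes divided by raw payload bytes for a HashMap<u64, u64>.
/// `n` must be non-zero.
fn hashmap_amplification(n: u64) -> f64 {
    let (map, actual) =
        allocated_by(|| (0..n).map(|i| (i, i)).collect::<HashMap<u64, u64>>());
    actual as f64 / (map.len() * size_of::<(u64, u64)>()) as f64
}
```

A test could assert that `relative_error` stays under a chosen threshold for each structure, and could log `hashmap_amplification` to track how far the map's real footprint exceeds its payload. The counter records the sizes requested from the allocator, not the resident memory of the process, so allocator-internal overhead and fragmentation are outside what this sketch measures.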

Furthermore, fuzz testing may be a good fit for this situation.

fuyufjh commented 8 months ago

Please discuss details with @st1page

lmatz commented 6 months ago

Is this a prerequisite for the estimated-size-based memory eviction mechanism?

yuhao-su commented 6 months ago

> Is this a prerequisite for the estimated-size-based memory eviction mechanism?

Yes

github-actions[bot] commented 3 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.