risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.03k stars 578 forks

Support different random distributions in `datagen` #5415

Open lmatz opened 2 years ago

lmatz commented 2 years ago

Is your feature request related to a problem? Please describe.

We want `datagen` to be able to simulate data skew.

Describe the solution you'd like

Use an off-the-shelf library for the random distributions where possible.

Support a power-law distribution, and also a single-hot-key scenario.
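
To make the two requested shapes concrete, here is a minimal sketch in plain Python. It is not the `datagen` implementation; the function names, parameters, and defaults (`zipf_keys`, `single_hot_key`, `exponent`, `hot_fraction`) are hypothetical and only illustrate what a power-law (Zipf-like) key distribution and a single-hot-key distribution could look like.

```python
import random
from itertools import accumulate


def zipf_keys(num_keys, exponent, count, seed=0):
    """Draw `count` keys from 0..num_keys-1 where key k has
    probability proportional to 1 / (k + 1) ** exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    cum_weights = list(accumulate(weights))
    return rng.choices(range(num_keys), cum_weights=cum_weights, k=count)


def single_hot_key(num_keys, hot_fraction, count, hot_key=0, seed=0):
    """Return `hot_key` with probability `hot_fraction`; otherwise pick a
    key uniformly from 0..num_keys-1 (which may also land on `hot_key`)."""
    rng = random.Random(seed)
    return [
        hot_key if rng.random() < hot_fraction else rng.randrange(num_keys)
        for _ in range(count)
    ]


if __name__ == "__main__":
    from collections import Counter

    # Show the three most frequent keys under each distribution.
    print(Counter(zipf_keys(1000, 1.0, 100_000)).most_common(3))
    print(Counter(single_hot_key(1000, 0.9, 100_000)).most_common(3))
```

A fixed seed keeps runs reproducible, which is probably what a benchmark would want. A larger `exponent` concentrates more traffic on the lowest-numbered keys.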

Describe alternatives you've considered

We may need to implement some special cases manually.

Additional context

One motivation: if only a small portion of the state is accessed frequently, RisingWave's tiered storage should cost much less than a storage system that must keep all of its data on local disk. A skewed workload would let us demonstrate that advantage.

Another is to see how the system behaves when different actors receive different amounts of input.
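
As a rough illustration of that second point, the sketch below routes a skewed key stream to a fixed number of actors and reports how uneven the per-actor load becomes. The `key % num_actors` routing is an assumption made for simplicity and is not how RisingWave actually distributes data; all names here (`skewed_keys`, `actor_loads`, `imbalance`) are hypothetical.

```python
import random


def skewed_keys(count, num_keys=1000, hot_key=7, hot_fraction=0.8, seed=0):
    """Generate keys where `hot_key` appears with probability `hot_fraction`
    and the remaining draws are uniform over 0..num_keys-1."""
    rng = random.Random(seed)
    return [
        hot_key if rng.random() < hot_fraction else rng.randrange(num_keys)
        for _ in range(count)
    ]


def actor_loads(keys, num_actors):
    """Count rows per actor under a simple modulo routing (an assumption)."""
    loads = [0] * num_actors
    for key in keys:
        loads[key % num_actors] += 1
    return loads


def imbalance(loads):
    """Ratio of the busiest actor's load to the mean load (1.0 is balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    loads = actor_loads(skewed_keys(100_000), num_actors=4)
    print(loads, round(imbalance(loads), 2))
```

With a uniform key stream the ratio stays close to 1.0, so comparing the two gives a quick sense of how much extra work the hot actor has to absorb.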

github-actions[bot] commented 1 year ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.