timeplus-io / proton

A stream processing engine and database, and a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse
https://timeplus.com
Apache License 2.0

random_storages_rate_limitor is too off #71

Closed by jovezhong 1 year ago

jovezhong commented 1 year ago

Describe what's wrong

I set this value to 1 and expected about 10 events per second, but I get far more:

CREATE RANDOM STREAM rand_stream(i int default rand()%5) SETTINGS random_storages_rate_limitor=1;
select count(), window_start from tumble(rand_stream, 1s) group by window_start;
┌─count()─┬────────────window_start─┐
│      32 │ 2023-09-11 16:24:35.000 │
└─────────┴─────────────────────────┘
┌─count()─┬────────────window_start─┐
│      80 │ 2023-09-11 16:24:36.000 │
└─────────┴─────────────────────────┘
┌─count()─┬────────────window_start─┐
│      80 │ 2023-09-11 16:24:37.000 │
└─────────┴─────────────────────────┘
┌─count()─┬────────────window_start─┐
│      80 │ 2023-09-11 16:24:38.000 │
└─────────┴─────────────────────────┘

I mentioned in the docs that the number won't be exact, but it should fall in the 5-15 range with 10 as the baseline. 80 is way off.


chenziliang commented 1 year ago

I think we just need an eps (events per second) setting. We can probably just rename random_storages_rate_limitor to eps:

CREATE RANDOM STREAM rand_stream(i int default rand()%5) SETTINGS eps=1000;

If eps is very high, we may need to consider multi-shard / multi-threaded generation, but this is low priority: in most cases, the random stream itself is super fast.
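
To make the eps idea concrete, here is a minimal Python sketch of one way a generator could pace itself so that the per-second total matches eps exactly, even when eps is smaller than the number of ticks per second. This is not Proton's implementation; the function name `batch_sizes` and the `interval_ms` tick length are hypothetical and chosen only for illustration.

```python
def batch_sizes(eps: int, interval_ms: int, ticks: int) -> list[int]:
    """Return how many events to emit on each tick so the rate averages `eps` per second.

    Hypothetical helper: a fractional-carry pacer. It keeps an integer
    accumulator of "events owed" scaled by 1000, so no floating-point
    drift builds up over time.
    """
    if eps < 0 or interval_ms <= 0 or ticks < 0:
        raise ValueError("eps and ticks must be >= 0 and interval_ms must be > 0")

    sizes = []
    owed = 0  # events owed so far, in thousandths of an event
    for _ in range(ticks):
        owed += eps * interval_ms       # this tick earns eps * interval_ms / 1000 events
        n, owed = divmod(owed, 1000)    # emit the whole events, carry the remainder
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    # With 100 ms ticks, eps=1 emits a single event in the last tick of each second.
    print(batch_sizes(1, 100, 10))
    # eps=10 emits one event per tick.
    print(batch_sizes(10, 100, 10))
```

Carrying the remainder forward, rather than rounding each tick independently, is what keeps a low setting such as eps=1 from being rounded up on every tick and overshooting the target.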

jovezhong commented 1 year ago

Yes, eps sounds great to me. There is a ticket for it: #72