risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://www.risingwave.com/slack
Apache License 2.0

perf: performance issue when there is only barrier passing the cluster #17646

Open st1page opened 1 month ago

st1page commented 1 month ago

In some real-world scenarios, we have encountered a situation where only barriers are passing through the cluster.

In this situation, barriers are the only thing flowing through the graph. Our assumption is that when there is no data between barriers, the barriers should flow through quickly. Otherwise, if a barrier takes more than one second to pass through the entire graph, barriers will keep accumulating and the cluster will always be in a state of backpressure. More info here: https://www.notion.so/risingwave-labs/CVTE-2024-07-10-barrier-only-ecc23aa5b9ee4664a97a17c97a25d709?pvs=4
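To make the accumulation argument above concrete, here is a rough sketch of a toy model, not RisingWave code. It assumes one barrier is injected every `interval_s` seconds and, as a simplification, that the graph finishes one barrier before starting the next, so each barrier costs `traversal_s` seconds. The function name and all parameters are hypothetical and exist only for illustration.

```python
def pending_barriers(interval_s: float, traversal_s: float, elapsed_s: float) -> int:
    """Count barriers injected but not yet collected after `elapsed_s` seconds.

    Toy model (assumption, not the real scheduler): barriers are injected at
    0, interval_s, 2 * interval_s, ... and are processed strictly one at a time.
    """
    injected = 0
    completed = 0
    prev_done = 0.0  # time at which the previous barrier finished traversing
    k = 0
    while k * interval_s <= elapsed_s:
        inject_at = k * interval_s
        # A barrier cannot start before it is injected or before the
        # previous barrier has finished.
        done_at = max(inject_at, prev_done) + traversal_s
        injected += 1
        if done_at <= elapsed_s:
            completed += 1
        prev_done = done_at
        k += 1
    return injected - completed


if __name__ == "__main__":
    # Traversal faster than the interval: the backlog stays bounded.
    print(pending_barriers(1.0, 0.5, 60.0))   # 1
    # Traversal slower than the interval: the backlog keeps growing.
    print(pending_barriers(1.0, 1.5, 60.0))   # 21
    print(pending_barriers(1.0, 1.5, 120.0))  # 41
```

Under these assumptions the backlog grows without bound as soon as the traversal time exceeds the injection interval, which matches the permanent backpressure described above. A real cluster pipelines barriers across actors, so this only illustrates the trend and is not a prediction of actual numbers.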

Below are some suspected causes and potential optimizations we are currently considering for this situation:

Even with these conjectures, I believe we first need a way to reliably reproduce this situation before attempting any improvements. That way, we can verify the effectiveness of each optimization.

BugenZhao commented 1 month ago

Could this be caused by the problem described in https://github.com/risingwavelabs/risingwave/pull/17612?

fuyufjh commented 2 weeks ago

Perhaps fixed by https://github.com/risingwavelabs/risingwave/pull/17612.

fuyufjh commented 1 week ago

This recurred today in another case.