risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

v1.9.0-rc-1 compute node OOM with ch-benchmark 5 MVs through PG/MySQL CDC #16741

Status: Open. Opened by cyliu0 4 months ago.

cyliu0 commented 4 months ago

Describe the bug

https://buildkite.com/risingwave-test/ch-benchmark-pg-cdc-shared-source/builds/55#018f749e-d8ac-4e2a-a905-25038438f9f0

https://buildkite.com/risingwave-test/ch-benchmark-mysql-cdc-shared-source/builds/66#018f74a0-1ec8-4a62-8b65-61fda243cad7

https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?from=1715648701806&orgId=1&to=1715655520239&var-component=All&var-datasource=P2453400D1763B4D9&var-instance=benchmark-risingwave&var-namespace=tpc-20240514-010410&var-pod=All&var-table=All

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

v1.9.0-rc-1

Additional context

No response

lmatz commented 4 months ago

[Screenshot: SCR-20240514-lir]

The amplification is high.

[Screenshot: SCR-20240514-ln1]

The memtable spilling does not happen too frequently.

Is memtable spilling expected to solve this kind of high-amplification case?

cc: @cyliu0 @fuyufjh I think this level of amplification is not common, so I removed the "block" label.

But is this expected to be solved right now? I am uncertain, but we need to reach a conclusion and fix it.

lmatz commented 4 months ago

But I think we need to sort this out, as one of our users did encounter bad performance when the amplification was high.

High-amplification cases do exist.