risingwavelabs / risingwave


Set `pause_on_next_bootstrap` on an offline cluster #17904

Open kwannoel opened 3 months ago

kwannoel commented 3 months ago

Support setting `pause_on_next_bootstrap` in `risingwave.toml`. In certain cases the cluster is crash looping. In standalone mode all nodes are embedded in a single process, so faults are not isolated between them and the cluster can't be easily recovered.

BugenZhao commented 3 months ago

> In certain cases the cluster is crash looping

I suppose you are referring to process panicking. Regular soft failures in streaming jobs should not affect the capability of setting system parameters with SQL interfaces.

kwannoel commented 3 months ago

> > In certain cases the cluster is crash looping
>
> I suppose you are referring to process panicking. Regular soft failures in streaming jobs should not affect the capability of setting system parameters with SQL interfaces.

Yeah, I mean a panic. A regular soft failure will just trigger recovery, and meta + frontend should still be responsive.
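
For reference, a minimal sketch of the SQL path mentioned above, which only works while meta and frontend are still responsive (so it does not cover the panic case this issue is about). The statement uses RisingWave's `ALTER SYSTEM SET` syntax. The helper names `pause_on_next_bootstrap_statement` and `apply` are hypothetical, and the connection string is an assumption to be adjusted for the actual deployment.

```python
def pause_on_next_bootstrap_statement(enabled: bool = True) -> str:
    """Build the ALTER SYSTEM statement for the `pause_on_next_bootstrap` parameter."""
    value = "true" if enabled else "false"
    return f"ALTER SYSTEM SET pause_on_next_bootstrap TO {value};"


def apply(statement: str, dsn: str) -> None:
    """Send the statement to the frontend over the PostgreSQL wire protocol.

    Assumes the third-party psycopg2 package is installed and that the
    frontend is reachable; neither holds if the process is panicking.
    """
    import psycopg2  # imported lazily so the builder above has no dependencies

    conn = psycopg2.connect(dsn)
    try:
        conn.autocommit = True  # run the statement outside an explicit transaction
        with conn.cursor() as cur:
            cur.execute(statement)
    finally:
        conn.close()
```

A call might look like `apply(pause_on_next_bootstrap_statement(), "host=localhost port=4566 user=root dbname=dev")`, where the host, port, user, and database are assumed values for a local setup.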