risingwavelabs / risingwave


Single node mode resume from suspend fails #16911

Open kwannoel opened 4 months ago

kwannoel commented 4 months ago

Describe the bug

cargo run --bin risingwave

Then press Ctrl-Z to suspend the process after the node has finished starting up.

Wait ~1 min, then resume it with fg. The node errors out and fails to resume.

Error message/log

TODO

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

M1 OSX, on latest main.

The version of RisingWave

No response

Additional context

No response

BugenZhao commented 2 months ago

Is it due to the fact that we use absolute time when checking for staleness with heartbeats?

https://github.com/risingwavelabs/risingwave/blob/2eeec6d43561d20aecdc4f07524353fd733f6da3/src/meta/src/manager/cluster.rs#L355-L368
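To make the hypothesis above concrete, here is a minimal sketch of a deadline-based staleness check. It is not the code at the linked lines: `Worker`, `heartbeat`, `is_expired`, `expired_after_suspend`, and the 30-second TTL used below are all hypothetical names and values chosen for illustration. The sketch only assumes that each heartbeat stores an absolute expiry time and that the clock keeps advancing while the process is stopped, so the first staleness check after `fg` can run before any new heartbeat arrives.

```rust
use std::time::Duration;

/// Hypothetical worker record. `expire_at_secs` is an absolute deadline,
/// expressed in seconds on whatever clock the checker reads.
#[derive(Debug, Clone, Copy)]
struct Worker {
    expire_at_secs: u64,
}

impl Worker {
    /// On heartbeat, push the deadline out to `now + ttl`.
    fn heartbeat(&mut self, now_secs: u64, ttl: Duration) {
        self.expire_at_secs = now_secs + ttl.as_secs();
    }

    /// Staleness check: the worker is expired once the deadline is in the past.
    fn is_expired(&self, now_secs: u64) -> bool {
        self.expire_at_secs < now_secs
    }
}

/// Simulates the scenario from the report with injected clock readings:
/// the worker heartbeats at t = 0, the whole process is then stopped for
/// `suspended_secs`, and the staleness check is the first thing to run
/// after the process resumes.
fn expired_after_suspend(ttl: Duration, suspended_secs: u64) -> bool {
    let mut worker = Worker { expire_at_secs: 0 };
    worker.heartbeat(0, ttl);
    // No heartbeat could be sent while stopped, but the clock moved on.
    let now_after_resume = suspended_secs;
    worker.is_expired(now_after_resume)
}
```

With the assumed 30-second TTL, `expired_after_suspend(Duration::from_secs(30), 60)` returns `true`, while a 5-second pause returns `false`. Whether the real check behaves this way depends on the actual TTL and on the order in which the heartbeat and the expiry check run after resume.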

kwannoel commented 2 months ago

Possible, I guess. We can add a log to indicate the time at which the worker is expiring.
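One possible shape for such a log line, as a sketch only: `expiry_log_line` and its parameters are hypothetical, the times are plain seconds on the same clock the checker uses, and the real code would presumably emit this through the project's logging facility rather than return a string.

```rust
/// Builds a log line reporting when a worker's heartbeat deadline passed.
/// All names here are hypothetical and used for illustration only.
fn expiry_log_line(worker_id: u32, expire_at_secs: u64, now_secs: u64) -> String {
    // How long ago the deadline passed; saturates at 0 if it has not passed yet.
    let overdue = now_secs.saturating_sub(expire_at_secs);
    format!(
        "worker {} heartbeat expired: expire_at={}s, now={}s, overdue_by={}s",
        worker_id, expire_at_secs, now_secs, overdue
    )
}
```

Logging both the stored deadline and the current reading makes a suspend-sized gap visible at a glance, which would help confirm or rule out the heartbeat hypothesis.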