neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0
14.23k stars 406 forks source link

Timeline: death of flush loop should propagate to waiters #6424

Open koivunej opened 7 months ago

koivunej commented 7 months ago

We should of course use move semantics on the channel used for flushing requests/completions so that we would be propagated channel closed on panics -- this sort of confirms my earlier recollections that flushing task does not behave well with panics, as when idea of lock poisoning via asserts was thrown around.

Originally posted by @koivunej in https://github.com/neondatabase/neon/issues/6373#issuecomment-1902125377

Move semantics as in have the flush loop move something to the flush task (hopefully the channel sender), and when this is dropped (with the task panicking) the flushes would return an error, and this would immediatedly give us less confusing test failures. However the timeline after this remains in a strange state. At minimum we must support detach or pageserver shutdown from this state (without flush loop).

If we do not transition the timeline to broken state, we might continue to ingest wal, until we run out of memory due to having RAM full of frozen inmemory layers. Which will probably take a long time, and cause disk pressure before that, so this is a bit related to #5897.

Closed #3621 in relation to creating this because we no longer have std::sync::{Mutex,RWLock} usage near layers, and flush loop "not responding" explains the shutdown problem as well (as the tenant shutdown would hang while waiting for flushing).

In subject:

koivunej commented 7 months ago

From our discussion with Christian (no ping needed): I think the current implementation is for "restartable" timeline, however we have been moving towards /ignore + /load way.

koivunej commented 7 months ago

More context: why is there a flush loop? We need to have at most 1 concurrent flusher due to cloning the frozen layer from layermap, then writing it do disk, then re-locking the layermap and asserting it's the same one. We also need the layer to be available for reads while it's being written out. With the background task we can implement it without an async lock being held over the operation and no cancellation worries.