The pageserver currently does not limit the user's write flow. Note that the pageserver runs both foreground jobs (e.g., safekeeper WAL ingest and page reads) and background jobs (compaction, GC). If we don't apply backpressure, background jobs get no resources to run, which in turn slows down the foreground jobs, creating a vicious cycle. The long-term goal is to ensure that the pageserver only accepts as much work as it can actually handle.
A quick idea is to borrow RocksDB's backpressure mechanism, which stalls writes when the number of L0 SSTs exceeds a threshold.
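A minimal sketch of that idea in Rust, mirroring RocksDB's slowdown/stop trigger pair. The threshold values, the `ingest_delay` helper, and the assumption that the caller already knows the current L0 layer count are all illustrative, not the pageserver's actual API:

```rust
use std::time::Duration;

/// Illustrative thresholds (not real pageserver config).
const L0_SLOWDOWN_THRESHOLD: usize = 20; // start delaying ingest
const L0_STALL_THRESHOLD: usize = 40; // stop ingest until compaction catches up

/// Decide how long to delay an incoming WAL ingest batch based on L0 pressure.
/// Returns `None` when ingest should be blocked until the L0 count drops.
fn ingest_delay(l0_count: usize) -> Option<Duration> {
    if l0_count >= L0_STALL_THRESHOLD {
        // Full stall: caller waits for a compaction-completed notification.
        None
    } else if l0_count >= L0_SLOWDOWN_THRESHOLD {
        // Scale the delay linearly between the slowdown and stall thresholds.
        let over = (l0_count - L0_SLOWDOWN_THRESHOLD) as u64;
        let range = (L0_STALL_THRESHOLD - L0_SLOWDOWN_THRESHOLD) as u64;
        Some(Duration::from_millis(over * 100 / range))
    } else {
        // Below the slowdown threshold: no backpressure.
        Some(Duration::ZERO)
    }
}
```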
For the L0 stacking problem: this may depend on other compaction design decisions. One option is to trigger compaction on LSN advance rather than on time -- that way, faster-writing tenants get compacted more often (see the sketch below).
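A rough sketch of LSN-driven triggering, assuming a hypothetical per-timeline record of the last-compacted LSN; the struct and field names are made up for illustration:

```rust
/// Hypothetical per-timeline state for LSN-driven compaction scheduling.
struct CompactionState {
    /// LSN at which the last compaction ran.
    last_compacted_lsn: u64,
    /// Trigger compaction after this many bytes of WAL have been ingested
    /// since the last compaction, instead of on a fixed timer.
    compaction_lsn_interval: u64,
}

impl CompactionState {
    /// Faster-writing tenants advance their LSN more quickly and therefore
    /// hit this condition sooner, so they get compacted more frequently.
    fn should_compact(&self, current_lsn: u64) -> bool {
        current_lsn.saturating_sub(self.last_compacted_lsn) >= self.compaction_lsn_interval
    }
}
```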
Our existing mitigation for L0 compaction (compacting only 10 L0 layers at a time) keeps us safe for now.
Follow-up on https://neondb.slack.com/archives/C03F5SM1N02/p1721058880447979