neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Epic: pageserver backpressure #8390

Open skyzh opened 3 months ago

skyzh commented 3 months ago

Follow-up to https://neondb.slack.com/archives/C03F5SM1N02/p1721058880447979

The pageserver currently does not limit the user's write rate. Note that the pageserver runs both foreground jobs (namely safekeeper writes and page reads) and background jobs (compaction, GC). Without backpressure, background jobs are starved of resources, which in turn slows down the foreground jobs, creating a vicious cycle. The long-term goal is to ensure that the pageserver only takes in as much as it can actually handle.

A quick idea is to borrow RocksDB's backpressure mechanism, which stalls writes when the number of L0 SSTs exceeds a threshold.
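
To make the idea concrete, here is a minimal sketch of an L0-count-based write stall, loosely modeled on RocksDB's `level0_slowdown_writes_trigger` and `level0_stop_writes_trigger` options. It is not pageserver code: the `L0Backpressure` and `WriteDecision` names and the threshold values used with them are hypothetical, chosen only for illustration.

```rust
/// What the ingest path should do before accepting more writes.
/// (Hypothetical type, not part of the pageserver.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteDecision {
    /// L0 backlog is small: accept writes at full speed.
    Proceed,
    /// L0 backlog is growing: delay writes so compaction can catch up.
    Slowdown,
    /// L0 backlog is too large: block writes until compaction reduces it.
    Stop,
}

/// Thresholds on the number of L0 files/layers. (Hypothetical type.)
#[derive(Debug, Clone, Copy)]
pub struct L0Backpressure {
    slowdown_trigger: usize,
    stop_trigger: usize,
}

impl L0Backpressure {
    /// Panics if the slowdown trigger is above the stop trigger,
    /// since the slowdown range would then be unreachable.
    pub fn new(slowdown_trigger: usize, stop_trigger: usize) -> Self {
        assert!(
            slowdown_trigger <= stop_trigger,
            "slowdown trigger must not exceed stop trigger"
        );
        Self { slowdown_trigger, stop_trigger }
    }

    /// Map the current L0 count to a decision for the writer.
    pub fn decide(&self, l0_count: usize) -> WriteDecision {
        if l0_count >= self.stop_trigger {
            WriteDecision::Stop
        } else if l0_count >= self.slowdown_trigger {
            WriteDecision::Slowdown
        } else {
            WriteDecision::Proceed
        }
    }
}
```

A writer would call `decide` with the current L0 count before each batch, sleeping briefly on `Slowdown` and waiting for compaction progress on `Stop`. Using two thresholds rather than one lets writes degrade gradually instead of halting abruptly.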

jcsp commented 3 months ago

Let's look over our existing backpressure-related issues and make a plan

jcsp commented 2 months ago

Plan:

Our existing mitigation for L0 compaction (compacting at most 10 L0 layers at once) keeps us safe.