Open problame opened 6 months ago
spent 2h trying to assess the data we have
=> product
snapshot as pdf for posterity
Had an extensive session with @Bodobolero today, going over the results.
Update in this analysis dashboard panel
Summary:
Decision: we're going to try runs with fewer projects involved until we find a point where pageserver isn't overloaded. "Is not overloaded" is defined as:
- the compaction iterations do not stall, i.e., they complete on time, no "took too long" log messages
- latencies look more like in prod (<1ms), not like the 130ms we get right now
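To make the "not overloaded" criteria above concrete, here is a minimal sketch of how a run could be classified and how the project count could be stepped down. Everything named here is hypothetical and not part of the benchmark tooling: the `STALL_MARKER` substring, the 1 ms ceiling applied to the median latency, and the `run_benchmark` callback are all assumptions made for illustration.

```python
from statistics import median

# Assumption: substring that marks a stalled compaction iteration in the logs.
STALL_MARKER = "took too long"
# Assumption: latency ceiling in ms, loosely based on the "<1ms in prod" remark.
LATENCY_CEILING_MS = 1.0


def is_overloaded(log_lines, latencies_ms,
                  stall_marker=STALL_MARKER, ceiling_ms=LATENCY_CEILING_MS):
    """Return True if either overload criterion is violated."""
    # Criterion 1: any compaction iteration logged a stall.
    stalled = any(stall_marker in line for line in log_lines)
    # Criterion 2: median latency is not prod-like (at or above the ceiling).
    slow = bool(latencies_ms) and median(latencies_ms) >= ceiling_ms
    return stalled or slow


def largest_non_overloaded(project_counts, run_benchmark):
    """Try project counts from highest to lowest.

    run_benchmark(n) is a hypothetical callback returning
    (log_lines, latencies_ms) for a run with n projects.
    Returns the first count that is not overloaded, or None.
    """
    for n in sorted(project_counts, reverse=True):
        log_lines, latencies_ms = run_benchmark(n)
        if not is_overloaded(log_lines, latencies_ms):
            return n
    return None
```

Using the median is just one choice; a tail percentile would be stricter and may be closer to what the dashboard shows.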
Implemented an env-var configurable variant of the single tokio runtime patch for easier experimentation: https://github.com/neondatabase/neon/pull/7331
Note: Peter's benchmark overloaded compaction even without the single runtime change. Christian is working with Peter to right-size the benchmark to reflect realistic load and will create a new ticket for that.
For the record, that issue is https://github.com/neondatabase/cloud/issues/12335
Next actions when we pick this up:
https://github.com/neondatabase/neon/issues/6628#issuecomment-2025015263