Closed (problame closed this epic 2 weeks ago)
Status:
Initial draft PR is up for review -- could land this week.
Testing:
Status:
This week:
Status:
Apart from reconciling with the vectored get changes to ensure we aren't double-throttling, nothing happened last week.
This week:
Status update:
This week:
Status update:
I split off the write throttling aspect of this epic into a separate draft epic: https://github.com/neondatabase/neon/issues/7564
(We do not expect to work on write throttling this quarter)
Closing this epic; the development work finished long ago.
The last item, "enable gradually, starting with a high default, then going downwards to a value where we want to be", was and still is dependent on the sharding + sharded ingest rollout, so that users who hit the throttle have an option to acquire more IOPS through sharding as needed.
Motivation
See #5648
tl;dr: we currently serve reads (and writes; the write path is tracked in #7564) as fast as possible. This sets the wrong incentives and poses operational and economic problems.
DoD
Pageserver artificially caps the per-tenant throughput on the read path. I.e., to all upstream Neon components, this cap will appear to be the maximum read performance that you can get per tenant per pageserver.
The limits will be chosen such that a TBD (small single-digit) number of tenants can run at the limit. Discovery of the limit values is done through gradual rollout, conservative experimentation, and informed by benchmarks.
The upstream (compute) responds to the limit-induced backpressure efficiently, gracefully, and without risk of starvation.
There is enough observability to clearly disambiguate slowness induced by limiting from slowness caused by otherwise slow pageserver. This disambiguation must be on per-tenant (better: per-timeline) granularity.
The throttle is on by default and cannot be permanently overridden on a per-tenant basis. I.e., the implementation need not be suited for productization as a "performance tier" or "QoS" feature.
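A cap like the one described above is commonly implemented as a token-bucket rate limiter that reports a retry delay as backpressure. The following is a minimal, hypothetical sketch, not the actual pageserver implementation; all names (`TokenBucket`, `try_acquire`) and the example limit values are illustrative.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-tenant throttle (one bucket per tenant; per shard once
/// sharding lands). Illustrative only, not the pageserver's actual code.
struct TokenBucket {
    capacity: f64,     // burst size, in requests
    tokens: f64,       // currently available tokens
    refill_rate: f64,  // steady-state requests per second
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_rate: f64) -> Self {
        Self { capacity, tokens: capacity, refill_rate, last_refill: Instant::now() }
    }

    /// Refill based on elapsed time, then try to take one token.
    /// Returns `None` if the request is admitted, or `Some(wait)` telling
    /// the caller how long until the next token becomes available.
    fn try_acquire(&mut self, now: Instant) -> Option<Duration> {
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            None // admitted
        } else {
            // Backpressure signal: time until one full token is refilled.
            Some(Duration::from_secs_f64((1.0 - self.tokens) / self.refill_rate))
        }
    }
}

fn main() {
    // Start with a deliberately high default (e.g. burst of 100 requests,
    // 10k requests/s sustained), to be lowered gradually during rollout.
    let mut bucket = TokenBucket::new(100.0, 10_000.0);
    let now = Instant::now();
    // The burst is admitted immediately...
    for _ in 0..100 {
        assert!(bucket.try_acquire(now).is_none());
    }
    // ...and the next request at the same instant is throttled.
    assert!(bucket.try_acquire(now).is_some());
}
```

Returning an explicit wait duration (rather than silently dropping or blocking) is what lets the upstream compute respond to the backpressure gracefully and without starvation, and it gives an obvious hook for the per-tenant observability the DoD calls for (e.g. counting throttled requests and summing imposed wait time).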
Interactions
Sharding: with sharding, the above limits will be per shard instead of per tenant. However, we may need to (re-)introduce per-tenant limits within a single pageserver process to incentivize placing shards across different nodes for increased performance and load spreading; that is subject to future work.
High-Level Plan