neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.

Epic: per-tenant read path throttling #5899

Status: Closed (problame closed this issue 2 weeks ago)

problame commented 9 months ago

Motivation

See #5648

tl;dr: we currently serve reads (and writes, see #7564) as fast as possible. This sets the wrong incentives and poses operational and economic problems.

DoD (Definition of Done)

Pageserver artificially caps the per-tenant throughput on the read path. I.e., to all upstream Neon components, this cap appears as the maximum read performance available per tenant per pageserver.

The limits will be chosen such that a TBD (small single-digit) number of tenants can run at the limit. Discovery of the limit values happens through gradual rollout and conservative experimentation, informed by benchmarks.

The upstream (compute) responds to the limit-induced backpressure efficiently, gracefully, and without risk of starvation.

There is enough observability to clearly disambiguate slowness induced by the limiting from slowness caused by an otherwise slow pageserver. This disambiguation must be possible at per-tenant (better: per-timeline) granularity.

The throttle is on by default and cannot be permanently overridden on a per-tenant basis. I.e., the implementation need not be suitable for productization as a "performance tier" or "QoS" feature.
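
To make the DoD more concrete: the actual mechanism is specified in RFC #5648 and, per the plan below, sits inside Timeline::get. Purely as an illustration, a per-tenant token-bucket throttle that also reports how long each request was delayed (the signal the observability point above needs) could look like the following sketch. The type and method names (`ReadThrottle`, `acquire`) and the concrete rate/burst numbers are hypothetical and not the pageserver's actual API.

```rust
use std::time::{Duration, Instant};

/// Minimal token-bucket throttle: `rate` requests per second with a burst
/// capacity of `burst` requests (hypothetical sketch, not the real
/// pageserver implementation).
struct ReadThrottle {
    rate: f64,            // tokens (requests) refilled per second
    burst: f64,           // maximum number of accumulated tokens
    tokens: f64,          // currently available tokens
    last_refill: Instant, // last time `tokens` was updated
}

impl ReadThrottle {
    fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last_refill: Instant::now() }
    }

    /// Take one token, sleeping until one is available.
    /// Returns the time this request spent waiting on the throttle,
    /// so callers can expose it as a per-tenant/per-timeline metric.
    fn acquire(&mut self) -> Duration {
        // Refill tokens based on elapsed wall-clock time, capped at `burst`.
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.burst);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            return Duration::ZERO;
        }
        // Not enough tokens: wait until the deficit has been refilled.
        let deficit = 1.0 - self.tokens;
        let wait = Duration::from_secs_f64(deficit / self.rate);
        std::thread::sleep(wait);
        self.tokens = 0.0; // the refilled token is consumed by this request
        self.last_refill = Instant::now();
        wait
    }
}

fn main() {
    // Hypothetical limit: 10_000 get-page requests/s with a burst of 1_000.
    let mut throttle = ReadThrottle::new(10_000.0, 1_000.0);
    let mut total_wait = Duration::ZERO;
    for _ in 0..5_000 {
        total_wait += throttle.acquire();
        // ... serve the get-page request here ...
    }
    println!("time spent throttled: {total_wait:?}");
}
```

Returning the wait duration from `acquire` keeps the "was this request slow because of the throttle?" question answerable per request, which is what the per-tenant disambiguation requirement above relies on.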

Interactions

Sharding: with sharding, the above limits will be per shard instead of per tenant. We may still need to (re-)introduce per-tenant limits within a single pageserver process to incentivize placing shards across different nodes for increased performance and load spreading, but that is subject to future work.

High-Level Plan

### High Level
- [x] implement get-page benchmark (#5771)
- [x] implement get-page throttling mechanism (RFC: #5648)
- [x] ship code to staging & prod (disabled at runtime, but, minimal overhead)
- [x] enable throughout staging with values slightly below the limit, see how the nightly benchmarks & Peter's benchmarks react
- [x] decision on the whole metrics situation; goes hand in hand with the decision on where the throttle should be applied: inside Timeline::get or higher up the stack? (see the metrics sketch after this plan)
- [x] ~~selective enablement for the problematic 20k rps tenants that should just use neonvm, gain more experience~~
- [x] decision on the location of the throttle: inside Timeline::get as it is now, or one throttle per page_service endpoint; DECISION: PR #6953 declares this future work, the throttle remains inside Timeline::get for now
- [x] Pageserver Operations Page: https://www.notion.so/neondatabase/Pageserver-Per-Tenant-Read-Throttling-2b941a3e46234285949ee4a10366fbbc?pvs=4
- [ ] enable gradually, starting with a high default, then lowering it to the value where we want to be
### Impl
- [ ] https://github.com/neondatabase/neon/pull/6640
- [ ] https://github.com/neondatabase/neon/pull/6706
- [ ] https://github.com/neondatabase/aws/pull/1048
- [ ] https://github.com/neondatabase/neon/pull/6869
- [ ] https://github.com/neondatabase/aws/pull/1054
- [ ] https://github.com/neondatabase/neon/pull/6953
- [ ] https://github.com/neondatabase/aws/pull/1124
- [ ] https://github.com/neondatabase/neon/pull/7072
- [ ] https://github.com/neondatabase/aws/pull/1125
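
For the metrics item in the plan above: the DoD asks that throttle-induced slowness be distinguishable from a genuinely slow pageserver at per-tenant (better: per-timeline) granularity. Purely as a hedged illustration of one way to expose that signal, the sketch below registers a counter of throttled requests and of accumulated wait seconds, labeled by tenant and timeline, using the prometheus crate. The metric names, label set, and wiring are assumptions for illustration only; they are not the metrics the pageserver actually exports.

```rust
// Cargo.toml (assumption): prometheus = "0.13"
use prometheus::{register_counter_vec, register_int_counter_vec, Encoder, TextEncoder};

fn main() {
    // Count of get-page requests that were delayed by the per-tenant throttle.
    let throttled_total = register_int_counter_vec!(
        "pageserver_getpage_throttled_total", // hypothetical metric name
        "get-page requests delayed by the per-tenant read throttle",
        &["tenant_id", "timeline_id"]
    )
    .unwrap();

    // Accumulated wall-clock time the throttle added, in seconds.
    let throttled_seconds = register_counter_vec!(
        "pageserver_getpage_throttled_seconds_total", // hypothetical metric name
        "seconds spent waiting on the per-tenant read throttle",
        &["tenant_id", "timeline_id"]
    )
    .unwrap();

    // In the request path, after the throttle reports how long it delayed us:
    let wait_secs = 0.0042; // e.g. the Duration returned by the throttle sketch above
    if wait_secs > 0.0 {
        throttled_total
            .with_label_values(&["tenant-a", "timeline-1"])
            .inc();
        throttled_seconds
            .with_label_values(&["tenant-a", "timeline-1"])
            .inc_by(wait_secs);
    }

    // Render the default registry, as a /metrics endpoint would.
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buf)
        .unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}
```

With something along these lines, a dashboard can divide throttled seconds by request count per tenant and immediately tell whether observed latency comes from the limiter or from the pageserver itself.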
jcsp commented 7 months ago

Status:

jcsp commented 7 months ago

Initial draft PR is up for review -- could land this week.

Testing:

jcsp commented 6 months ago

Status:

jcsp commented 6 months ago

This week:

problame commented 6 months ago
jcsp commented 6 months ago

Status:

problame commented 5 months ago

Apart from

Reconciling with the vectored get changes to ensure we aren't double throttling.

nothing happened last week.

This week:

problame commented 5 months ago

Status update:

This week:

problame commented 5 months ago

Status update:

problame commented 4 months ago

I split off the write throttling aspect of this epic into a separate draft epic: https://github.com/neondatabase/neon/issues/7564

(We do not expect to work on write throttling this quarter)

problame commented 2 weeks ago

Closing this epic; the development work finished long ago.

The last item

enable gradually, starting with a high default, then lowering it to the value where we want to be

was, and still is, dependent on the sharding + sharded ingest rollout, so that users who hit the throttle have the option to acquire more IOPS through sharding as needed.