
pageserver: WAL ingestion backpressure #5897

Open koivunej opened 11 months ago

koivunej commented 11 months ago

#5108 removed the RemoteTimelineClient::wait_completion calls from compaction. Before, we waited for the L0[^1] uploads to complete before finishing the compaction, because we did not want to delete the L0s before they had been uploaded. That synchronization need was addressed by the UploadQueue carrying a ResidentLayer, introduced in #4938. But the waiting also served another, less-intended purpose: rate-limiting compactions by the progress of the upload queue.

At the 2023-11 offsite we discussed this with at least @hlinnaka and agreed that we should rate-limit ingestion when L0 => L1 compaction has not had a chance to run. In the discussion I was thinking of a simple model: wait before opening a new in-memory layer when we have too many L0 layers (more than compaction_threshold + 1 L0s, for example).
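A rough sketch of that model (the names here are made up, not the pageserver's actual API; a tokio `watch` channel stands in for however the layer map actually publishes the L0 count):

```rust
use tokio::sync::watch;

// Illustrative only: the real pageserver tracks the L0 count in its layer
// map; a watch channel is used here just to make the sketch self-contained.
struct L0Backpressure {
    compaction_threshold: usize,
    l0_count_rx: watch::Receiver<usize>,
}

impl L0Backpressure {
    /// Called on the WAL ingest path before opening a new in-memory layer.
    async fn wait_for_l0_headroom(&mut self) {
        // Allow a small margin above the threshold that normally triggers
        // L0 => L1 compaction before stalling ingestion.
        let max_l0 = self.compaction_threshold + 1;
        while *self.l0_count_rx.borrow_and_update() > max_l0 {
            // Stall until the compaction task publishes a new L0 count.
            // Time spent here should be reflected in Timeline::wait_for_lsn
            // diagnostics as "ingestion is being backpressured".
            if self.l0_count_rx.changed().await.is_err() {
                return; // sender gone, e.g. timeline shutting down
            }
        }
    }
}

// The compaction task would hold the matching `watch::Sender<usize>` and call
// `sender.send_replace(new_l0_count)` whenever it creates or removes L0s.
```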

It also came up in the discussion that all diagnostics related to waiting for an LSN (Timeline::wait_for_lsn) need to be enriched with the knowledge of whether we are rate-limiting ingestion.

In the discussion, concern was also raised about setting the effective ingestion rate too low.

After the discussion, data recovery from WAL has happened for some timelines. It seems all of those timelines were too small to be affected by the rate-limiting.

Downside of not implementing backpressure:

As I write up this issue and reconsider the feasibility, I wonder if we should begin by having global metrics for:

Related rate-limiting:

_Originally posted by @koivunej in https://github.com/neondatabase/neon/pull/5108#discussion_r1401906196_

[^1]: L0 and image layer uploads; L0 for correctness, images accidentally.

problame commented 11 months ago

> Downside of not implementing this:

I'm not 100% sure what "this" means. I guess your idea

> In the discussion I was thinking of a simple model: wait before opening a new in-memory layer when we have too many L0 layers (more than compaction_threshold + 1 L0s, for example).

?


Assuming "this" means what I quoted above, some comments:

  1. I think the above proposal would be an effective way to ensure L0 => L1 compaction can keep up with ingestion.
  2. I think the correct term for the above proposal is not rate-limiting (aka throttling) but backpressure.
  3. Backpressure is theoretically superior to a plain per-tenant rate limit, but also harder to get right and reason about, because it tries to accomplish more.
  4. For full backpressure, we'd additionally need a backpressure mechanism for the upload queue, i.e., if upload to S3 is slow for whatever reason, we should also stop ingesting WAL (see the sketch after this list).
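
For illustration, one way the upload queue could exert such backpressure is a bounded number of pending uploads; all names below are stand-ins, not the actual RemoteTimelineClient API:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Stand-ins so the sketch is self-contained; the real code would use
// ResidentLayer and the remote storage client.
struct LayerStandIn;
async fn upload_to_s3(_layer: LayerStandIn) { /* remote storage upload */ }

struct BoundedUploadQueue {
    // One permit per queued or in-flight upload; when permits run out, the
    // caller (flush/ingest path) stalls until uploads to S3 complete.
    slots: Arc<Semaphore>,
}

impl BoundedUploadQueue {
    fn new(max_pending_uploads: usize) -> Self {
        Self { slots: Arc::new(Semaphore::new(max_pending_uploads)) }
    }

    /// Schedule a layer upload; awaits (backpressure) if S3 is falling behind.
    async fn schedule_upload(&self, layer: LayerStandIn) {
        let permit = Arc::clone(&self.slots)
            .acquire_owned()
            .await
            .expect("semaphore is never closed in this sketch");
        tokio::spawn(async move {
            upload_to_s3(layer).await;
            drop(permit); // frees a slot, unblocking the ingest path
        });
    }
}
```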

Given the above chain of thought, I'm wondering whether we should start with a simple per-tenant rate limit on bytes/second first, as proposed in #5899.
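
For illustration, such a per-tenant bytes/second limit could be a plain token bucket on the ingest path; this is just a sketch, not necessarily how #5899 would implement it:

```rust
use std::time::{Duration, Instant};

// Illustrative token bucket for a per-tenant ingest limit in bytes/second.
struct IngestRateLimit {
    bytes_per_second: f64,
    burst_bytes: f64,
    available: f64,
    last_refill: Instant,
}

impl IngestRateLimit {
    fn new(bytes_per_second: f64, burst_bytes: f64) -> Self {
        Self { bytes_per_second, burst_bytes, available: burst_bytes, last_refill: Instant::now() }
    }

    /// Called on the ingest path with the size of the next batch of WAL
    /// records; sleeps until enough budget has accumulated. Assumes a single
    /// batch never exceeds `burst_bytes`.
    async fn throttle(&mut self, batch_bytes: f64) {
        loop {
            // Refill the bucket based on elapsed time, capped at the burst size.
            let now = Instant::now();
            self.available = (self.available
                + now.duration_since(self.last_refill).as_secs_f64() * self.bytes_per_second)
                .min(self.burst_bytes);
            self.last_refill = now;
            if self.available >= batch_bytes {
                self.available -= batch_bytes;
                return;
            }
            // Sleep roughly long enough for the missing budget to refill.
            let deficit = batch_bytes - self.available;
            tokio::time::sleep(Duration::from_secs_f64(deficit / self.bytes_per_second)).await;
        }
    }
}
```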