
pageserver: WAL ingestion backpressure #5897

Open koivunej opened 11 months ago

koivunej commented 11 months ago

#5108 removed the RemoteTimelineClient::wait_completion calls from compaction. Before, we waited for the L0[^1] uploads to complete before finishing the compaction, because we did not want to delete the L0s before they had been uploaded. That synchronization need was addressed by the UploadQueue carrying a ResidentLayer, introduced in #4938. But the waiting also served another, less-intended purpose: rate-limiting compactions by the progress of the upload queue.

At the 2023-11 offsite we discussed this with at least @hlinnaka and agreed that we should rate-limit ingestion when L0 => L1 compaction has not had a chance to run. In the discussion I was thinking of a simple model: wait before opening a new in-memory layer when we have too many L0 layers (more than compaction_threshold + 1 L0s, for example).
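A rough sketch of that model (the names here are made up, not the pageserver's actual API; a tokio `watch` channel stands in for however the layer map actually publishes the L0 count):

```rust
use tokio::sync::watch;

// Illustrative only: the real pageserver tracks the L0 count in its layer
// map; a watch channel is used here just to make the sketch self-contained.
struct L0Backpressure {
    compaction_threshold: usize,
    l0_count_rx: watch::Receiver<usize>,
}

impl L0Backpressure {
    /// Called on the WAL ingest path before opening a new in-memory layer.
    async fn wait_for_l0_headroom(&mut self) {
        // Allow a small margin above the threshold that normally triggers
        // L0 => L1 compaction before stalling ingestion.
        let max_l0 = self.compaction_threshold + 1;
        while *self.l0_count_rx.borrow_and_update() > max_l0 {
            // Stall until the compaction task publishes a new L0 count.
            // Time spent here should be reflected in Timeline::wait_for_lsn
            // diagnostics as "ingestion is being backpressured".
            if self.l0_count_rx.changed().await.is_err() {
                return; // sender gone, e.g. timeline shutting down
            }
        }
    }
}

// The compaction task would hold the matching `watch::Sender<usize>` and call
// `sender.send_replace(new_l0_count)` whenever it creates or removes L0s.
```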

It also came up in the discussion that all diagnostics related to waiting for an LSN (Timeline::wait_for_lsn) need to be enriched with the knowledge of whether we are rate-limiting ingestion.

In the discussion, concern was also raised about setting the effective ingestion rate too low.

After the discussion, data recovery from WAL has happened for some timelines. It seems all of those timelines were too small to be affected by the rate-limiting.

Downside of not implementing backpressure:

As I write up this issue and reconsider the feasibility, I wonder if we should begin by having global metrics for:

Related rate-limiting:

_Originally posted by @koivunej in https://github.com/neondatabase/neon/pull/5108#discussion_r1401906196_

[^1]: L0 and image layer uploads; L0 for correctness, images accidentally.

problame commented 11 months ago

> Downside of not implementing this:

I'm not 100% sure what "this" means. I guess your idea

> In the discussion I was thinking of a simple model: wait before opening a new in-memory layer when we have too many L0 layers (more than compaction_threshold + 1 L0s, for example).

?


Assuming "this" means what I quoted above, some comments:

  1. I think the above proposal would be an effective way to ensure L0 => L1 compaction can keep up with ingestion.
  2. I think the correct term for the above proposal is not rate-limiting (aka throttling) but backpressure.
  3. Backpressure is theoretically superior to a plain per-tenant rate limit, but also harder to get right and reason about, because it tries to accomplish more.
  4. For full backpressure, we'd additionally need a backpressure mechanism for the upload queue, i.e., if upload to S3 is slow for whatever reason, we should also stop ingesting WAL (see the sketch after this list).
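
For illustration, one way the upload queue could exert such backpressure is a bounded number of pending uploads; all names below are stand-ins, not the actual RemoteTimelineClient API:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Stand-ins so the sketch is self-contained; the real code would use
// ResidentLayer and the remote storage client.
struct LayerStandIn;
async fn upload_to_s3(_layer: LayerStandIn) { /* remote storage upload */ }

struct BoundedUploadQueue {
    // One permit per queued or in-flight upload; when permits run out, the
    // caller (flush/ingest path) stalls until uploads to S3 complete.
    slots: Arc<Semaphore>,
}

impl BoundedUploadQueue {
    fn new(max_pending_uploads: usize) -> Self {
        Self { slots: Arc::new(Semaphore::new(max_pending_uploads)) }
    }

    /// Schedule a layer upload; awaits (backpressure) if S3 is falling behind.
    async fn schedule_upload(&self, layer: LayerStandIn) {
        let permit = Arc::clone(&self.slots)
            .acquire_owned()
            .await
            .expect("semaphore is never closed in this sketch");
        tokio::spawn(async move {
            upload_to_s3(layer).await;
            drop(permit); // frees a slot, unblocking the ingest path
        });
    }
}
```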

Given the above chain of thought, I'm wondering whether we should start with a simple per-tenant rate limit on bytes/second first, as proposed in #5899.
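
For illustration, such a per-tenant bytes/second limit could be a plain token bucket on the ingest path; this is just a sketch, not necessarily how #5899 would implement it:

```rust
use std::time::{Duration, Instant};

// Illustrative token bucket for a per-tenant ingest limit in bytes/second.
struct IngestRateLimit {
    bytes_per_second: f64,
    burst_bytes: f64,
    available: f64,
    last_refill: Instant,
}

impl IngestRateLimit {
    fn new(bytes_per_second: f64, burst_bytes: f64) -> Self {
        Self { bytes_per_second, burst_bytes, available: burst_bytes, last_refill: Instant::now() }
    }

    /// Called on the ingest path with the size of the next batch of WAL
    /// records; sleeps until enough budget has accumulated. Assumes a single
    /// batch never exceeds `burst_bytes`.
    async fn throttle(&mut self, batch_bytes: f64) {
        loop {
            // Refill the bucket based on elapsed time, capped at the burst size.
            let now = Instant::now();
            self.available = (self.available
                + now.duration_since(self.last_refill).as_secs_f64() * self.bytes_per_second)
                .min(self.burst_bytes);
            self.last_refill = now;
            if self.available >= batch_bytes {
                self.available -= batch_bytes;
                return;
            }
            // Sleep roughly long enough for the missing budget to refill.
            let deficit = batch_bytes - self.available;
            tokio::time::sleep(Duration::from_secs_f64(deficit / self.bytes_per_second)).await;
        }
    }
}
```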