bypass PageCache for L0 flush

jcsp commented 4 months ago

Currently, when we do an InMemoryLayer::write_to_disk, there is a tremendous amount of random read I/O, as deltas from the ephemeral file (written in LSN order) are written out to the delta layer in key order.
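To make the access pattern concrete, here is a minimal sketch, not the pageserver code. `Key`, `Lsn`, `build_index`, and `backward_seeks` are hypothetical names invented for illustration. It appends records in arrival (LSN) order, indexes them by `(key, lsn)`, and counts how often a key-ordered walk has to jump backwards in the file.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins: keys and LSNs are plain integers in this sketch.
type Key = u32;
type Lsn = u64;

/// Append records in arrival (LSN) order and index them by (key, lsn).
/// Each input tuple is (key, lsn, record length in bytes).
/// Returns the index: (key, lsn) -> byte offset in the append-only file.
fn build_index(arrivals: &[(Key, Lsn, usize)]) -> BTreeMap<(Key, Lsn), u64> {
    let mut index = BTreeMap::new();
    let mut offset = 0u64;
    for &(key, lsn, len) in arrivals {
        index.insert((key, lsn), offset);
        offset += len as u64;
    }
    index
}

/// Walk the index in key order and count how often the next read starts
/// before the previous one, i.e. how often the reader must seek backwards.
fn backward_seeks(index: &BTreeMap<(Key, Lsn), u64>) -> usize {
    let offsets: Vec<u64> = index.values().copied().collect();
    offsets.windows(2).filter(|w| w[1] < w[0]).count()
}
```

With writes interleaved across keys, a key-ordered walk produces offsets that are not monotonic, which is what turns the flush into random reads.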

In benchmarks (https://github.com/neondatabase/neon/pull/7409) we can see that this delta layer writing phase is substantially more expensive than the initial ingest of data, and that within the delta layer write a significant amount of the CPU time is spent traversing the page cache.

It's really slow: on the order of tens of megabytes per second on a fast desktop CPU.

Since this is a background task whose concurrency we can limit, we can simplify and accelerate this by doing the whole thing in memory:

- Read the full ephemeral file into memory. Layers are much smaller than total memory, so this is affordable.
- Do all the random reads directly from this in-memory buffer instead of using blob IO/page cache/disk reads.
- Add a semaphore to limit how many timelines may do this concurrently (to limit peak memory). Set this to roughly the number of cores, or some factor of the system memory / layer size, whichever is lower.
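The steps above could look roughly like the following sketch. It is not the actual implementation: `BlobRef`, `flush_in_key_order`, and `max_concurrent_flushes` are hypothetical names, the index layout is assumed, and the floor of one concurrent flush is an assumption added here.

```rust
use std::collections::BTreeMap;

/// Hypothetical index entry: where a delta's bytes live in the ephemeral file.
#[derive(Clone, Copy)]
struct BlobRef {
    offset: usize,
    len: usize,
}

/// Emit all deltas in key order, slicing each one out of a buffer that
/// holds the whole ephemeral file. No page cache or disk read per delta.
fn flush_in_key_order(
    file_contents: &[u8],
    index: &BTreeMap<(u32, u64), BlobRef>, // (key, lsn) -> location in the buffer
) -> Vec<u8> {
    let mut out = Vec::with_capacity(file_contents.len());
    for blob in index.values() {
        out.extend_from_slice(&file_contents[blob.offset..blob.offset + blob.len]);
    }
    out
}

/// Concurrency limit as described above: the lower of the core count and
/// a memory budget divided by the layer size.
/// Assumption: always allow at least one flush so progress is possible.
fn max_concurrent_flushes(cores: usize, memory_budget_bytes: u64, layer_size_bytes: u64) -> usize {
    // `.max(1)` on the divisor avoids a division by zero for an empty layer size.
    let by_memory = (memory_budget_bytes / layer_size_bytes.max(1)) as usize;
    cores.min(by_memory).max(1)
}
```

The value returned by `max_concurrent_flushes` would be the permit count of the semaphore, with each flush holding one permit for as long as it keeps the ephemeral file in memory.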

### Impl

[ ] https://github.com/neondatabase/neon/pull/8186
[ ] https://github.com/neondatabase/neon/pull/8190
[ ] https://github.com/neondatabase/aws/pull/1568
[ ] https://github.com/neondatabase/neon/pull/8327
[ ] https://github.com/neondatabase/aws/pull/1596
[ ] https://github.com/neondatabase/aws/pull/1601
[ ] https://github.com/neondatabase/aws/pull/1605
[ ] https://github.com/neondatabase/aws/pull/1622
[ ] https://github.com/neondatabase/aws/pull/1655
[ ] https://github.com/neondatabase/neon/pull/8534
[ ] https://github.com/neondatabase/azure/pull/270
[x] gradual prod rollout
[ ] https://github.com/neondatabase/aws/pull/1656
[ ] https://github.com/neondatabase/aws/pull/1671
[ ] https://github.com/neondatabase/aws/pull/1723
[ ] https://github.com/neondatabase/aws/pull/1737
[x] decommission mode page-cached
[ ] https://github.com/neondatabase/neon/pull/8739

Follow-ups:

https://github.com/neondatabase/neon/issues/8894

problame commented 1 month ago

This week: investigate staging OOMs

problame commented 1 month ago

Updated plan: don't spend much time investigating OOMs this week, instead progress coding work on the parent epic.

So: this week, disable l0_flush.mode=direct in staging. Then next week see if we had any more OOMs or not. If not, then it's another proof point that l0_flush.mode=direct is responsible for the OOMs.

problame commented 1 month ago

The OOMs were found to not be due to l0_flush.mode=direct. So, re-enabling in staging & pre-prod this week.

problame commented 1 month ago

aws.git commit that enabled staging & pre-prod:

https://github.com/neondatabase/aws/commit/cfda0172ba6e91eda7ceb70e5c365f88026d6aa2

merged Jul 22

first pre-prod prod-like cloudbench run that hit the new configuration was on the evening of Jul 23

Behaved as expected & no significant impact to max RSS

problame commented 1 month ago

This week:

https://github.com/neondatabase/neon/pull/8534
One prod region?

problame commented 1 month ago

Next week:

rollout to https://github.com/neondatabase/aws/pull/1671

problame commented 3 weeks ago

Status update:

l0_flush.mode=direct rolled out everywhere; last 3 regions happened yesterday
- Graphs in Slack

problame commented 2 weeks ago

To be determined before closing this issue:

Do we want to retain the configurability of the concurrency limit?
Do we want to invest in more "desired state" configurability, i.e., not just a concurrency limit but an "anticipated concurrent memory usage" limit?
If neither, let's remove the config option.

problame commented 1 week ago

Decision yesterday: leave the option until after the ARM transition is complete, then re-evaluate.

problame commented 5 days ago

Decision yesterday: leave the option until after the ARM transition is complete, then re-evaluate.

This moves into a follow-up issue: https://github.com/neondatabase/neon/issues/8894

neondatabase / neon

bypass PageCache for L0 flush #7418