neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

bypass PageCache for L0 flush #7418

Closed jcsp closed 5 days ago

jcsp commented 4 months ago

Currently, when we do an InMemoryLayer::write_to_disk, there is a tremendous amount of random read I/O, as deltas from the ephemeral file (written in LSN order) are written out to the delta layer in key order.

In benchmarks (https://github.com/neondatabase/neon/pull/7409) we can see that this delta layer writing phase is substantially more expensive than the initial ingest of data, and that within the delta layer write a significant amount of the CPU time is spent traversing the page cache.

It's really slow: like tens of megabytes per second on a fast desktop CPU.

Since this is a background task whose concurrency we can limit, we can simplify and accelerate this by doing the whole thing in memory:
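A minimal sketch of that idea, not the pageserver's actual code. It assumes the whole ephemeral file fits in one in-memory buffer and uses a made-up record layout (key, LSN, length, payload). The names `append_record`, `build_index`, and `flush_in_key_order` are hypothetical. The buffer is scanned once sequentially to build a key-ordered index, and the output is then produced by slicing memory instead of issuing random reads through the page cache:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

// Hypothetical stand-ins for the real key and LSN types.
type Key = u64;
type Lsn = u64;

// Assumed record layout for this sketch only:
// key (8 bytes LE) | lsn (8 bytes LE) | payload length (4 bytes LE) | payload
const HEADER_LEN: usize = 8 + 8 + 4;

/// Append one record to the buffer in arrival (LSN) order.
fn append_record(buf: &mut Vec<u8>, key: Key, lsn: Lsn, value: &[u8]) {
    buf.extend_from_slice(&key.to_le_bytes());
    buf.extend_from_slice(&lsn.to_le_bytes());
    buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
    buf.extend_from_slice(value);
}

fn read_u64(buf: &[u8], pos: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[pos..pos + 8]);
    u64::from_le_bytes(b)
}

fn read_u32(buf: &[u8], pos: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[pos..pos + 4]);
    u32::from_le_bytes(b)
}

/// One sequential pass over the buffer: key -> [(lsn, payload byte range)].
/// The BTreeMap keeps keys sorted; each per-key list stays in LSN order
/// because records are appended in LSN order.
fn build_index(buf: &[u8]) -> BTreeMap<Key, Vec<(Lsn, Range<usize>)>> {
    let mut index: BTreeMap<Key, Vec<(Lsn, Range<usize>)>> = BTreeMap::new();
    let mut pos = 0;
    while pos + HEADER_LEN <= buf.len() {
        let key = read_u64(buf, pos);
        let lsn = read_u64(buf, pos + 8);
        let len = read_u32(buf, pos + 16) as usize;
        let start = pos + HEADER_LEN;
        index.entry(key).or_default().push((lsn, start..start + len));
        pos = start + len;
    }
    index
}

/// Produce (key, lsn, value) tuples in key order, then LSN order.
/// A real flush would stream these into a delta layer writer instead.
fn flush_in_key_order(buf: &[u8]) -> Vec<(Key, Lsn, Vec<u8>)> {
    let mut out = Vec::new();
    for (key, entries) in build_index(buf) {
        for (lsn, range) in entries {
            // "Random access" is now just slicing into memory.
            out.push((key, lsn, buf[range].to_vec()));
        }
    }
    out
}
```

The trade-off is memory: each concurrent flush holds a full copy of its ephemeral file, which is why limiting flush concurrency matters for this approach.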

Follow-ups:

problame commented 1 month ago

This week: investigate staging OOMs

problame commented 1 month ago

Updated plan: don't spend much time investigating OOMs this week, instead progress coding work on the parent epic.

So: this week, disable l0_flush.mode=direct in staging. Then next week see if we had any more OOMs or not. If not, then it's another proof point that l0_flush.mode=direct is responsible for the OOMs.

problame commented 1 month ago

The OOMs were found not to be caused by l0_flush.mode=direct. So, re-enabling it in staging & pre-prod this week.

problame commented 1 month ago

aws.git commit that enabled staging & pre-prod:

merged Jul 22

first pre-prod prod-like cloudbench run that hit the new configuration was on the evening of Jul 23

Behaved as expected & no significant impact to max RSS


problame commented 1 month ago

This week:

problame commented 1 month ago

Next week:

problame commented 3 weeks ago

Status update:

problame commented 2 weeks ago

To be determined before closing this issue:

problame commented 1 week ago

Decision yesterday: leave the option until after the ARM transition is complete, then re-evaluate.

problame commented 5 days ago

This moves into a follow-up issue: https://github.com/neondatabase/neon/issues/8894