neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Epic: Bypass PageCache for user data blocks #7386

Open. jcsp opened this issue 6 months ago.

jcsp commented 6 months ago

From experiments, we believe that using the userspace PageCache for user data blocks is a net negative:

Ideas / Initial Plan

Execution

### High-Level
- [x] Bypass the PageCache for data blocks of delta and image layers (done by Vlad's work for `get_impl=vectored`)
- [ ] https://github.com/neondatabase/neon/pull/8105
- [ ] https://github.com/neondatabase/neon/issues/7418
- [ ] https://github.com/neondatabase/neon/issues/8184
- [ ] https://github.com/neondatabase/neon/issues/8183
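
The first task item routes data blocks of delta and image layers around the PageCache, while index blocks such as B-tree nodes can still be cached. A minimal sketch of that routing decision, assuming the read path knows each block's content kind. `ContentKind`, `ToyPageCache`, `read_block`, and `read_from_disk` are hypothetical names for illustration, not the pageserver's API:

```rust
use std::collections::HashMap;

/// Hypothetical classification of blocks on the read path.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ContentKind {
    /// Index blocks, e.g. B-tree nodes of a delta layer: small and re-read often.
    DeltaLayerBtreeNode,
    /// User data blocks (page images / WAL records): large volume, low reuse.
    UserData,
}

/// Toy cache keyed by block number; a stand-in for the userspace PageCache.
struct ToyPageCache {
    blocks: HashMap<u64, Vec<u8>>,
    hits: u64,
    misses: u64,
}

impl ToyPageCache {
    fn new() -> Self {
        ToyPageCache { blocks: HashMap::new(), hits: 0, misses: 0 }
    }
}

/// Hypothetical disk read: fabricates deterministic bytes for a block.
fn read_from_disk(blkno: u64) -> Vec<u8> {
    vec![(blkno % 256) as u8; 8]
}

fn read_block(cache: &mut ToyPageCache, kind: ContentKind, blkno: u64) -> Vec<u8> {
    match kind {
        // User data bypasses the cache entirely, so it cannot evict index blocks.
        ContentKind::UserData => read_from_disk(blkno),
        // Index blocks go through the cache: look up, else read and populate.
        ContentKind::DeltaLayerBtreeNode => {
            if let Some(buf) = cache.blocks.get(&blkno).cloned() {
                cache.hits += 1;
                return buf;
            }
            cache.misses += 1;
            let buf = read_from_disk(blkno);
            cache.blocks.insert(blkno, buf.clone());
            buf
        }
    }
}

fn main() {
    let mut cache = ToyPageCache::new();
    // The same B-tree node read twice: the second read is a cache hit.
    read_block(&mut cache, ContentKind::DeltaLayerBtreeNode, 7);
    read_block(&mut cache, ContentKind::DeltaLayerBtreeNode, 7);
    // A scan over user data never touches the cache.
    for blkno in 100..200 {
        read_block(&mut cache, ContentKind::UserData, blkno);
    }
    println!("hits={} misses={} cached={}", cache.hits, cache.misses, cache.blocks.len());
}
```

Here the 100-block user-data scan leaves the cache holding only the one B-tree node, which is the property the bypass is meant to preserve.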

jcsp commented 6 months ago

Next step:

problame commented 6 months ago

This week: plumb RequestContext through the read path.
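Plumbing a request context through the read path means every layer passes it down, so the lowest-level block read can attribute each access to the task that caused it. A minimal sketch under that assumption; this `RequestContext` is a simplified stand-in holding only a task kind (not the pageserver's actual definition), and `ReadMetrics`, `get_page`, `search_layers`, and `read_blk` are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical subset of task kinds, named after those in the cache dashboard.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum TaskKind {
    WalReceiverConnectionHandler,
    LayerFlushTask,
    Compaction,
}

/// Simplified stand-in for a request context: records who issued the read.
#[derive(Clone, Copy, Debug)]
struct RequestContext {
    task_kind: TaskKind,
}

/// Per-task-kind count of block reads.
#[derive(Default)]
struct ReadMetrics {
    blocks_read: HashMap<TaskKind, u64>,
}

// Each layer of this hypothetical read path takes `&RequestContext` and hands
// it to the next layer, so the bottom layer knows who asked.
fn get_page(ctx: &RequestContext, metrics: &mut ReadMetrics, key: u64) -> u64 {
    search_layers(ctx, metrics, key)
}

fn search_layers(ctx: &RequestContext, metrics: &mut ReadMetrics, key: u64) -> u64 {
    // Pretend each block holds 8 keys.
    read_blk(ctx, metrics, key / 8)
}

fn read_blk(ctx: &RequestContext, metrics: &mut ReadMetrics, blkno: u64) -> u64 {
    // Attribute this block read to the originating task kind.
    *metrics.blocks_read.entry(ctx.task_kind).or_insert(0) += 1;
    blkno
}

fn main() {
    let mut metrics = ReadMetrics::default();
    let wal = RequestContext { task_kind: TaskKind::WalReceiverConnectionHandler };
    let compaction = RequestContext { task_kind: TaskKind::Compaction };
    get_page(&wal, &mut metrics, 80);
    get_page(&compaction, &mut metrics, 3);
    get_page(&compaction, &mut metrics, 17);
    println!("{:?}", metrics.blocks_read);
}
```

Passing the context explicitly, rather than through a global or thread-local, keeps attribution correct even when one async executor thread interleaves work from many tasks.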

Also, Vlad informed me that switching all Timeline::get calls to vectored get means we'll stop using the PageCache for user data blocks. The switch is starting to roll out to prod this week.
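One way a vectored read path can avoid a block-granular cache is to plan all blocks a request needs up front and merge neighbors into larger contiguous reads. The sketch below shows only that planning step and assumes, for illustration, that the vectored path works this way; `coalesce`, `read_runs`, and `BLOCK_SIZE` are hypothetical and this is not the pageserver's vectored-get code:

```rust
/// Tiny block size so the example stays readable; real blocks are much larger.
const BLOCK_SIZE: usize = 4;

/// Plan a vectored read: sort and dedup requested block numbers, then merge
/// neighbors into (start, len) runs so each run can be one contiguous read.
fn coalesce(blocks: &[u64]) -> Vec<(u64, u64)> {
    let mut sorted = blocks.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let mut runs: Vec<(u64, u64)> = Vec::new();
    for b in sorted {
        if let Some(last) = runs.last_mut() {
            if last.0 + last.1 == b {
                // b directly follows the current run: extend it.
                last.1 += 1;
                continue;
            }
        }
        runs.push((b, 1));
    }
    runs
}

/// Serve each run with a single slice of `file`, standing in for one
/// positional read per run; no per-block cache lookup happens.
fn read_runs(file: &[u8], runs: &[(u64, u64)]) -> Vec<Vec<u8>> {
    runs.iter()
        .map(|&(start, len)| {
            let lo = start as usize * BLOCK_SIZE;
            let hi = lo + len as usize * BLOCK_SIZE;
            file[lo..hi].to_vec()
        })
        .collect()
}

fn main() {
    let file: Vec<u8> = (0u8..40).collect();
    let runs = coalesce(&[5, 1, 2, 3, 9, 2]);
    let distinct_blocks: u64 = runs.iter().map(|r| r.1).sum();
    let bufs = read_runs(&file, &runs);
    println!("{:?}: {} reads for {} blocks", runs, bufs.len(), distinct_blocks);
}
```

With the request above, five distinct blocks are served by three reads.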

problame commented 5 months ago

This week:

problame commented 4 months ago

Update:

problame commented 4 months ago

This time range is instructive: LayerFlushTask reads InMemoryLayer blocks at a high rate, which displaces DeltaLayerBtreeNode pages from the cache and causes cache misses for WalReceiverConnectionHandler and, to a lesser extent, Compaction.

https://neonprod.grafana.net/d/fe7f056c-3ee1-49ef-a08d-e66055099396/pageserver-page-cache?orgId=1&from=1718937864602&to=1718957534079&var-datasource=HUNg6jvVk&var-hit_rate_drill_task_kind=WalReceiverConnectionHandler&var-hit_rate_drill_content_kind=InMemoryLayer&var-adhoc=neon_region%7C%3D%7Cus-east-2&var-adhoc=instance%7C%3D%7Cpageserver-4.us-east-2.aws.neon.build
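The displacement described above can be reproduced with a toy model: one reader cycles through a small hot set (standing in for DeltaLayerBtreeNode pages read by WalReceiverConnectionHandler) while another streams through many distinct blocks (standing in for LayerFlushTask reading InMemoryLayer data). The sketch uses a plain LRU, which is an assumption; the pageserver's PageCache may use a different replacement policy. `Lru`, `wal_receiver_hit_rate`, and all sizes are made up for illustration:

```rust
use std::collections::VecDeque;

/// Minimal LRU over block ids (front = least recently used).
struct Lru {
    cap: usize,
    order: VecDeque<u64>,
}

impl Lru {
    fn new(cap: usize) -> Self {
        Lru { cap, order: VecDeque::new() }
    }

    /// Returns true on a hit. Hits move the key to the back; misses insert
    /// it, evicting the least recently used key when the cache is full.
    fn access(&mut self, key: u64) -> bool {
        if let Some(pos) = self.order.iter().position(|&k| k == key) {
            self.order.remove(pos);
            self.order.push_back(key);
            true
        } else {
            if self.order.len() >= self.cap {
                self.order.pop_front();
            }
            self.order.push_back(key);
            false
        }
    }
}

/// Each round, the WAL receiver reads hot keys 0..hot; then, if the flush
/// task shares the cache, it reads `flush_per_round` never-repeated keys.
/// Returns the WAL receiver's hit rate.
fn wal_receiver_hit_rate(cap: usize, hot: u64, rounds: u64, flush_per_round: u64, flush_uses_cache: bool) -> f64 {
    let mut cache = Lru::new(cap);
    let (mut hits, mut total) = (0u64, 0u64);
    let mut next_flush_key = 1_000_000u64;
    for _ in 0..rounds {
        for k in 0..hot {
            total += 1;
            if cache.access(k) {
                hits += 1;
            }
        }
        if flush_uses_cache {
            for _ in 0..flush_per_round {
                cache.access(next_flush_key);
                next_flush_key += 1;
            }
        }
    }
    hits as f64 / total as f64
}

fn main() {
    let bypass = wal_receiver_hit_rate(16, 8, 10, 32, false);
    let shared = wal_receiver_hit_rate(16, 8, 10, 32, true);
    println!("flush bypasses cache: {bypass:.2}, flush shares cache: {shared:.2}");
}
```

With a 16-slot cache, an 8-block hot set, and 32 flush blocks per round, the WAL receiver's hit rate is 0.90 when flush reads bypass the cache and 0.00 when they share it: each burst of single-use flush blocks evicts the entire hot set before it is read again.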


problame commented 4 months ago

If it's just the LayerFlushTask, perhaps this is the time to implement

problame commented 4 months ago

This week:

problame commented 3 months ago

Status update: