neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

pageserver: spawning walredo process is slow #6565

Closed by jcsp 6 months ago

jcsp commented 8 months ago

Problem

On some pageservers, spawning the walredo process takes more than 1 s.
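To check whether a given host shows this latency, a minimal sketch can time the spawn call directly. It assumes the pageserver starts walredo through Rust's `std::process::Command` (the issue does not say so), the helper name `time_spawn` is hypothetical, and it is meant to be pointed at a trivial program rather than the real walredo binary.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Hypothetical helper: measures how long `Command::spawn` takes to return
/// for `program`. This is the cost of creating the child, not of running it.
/// The child is waited on afterwards so it does not linger as a zombie.
pub fn time_spawn(program: &str) -> io::Result<Duration> {
    let mut cmd = Command::new(program);
    cmd.stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());

    let start = Instant::now();
    let mut child = cmd.spawn()?;
    let elapsed = start.elapsed();

    child.wait()?;
    Ok(elapsed)
}
```

Because fork() has to duplicate the parent's page tables, the same call can take noticeably longer from inside a large, busy process than from a small standalone binary. Comparing the two is a quick way to tell whether the parent's size is a factor.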

Investigation Results

DoD

Plan

Explore whether we can use posix_spawn; if so, ship it to staging and observe whether the improvement is sufficient. Because posix_spawn runs none of our code between fork and exec, the close_fds work would move into walredo startup, at a point where we still trust the process.
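A minimal sketch of the parent side of this plan, assuming the pageserver uses Rust's `std::process::Command` to start walredo. On Linux, the standard library only takes its `posix_spawn` path when no `pre_exec` closure is registered (among a few other conditions), so the spawn call is kept free of hooks. The function name `spawn_without_hooks` is hypothetical, and the child-side close_fds step is not shown.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Hypothetical helper: spawns a child with piped stdin/stdout/stderr and no
/// `pre_exec` hook. Registering a `pre_exec` closure would make std fall back
/// to fork() + exec(), so any cleanup such as closing inherited file
/// descriptors is left to the child's own startup code.
pub fn spawn_without_hooks(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        // Deliberately no `.pre_exec(...)` here: that would force the fork path.
        .spawn()
}
```

The pipes work the same way as before; only the work that used to run between fork and exec has to find a new home inside the child.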

If posix_spawn can't be used, implement a sidecar "spawner" process that the pageserver asks to spawn walredo processes on its behalf.
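A sketch of the core loop such a sidecar might run. Everything here is hypothetical: the one-request-per-line protocol, the `ok <pid>` / `err <message>` replies, and the function name. A real spawner would also have to hand each walredo process's stdin/stdout back to the pageserver, for example by passing file descriptors over a Unix socket with `SCM_RIGHTS`; this sketch omits that and uses null stdio instead.

```rust
use std::io::{self, BufRead, Write};
use std::process::{Child, Command, Stdio};

/// Hypothetical spawner sidecar loop: one request per line, formatted as
/// `<program> [args...]`, answered with `ok <pid>` or `err <message>`.
/// When the request stream closes, it waits for every child it started.
pub fn serve_spawn_requests<R: BufRead, W: Write>(requests: R, mut replies: W) -> io::Result<()> {
    let mut children: Vec<Child> = Vec::new();

    for line in requests.lines() {
        let line = line?;
        let mut parts = line.split_whitespace();

        let reply = match parts.next() {
            None => "err empty request".to_string(),
            Some(program) => {
                // Runs in the small sidecar, not in the large pageserver.
                let spawned = Command::new(program)
                    .args(parts)
                    .stdin(Stdio::null())
                    .stdout(Stdio::null())
                    .stderr(Stdio::null())
                    .spawn();
                match spawned {
                    Ok(child) => {
                        let reply = format!("ok {}", child.id());
                        children.push(child);
                        reply
                    }
                    Err(e) => format!("err {e}"),
                }
            }
        };

        writeln!(replies, "{reply}")?;
        replies.flush()?;
    }

    // Requests stopped: reap the children so none remain as zombies.
    for mut child in children {
        child.wait()?;
    }
    Ok(())
}
```

Presumably the benefit comes from the spawner staying small: process creation cost then does not grow with the pageserver's memory footprint.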

NB: we decided against a pool of pre-spawned walredo processes: it would only hide the spawn latency, while the amount of CPU wasted on the inefficient fork() call would remain significant.

Background Reading

Work

### Solve The Issue
- [ ] https://github.com/neondatabase/neon/pull/6573
- [ ] https://github.com/neondatabase/neon/pull/6574
- [ ] https://github.com/neondatabase/neon/issues/6630
- [x] measure impact in staging & prod (merge the preliminary work above first, to get better observability)
- [x] it's good, we wrote a blog post about it
### Follow-Ups
- [ ] https://github.com/neondatabase/neon/issues/6580

Spin-Offs (no need to complete before closing)

jcsp commented 8 months ago