neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

`test_bulk_insert` / walingest generally is slower with tokio-epoll-uring #7124

Closed: problame closed this issue 5 months ago

problame commented 7 months ago

High-Level

Problem

The `test_bulk_insert` benchmark shows about 2x lower throughput with tokio-epoll-uring than with std-fs. That's why we temporarily disabled it in #7238.

The reason for this regression is that the benchmark runs on a system without memory pressure, so std-fs writes don't block on disk IO; they only copy the data into the kernel page cache. tokio-epoll-uring cannot beat that at this time, and possibly never will. (However, under memory pressure, std-fs would stall the executor thread on kernel page cache writeback disk IO. That's why we want to use tokio-epoll-uring. And we likely want to use O_DIRECT in the future, at which point std-fs becomes an absolute show-stopper.)

Further, bulk walingest is a pathological case because the InMemoryLayer into which ingest happens uses EphemeralFile. EphemeralFile flushes its mutable_tail every 8k of WAL. And it does not do double-buffering / pipelining, i.e., while flushing the buffer, no new WAL can be ingested.
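To make the stall concrete, here is a minimal sketch of a single-buffered write path. `SingleBufferedWriter`, `TAIL_SIZE`, and the `flushes` counter are hypothetical names for illustration and not the actual `EphemeralFile` code; the sketch also uses blocking `std::io::Write` instead of the pageserver's async IO.

```rust
use std::io::{self, Write};

/// 8 KiB tail buffer, matching the flush granularity described above.
const TAIL_SIZE: usize = 8192;

/// Hypothetical stand-in for a single-buffered writer (not the real EphemeralFile).
struct SingleBufferedWriter<W: Write> {
    inner: W,
    tail: Vec<u8>,
    /// Number of synchronous flushes performed so far.
    flushes: usize,
}

impl<W: Write> SingleBufferedWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, tail: Vec::with_capacity(TAIL_SIZE), flushes: 0 }
    }

    /// Copies `data` into the tail buffer. Every time the tail fills up,
    /// it is written out before any further data is accepted.
    fn write_blob(&mut self, mut data: &[u8]) -> io::Result<()> {
        while !data.is_empty() {
            let room = TAIL_SIZE - self.tail.len();
            let n = room.min(data.len());
            self.tail.extend_from_slice(&data[..n]);
            data = &data[n..];
            if self.tail.len() == TAIL_SIZE {
                // The caller (ingest) is blocked for the full duration of this write.
                self.inner.write_all(&self.tail)?;
                self.tail.clear();
                self.flushes += 1;
            }
        }
        Ok(())
    }
}
```

In this sketch, ingesting 64 KiB triggers eight flushes, and ingest makes no progress during any of them.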

More elaborate analysis:

Solution

Short-term fix: increase buffer sizes of write path.
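As a rough illustration of why larger buffers help, the number of blocking flushes scales inversely with the buffer size. The helper `flush_count` below is hypothetical, and the 1 GiB volume and 64 KiB buffer size are illustrative numbers only, not the values chosen in the linked PRs.

```rust
/// Number of flushes needed to write `total_bytes` through a buffer of
/// `buf_size` bytes, counting a final partial flush (hypothetical helper).
fn flush_count(total_bytes: u64, buf_size: u64) -> u64 {
    assert!(buf_size > 0, "buffer size must be non-zero");
    (total_bytes + buf_size - 1) / buf_size
}

/// Flush counts for 1 GiB of data with an 8 KiB vs. a 64 KiB buffer.
fn compare_buffer_sizes() -> (u64, u64) {
    let total = 1u64 << 30; // 1 GiB, illustrative
    (flush_count(total, 8 * 1024), flush_count(total, 64 * 1024))
}
```

Under these assumptions, the flush count drops from 131072 to 16384, so any fixed per-flush overhead is paid eight times less often.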

Long-term fix: Introduce double-buffering on the write paths (delta, EphemeralFile, Image): open up a new BytesMut and flush the old one in the background.
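The double-buffering idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the planned implementation: it uses a plain `Vec<u8>` and an OS thread with a rendezvous channel in place of `BytesMut` and async background tasks, and `DoubleBufferedWriter` and its methods are hypothetical names.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical double-buffered writer: one buffer is filled by the caller
/// while the previously filled one is written out by a background thread.
struct DoubleBufferedWriter<W: Write + Send + 'static> {
    active: Vec<u8>,
    capacity: usize,
    tx: SyncSender<Vec<u8>>,
    flusher: JoinHandle<io::Result<W>>,
}

impl<W: Write + Send + 'static> DoubleBufferedWriter<W> {
    fn new(mut sink: W, capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        // Rendezvous channel: `send` blocks until the flusher has finished the
        // previous buffer, so at most one buffer is in flight at a time.
        let (tx, rx) = sync_channel::<Vec<u8>>(0);
        let flusher = thread::spawn(move || -> io::Result<W> {
            for buf in rx {
                sink.write_all(&buf)?;
            }
            Ok(sink)
        });
        Self { active: Vec::with_capacity(capacity), capacity, tx, flusher }
    }

    /// Appends `data`; full buffers are handed to the background flusher.
    fn write_blob(&mut self, mut data: &[u8]) -> io::Result<()> {
        while !data.is_empty() {
            let room = self.capacity - self.active.len();
            let n = room.min(data.len());
            self.active.extend_from_slice(&data[..n]);
            data = &data[n..];
            if self.active.len() == self.capacity {
                self.submit_active()?;
            }
        }
        Ok(())
    }

    /// Opens a fresh buffer and sends the old one to the flusher.
    fn submit_active(&mut self) -> io::Result<()> {
        let full = std::mem::replace(&mut self.active, Vec::with_capacity(self.capacity));
        self.tx
            .send(full)
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "flusher thread exited"))
    }

    /// Flushes the remaining data, stops the flusher, and returns the sink.
    fn finish(mut self) -> io::Result<W> {
        if !self.active.is_empty() {
            self.submit_active()?;
        }
        let Self { tx, flusher, .. } = self;
        drop(tx); // closing the channel ends the flusher's loop
        flusher.join().expect("flusher thread panicked")
    }
}
```

The caller only blocks when it fills a second buffer before the first flush has completed, which is the pipelining effect the single-buffered path lacks. A write error in the flusher surfaces from `finish` in this sketch; a real implementation would need to propagate it to the writer earlier.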

References

### Preliminaries
- [ ] https://github.com/neondatabase/neon/pull/7113
- [ ] https://github.com/neondatabase/neon/pull/7238

### Short-Term: Increase Buffer Sizes
- [ ] https://github.com/neondatabase/neon/pull/7273
- [ ] https://github.com/neondatabase/neon/pull/7482
- [ ] https://github.com/neondatabase/neon/pull/7483
- [ ] https://github.com/neondatabase/neon/pull/7484
- [ ] https://github.com/neondatabase/neon/pull/7485

problame commented 7 months ago

Last week:

This week:

problame commented 6 months ago

Status update:

This week:

problame commented 6 months ago

Update:

Posted my WIP branch (minus the part that does page-cache pre-warming) with benchmark results as WIP PR https://github.com/neondatabase/neon/pull/7273

Need feedback on whether we like the approach I took, or whether I should be less ambitious and simply extend the `struct Writer` in `EphemeralFile::write_blob` further.

jcsp commented 6 months ago

This week: merge buffer size increase, ahead of rollout next week.

jcsp commented 6 months ago

Blocked by incidents last week, same plan this week