problame closed this issue 5 months ago.
Last week:
problame/investigate-slow-test_bulk_insert
This week:

Update:
Posted my WIP branch (minus the part that does page-cache pre-warming) with benchmark results as WIP PR https://github.com/neondatabase/neon/pull/7273
Need feedback on whether we like the approach I took, or whether I should be less ambitious and simply extend EphemeralFile::write_blob's struct Writer further.
This week: merge buffer size increase, ahead of rollout next week.
Blocked by incidents last week; same plan this week.
High-Level
Problem
The test_bulk_ingest benchmark shows about 2x lower throughput with tokio-epoll-uring compared to std-fs. That's why we temporarily disabled it in #7238.

The reason for this regression is that the benchmark runs on a system without memory pressure, and thus std-fs writes don't block on disk IO but only copy the data into the kernel page cache. tokio-epoll-uring cannot beat that at this time, and possibly never will. (However, under memory pressure, std-fs would stall the executor thread on kernel page cache writeback disk IO. That's why we want to use tokio-epoll-uring. And we likely want to use O_DIRECT in the future, at which point std-fs becomes an absolute show-stopper.)

Further, bulk walingest is a pathological case because the InMemoryLayer into which ingest happens uses EphemeralFile. EphemeralFile flushes its mutable_tail every 8k of WAL, and it does not do double-buffering / pipelining, i.e., while flushing the buffer, no new WAL can be ingested.

More elaborate analysis:
Solution
Short-term fix: increase buffer sizes on the write path.
Long-term fix: introduce double-buffering on the write paths (delta, EphemeralFile, Image): open up a new BytesMut and flush the old one in the background.
References