neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

page_service: rewrite batching to work without a timeout, pipeline in protocol handler instead #9851

Open problame opened 6 hours ago

problame commented 6 hours ago

Problem

The timeout-based batching adds latency to unbatchable workloads.

We could choose a short batching timeout (e.g. 10us), but that requires high-resolution timers, which tokio doesn't provide. I thoroughly explored options to use OS timers instead (see this abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overhead.
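For contrast, here is a minimal sketch of the kind of timeout-based batching this PR moves away from (assumed shapes, not the actual pageserver code). A lone, unbatchable request pays the full window as added latency, and with a ~10us window tokio's timer is too coarse to wake up on time:

use std::time::Duration;
use tokio::sync::mpsc;

async fn batch_with_timeout<T>(
    rx: &mut mpsc::Receiver<T>,
    max_batch: usize,
    window: Duration,
) -> Vec<T> {
    let mut batch = Vec::new();
    // The first request opens the batching window.
    match rx.recv().await {
        Some(first) => batch.push(first),
        None => return batch,
    }
    let deadline = tokio::time::Instant::now() + window;
    while batch.len() < max_batch {
        match tokio::time::timeout_at(deadline, rx.recv()).await {
            Ok(Some(msg)) => batch.push(msg),
            // Window elapsed or channel closed: ship what we have.
            // An unbatchable workload always hits this arm after
            // waiting out the full window.
            _ => break,
        }
    }
    batch
}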

Solution

The insight is that, in the steady state of a batchable workload, the time we spend in get_vectored will be hundreds of microseconds anyway.

If we prepare the next batch concurrently with the current get_vectored call, a sizeable batch will be ready by the time the current batch completes, and no explicit timeout is needed.

This can be reasonably described as pipelining of the protocol handler.
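Concretely: if get_vectored takes, say, a few hundred microseconds per batch, every request that arrives during that window joins the next batch at no added latency, while a lone request on an idle connection executes immediately because there is no batch in flight to wait behind.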

Implementation

We model the sub-protocol handler for pagestream requests (handle_pagerequests) as three futures that form a pipeline:

  1. Reading: read messages from pgb
  2. Batching: fill the current batch
  3. Execution: take the current batch, execute it using get_vectored, and send the response.

The Reading and Batching stages are connected through an mpsc channel.

The Batching and Execution stages coordinate through a quirky construct (see the sketch after this list):

  1. An Arc<std::sync::Mutex<Option<Box<BatchedFeMessage>>>> that represents the current batch.
  2. A watch around it to notify Execution about new data.
  3. A Notify to tell Batching that the data was consumed.
  4. Inside the watch, a Mutex<BatchedFeMessage>.

This construct allows the Execution stage to steal the current batch from Batching at any time, using lock().unwrap().take().
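To make the moving parts concrete, here is a self-contained sketch of the Batching/Execution coordination. BatchedFeMessage and the stage names come from this PR; everything else (the unit-valued watch, the placeholder struct body, function and variable names) is illustrative, not the actual code. The Reading stage is elided; its mpsc sender feeds reader_rx.

use std::sync::{Arc, Mutex};
use tokio::sync::{mpsc, watch, Notify};

// Placeholder for the real batched-request type from the PR.
struct BatchedFeMessage;

type BatchSlot = Arc<Mutex<Option<Box<BatchedFeMessage>>>>;

async fn pagestream_pipeline(mut reader_rx: mpsc::Receiver<BatchedFeMessage>) {
    let slot: BatchSlot = Arc::new(Mutex::new(None));
    // watch: Batching signals Execution that new data is in the slot.
    let (data_tx, mut data_rx) = watch::channel(());
    // Notify: Execution signals Batching that the batch was consumed.
    let consumed = Arc::new(Notify::new());

    let batching = {
        let slot = Arc::clone(&slot);
        async move {
            while let Some(msg) = reader_rx.recv().await {
                {
                    let mut guard = slot.lock().unwrap();
                    match guard.as_mut() {
                        // Real code would append to the existing batch;
                        // the placeholder type has nothing to append.
                        Some(_batch) => {}
                        None => *guard = Some(Box::new(msg)),
                    }
                }
                // Wake Execution. If the batch were full, Batching would
                // wait on consumed.notified().await before continuing.
                if data_tx.send(()).is_err() {
                    break; // Execution is gone; shut down.
                }
            }
        }
    };

    let execution = {
        let slot = Arc::clone(&slot);
        let consumed = Arc::clone(&consumed);
        async move {
            while data_rx.changed().await.is_ok() {
                // Steal whatever has been batched so far.
                if let Some(_batch) = slot.lock().unwrap().take() {
                    // Real code: get_vectored(batch), then send responses.
                    consumed.notify_one();
                }
            }
        }
    };

    tokio::join!(batching, execution);
}

The unit-valued watch acts purely as an edge trigger, which is presumably what lets Execution steal a partially filled batch as soon as it goes idle, rather than waiting for a size or time threshold.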

Changes

On holding the TimelineHandle in the pending batch

While batching, we hold the TimelineHandle in the pending batch. Therefore, the timeline will not finish shutting down while we're batching.

This is not a problem: once the timeline starts shutting down, the get_vectored call will fail with an error indicating that the timeline is shutting down. This results in the Execution stage returning a QueryError::Shutdown, which causes the pipeline / entire page service connection to shut down. That drops all references to the Arc<Mutex<Option<Box<BatchedFeMessage>>>> object, thereby dropping the contained TimelineHandles and allowing timeline shutdown to complete.
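As a hedged illustration of that error flow (the GetVectoredError variants below are stand-ins, not the pageserver's actual definitions; QueryError::Shutdown is the variant named above):

enum GetVectoredError {
    TimelineShutdown, // stand-in: the timeline is shutting down
    Other(String),
}

enum QueryError {
    Shutdown,
    Other(String),
}

fn map_execution_error(e: GetVectoredError) -> QueryError {
    match e {
        // A shutting-down timeline fails the pipeline, which tears down
        // the page service connection and drops the shared batch slot
        // (and the TimelineHandles held inside it).
        GetVectoredError::TimelineShutdown => QueryError::Shutdown,
        GetVectoredError::Other(msg) => QueryError::Other(msg),
    }
}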

Performance

Results from a local run of the benchmarks are in this empty commit in the PR branch.

Use commands like this to compare a particular metric in different configurations.

git show cbe18393d390961fc3dcf61287fdae2dcddcdf6b  | grep -E '(None|tasks)' | grep '.batching_factor'
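In this example, the grep for None and tasks picks out lines for two of the benchmarked configurations, and .batching_factor narrows the output to the batching-factor metric.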

Key take-aways:

Refs

github-actions[bot] commented 4 hours ago

5535 tests run: 5307 passed, 2 failed, 226 skipped (full report)


Failures on Postgres 17 and Postgres 16 (collapsed details)

Test coverage report is not available

The comment gets automatically updated with the latest test results.
bd31f42f52f66cf174c4ed3031eb21aee54990b7 at 2024-11-22T15:01:37.393Z