# Problem
The timeout-based batching adds latency to unbatchable workloads.
We can choose a short batching timeout (e.g. 10us) but that requires high-resolution timers, which tokio doesn't have. I thoroughly explored options to use OS timers (see this abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overheads.
# Solution

The insight is that, in the steady state of a batchable workload, the time we spend in `get_vectored` will be hundreds of microseconds anyway. If we prepare the next batch concurrently with `get_vectored`, we will have a sizeable batch ready once the `get_vectored` of the current batch is done, and we do not need an explicit timeout.

This can reasonably be described as pipelining of the protocol handler.
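To illustrate the idea (this is not the pageserver code; `Request`, the channel size, and the 500µs sleep are made up), here is a minimal two-stage sketch in which the next batch accumulates while the current one is being executed:

```rust
// Minimal illustration, not the pageserver code: while the executor awaits the
// (simulated) get_vectored of batch N, the reader keeps filling the channel,
// so batch N+1 is already sizeable when the executor comes back -- no timeout.
// Requires: tokio = { version = "1", features = ["full"] }
use tokio::sync::mpsc;

#[derive(Debug)]
struct Request(u64); // stand-in for a getpage request

async fn simulated_get_vectored(batch: Vec<Request>) {
    // Stand-in for the real get_vectored: takes hundreds of microseconds.
    println!("executing batch of {} requests, last = {:?}", batch.len(), batch.last());
    tokio::time::sleep(std::time::Duration::from_micros(500)).await;
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Request>(64);

    // "Reading" stage: produce requests (here: synthetically).
    let reader = tokio::spawn(async move {
        for i in 0..1000u64 {
            tx.send(Request(i)).await.unwrap();
        }
    });

    // Batching + Execution collapsed into one loop for the sketch: drain
    // whatever is currently queued into one batch, then execute it.
    let executor = tokio::spawn(async move {
        while let Some(first) = rx.recv().await {
            let mut batch = vec![first];
            while let Ok(req) = rx.try_recv() {
                batch.push(req);
            }
            simulated_get_vectored(batch).await;
        }
    });

    reader.await.unwrap();
    executor.await.unwrap();
}
```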
# Implementation

We model the sub-protocol handler for pagestream requests (`handle_pagerequests`) as three futures that form a pipeline:

1. Reading: read requests from the connection (`pgb`).
2. Batching: fill the current batch.
3. Execution: take the current batch, execute it using `get_vectored`, and send the response.

The Reading and Batching stages are connected through an `mpsc` channel. The Batching and Execution stages use a quirky construct to coordinate:

- an `Arc<std::sync::Mutex<Option<Box<BatchedFeMessage>>>>` that represents the current batch,
- a `watch` channel around it to notify Execution about new data,
- a `Notify` to notify Batching about data consumed.

This construct allows the Execution stage to, at any time, steal the current batch from Batching using `lock().unwrap().take()`.
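A hedged, self-contained sketch of that construct (simplified types, illustrative `MAX_BATCH_SIZE` and page values; the real pageserver code differs in the details):

```rust
// Hedged sketch of the coordination construct, with simplified types and an
// illustrative MAX_BATCH_SIZE; the real pageserver code differs in the details.
// Requires: tokio = { version = "1", features = ["full"] }
use std::sync::{Arc, Mutex};
use tokio::sync::{watch, Notify};

#[derive(Debug)]
struct BatchedFeMessage {
    pages: Vec<u64>, // simplified: just page numbers
}

const MAX_BATCH_SIZE: usize = 4; // illustrative

#[tokio::main]
async fn main() {
    // The current batch, shared between the Batching and Execution stages.
    let slot: Arc<Mutex<Option<Box<BatchedFeMessage>>>> = Arc::new(Mutex::new(None));
    // watch: Batching -> Execution, "there is new data in the slot".
    let (data_tx, mut data_rx) = watch::channel(());
    // Notify: Execution -> Batching, "the slot was consumed".
    let consumed = Arc::new(Notify::new());

    // Batching stage: append each incoming page request to the pending batch,
    // or start a new batch if the slot is empty; wait if the batch is full.
    let batching = {
        let slot = Arc::clone(&slot);
        let consumed = Arc::clone(&consumed);
        tokio::spawn(async move {
            for page in 0..16u64 {
                loop {
                    let appended = {
                        let mut guard = slot.lock().unwrap();
                        match guard.as_mut() {
                            Some(batch) if batch.pages.len() >= MAX_BATCH_SIZE => false,
                            Some(batch) => {
                                batch.pages.push(page);
                                true
                            }
                            None => {
                                *guard = Some(Box::new(BatchedFeMessage { pages: vec![page] }));
                                true
                            }
                        }
                    };
                    if appended {
                        let _ = data_tx.send(()); // wake Execution
                        break;
                    }
                    consumed.notified().await; // batch full: wait until it is stolen
                }
            }
            // data_tx dropped here; Execution observes the closed watch and exits.
        })
    };

    // Execution stage: steal the current batch whenever there is one.
    let execution = {
        let slot = Arc::clone(&slot);
        let consumed = Arc::clone(&consumed);
        tokio::spawn(async move {
            loop {
                let stolen = slot.lock().unwrap().take();
                if let Some(batch) = stolen {
                    consumed.notify_one(); // let Batching start the next batch
                    // Stand-in for get_vectored + sending the responses.
                    println!("executing batch: {:?}", batch.pages);
                    continue;
                }
                // Nothing pending: wait for Batching to put data into the slot.
                if data_rx.changed().await.is_err() {
                    break; // Batching is gone and the slot is empty
                }
            }
        })
    };

    batching.await.unwrap();
    execution.await.unwrap();
}
```

The point of a mutex-guarded slot rather than a plain channel is that Batching can keep growing the pending batch in place until Execution steals it.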
# Changes
- Rewrite `handle_pagerequests` as the three-stage pipeline described above.
- A helper that builds a `BatchedFeMessage` with just one page request in it.
- A helper that merges an incoming `BatchedFeMessage` into an existing `BatchedFeMessage`; it returns `None` on success and returns back the incoming message in case merging isn't possible.
- Remove the `batch_timeout` parametrization.
- Rename `test_getpage_merge_smoke` to `test_throughput` and `test_timer_precision` to `test_latency` in `test_page_service_batching.py`.
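For illustration, a sketch of the shape these two helpers could take (names, fields, and the size limit are hypothetical, not the actual pageserver signatures):

```rust
// Hedged sketch of the shape of these helpers; names, fields, and the size
// limit are illustrative, not the actual pageserver signatures.
#[derive(Debug)]
struct PageRequest {
    rel: u32,
    blkno: u32,
}

#[derive(Debug)]
struct BatchedFeMessage {
    pages: Vec<PageRequest>,
}

const MAX_BATCH_SIZE: usize = 32; // illustrative limit

/// Wrap a single page request into a batch containing just that request.
fn single_request_batch(req: PageRequest) -> BatchedFeMessage {
    BatchedFeMessage { pages: vec![req] }
}

/// Try to merge `incoming` into `existing`. Returns `None` on success and
/// returns the incoming message back if merging isn't possible (here: the
/// combined batch would exceed the maximum size).
fn try_merge(
    existing: &mut BatchedFeMessage,
    incoming: BatchedFeMessage,
) -> Option<BatchedFeMessage> {
    if existing.pages.len() + incoming.pages.len() > MAX_BATCH_SIZE {
        return Some(incoming);
    }
    existing.pages.extend(incoming.pages);
    None
}

fn main() {
    let mut batch = single_request_batch(PageRequest { rel: 1, blkno: 0 });
    let rejected = try_merge(&mut batch, single_request_batch(PageRequest { rel: 1, blkno: 1 }));
    assert!(rejected.is_none());
    println!("batch now holds {} requests", batch.pages.len());
}
```

Handing the message back on failure (rather than dropping it) lets the Batching stage turn it into the start of the next batch.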
# On holding the `TimelineHandle` in the pending batch

While batching, we hold the `TimelineHandle` in the pending batch. Therefore, the timeline will not finish shutting down while we're batching.

This is not a problem: once the timeline starts shutting down, the `get_vectored` call will fail with an error indicating that the timeline is shutting down. This results in the Execution stage returning a `QueryError::Shutdown`, which causes the pipeline / entire page service connection to shut down. This drops all references to the `Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the contained `TimelineHandle`s.
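A toy sketch of that error flow (the names and the boolean flag are made up; only the role of `QueryError::Shutdown` comes from the description above):

```rust
// Hedged sketch of the shutdown path; everything except the role of
// QueryError::Shutdown is illustrative, not the actual pageserver error types.
#[derive(Debug)]
enum QueryError {
    Shutdown,
    Other(String),
}

// Stand-in for get_vectored failing because the timeline is shutting down.
fn get_vectored(timeline_shutting_down: bool) -> Result<Vec<u8>, QueryError> {
    if timeline_shutting_down {
        Err(QueryError::Shutdown)
    } else {
        Ok(vec![0u8; 8192])
    }
}

// The Execution stage just propagates the error upward with `?` ...
fn execution_stage() -> Result<(), QueryError> {
    let _page = get_vectored(true)?;
    Ok(())
}

fn main() {
    // ... and the connection handler reacts by shutting the pipeline down,
    // which drops the shared batch slot and the TimelineHandles inside it.
    match execution_stage() {
        Err(QueryError::Shutdown) => println!("shutting down page service connection"),
        other => println!("unexpected: {:?}", other),
    }
}
```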
# Performance
Local run of the benchmarks; results are in this empty commit in the PR branch.
Use commands like this to compare a particular metric in different configurations.
Key take-aways:

- `concurrent-futures` delivers a higher `batching_factor` than `tasks`.
- `concurrent-futures` has lower CPU usage.
- The `time` metric is better with `concurrent-futures`, except in the case of the unbatchable workload with max batch size 1; in that case, `tasks` is 6% better but consumes more CPU time for the same work (… (`concurrent-futures`) => 127us (`task`)).
- `concurrent-futures` is consistently slightly better than `tasks`; the difference is negligible.

# Refs