Closed serzhiio closed 7 months ago
Marginal differences in latency over loopback are not meaningful, so this data doesn't say much. What type of performance is your actual application concerned about?
Note io_uring is unlikely to automatically provide significant performance benefits, and might even reduce performance compared to standard quinn if it is not carefully structured, e.g. leveraging the offload mechanisms used by quinn-udp. io_uring is mostly interesting in that it enables new ways to manage scheduling and concurrency, which are complex to take advantage of.
> Marginal differences in latency over loopback are not meaningful, so this data doesn't say much. What type of performance is your actual application concerned about?

Marginal? These latency differences are not marginal for me, and io_uring really does give some performance gain over poll (especially when using provided receive buffers shared with the OS) compared to my previous Mio-based engine; neither is Futures-based. The application is concerned about latency :)

> Note io_uring is unlikely to automatically provide significant performance benefits, and might even reduce performance compared to standard quinn if it is not carefully structured, e.g. leveraging the offload mechanisms used by quinn-udp. io_uring is mostly interesting in that it enables new ways to manage scheduling and concurrency, which are complex to take advantage of.

The main idea was to test the engine's overhead latency, not the network's. So I'm trying to understand: is QUIC supposed to have better latency than TCP+TLS?

> These latency differences are not marginal for me
What is your application such that the difference between e.g. 76us ±31.328 and 72us ±30.648 matters? Note the variance is much larger than the difference between means.
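To put a number on "the variance is much larger than the difference between means", here is a minimal sketch that computes a standardized effect size (Cohen's d). It assumes the ± figures are standard deviations and that both runs have the same number of samples; neither is stated in the thread, and `cohens_d` is a name made up for this illustration.

```rust
/// Cohen's d for two samples of equal size, given their means and
/// standard deviations. Assumption: the "±" values quoted in the thread
/// are standard deviations (the thread does not say so explicitly).
fn cohens_d(mean_a: f64, sd_a: f64, mean_b: f64, sd_b: f64) -> f64 {
    // Pooled standard deviation for equal sample sizes.
    let pooled_sd = ((sd_a * sd_a + sd_b * sd_b) / 2.0).sqrt();
    (mean_a - mean_b) / pooled_sd
}
```

Calling `cohens_d(76.0, 31.328, 72.0, 30.648)` gives roughly 0.13, i.e. the means differ by about an eighth of one standard deviation, which by the usual convention is a very small effect.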
> is QUIC supposed to have better latency than TCP+TLS?
Choice of transport protocol will not meaningfully affect the latency of information traversing the loopback interface.
Setting bigger batches gives a larger difference. The last two results are much more informative and representative: with SqPoll there are almost no context switches, and the latency difference is 20%. The engine is for HFT applications and algorithms.
The last two results show a 7μs difference with about the same variance; those are still very close. Latencies at that scale are going to be very sensitive to the details of your code. Is your io_uring backend employing GSO and GRO? Have you done any profiling?
> The last two results show a 7μs difference with about the same variance; those are still very close. Latencies at that scale are going to be very sensitive to the details of your code. Is your io_uring backend employing GSO and GRO? Have you done any profiling?

Not yet, I just finished the implementation. Profiling is the next step. GSO and GRO are implemented as in `quinn_udp`, but I'm not sure whether they're working, especially since io_uring does not support `libc::SYS_sendmmsg`. UDP in general is new to me, so I may be wrong somewhere.

> GSO and GRO are implemented as in `quinn_udp`
Good, that's probably the single most important factor for kernel-side performance of a QUIC stack.
> not sure whether they're working

Check that you're getting `Transmit` structures with `segment_size` set to `Some` from quinn-proto, and that you're getting multiple segments from the kernel in GRO when many packets are incoming.
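One way to run that check is to count how many outgoing transmits actually carry a segment size. The sketch below uses a hypothetical stand-in struct (`TransmitSizes`) rather than the real quinn-proto type, so it compiles on its own. It assumes a transmit carries a total byte length plus an optional `segment_size`; `GsoStats` and `datagram_count` are likewise names invented for this example.

```rust
/// Hypothetical stand-in for the size-related fields of a transmit.
/// Assumption: `size` is the total number of bytes to send and
/// `segment_size` is `Some(n)` when the buffer holds several datagrams
/// of `n` bytes each (the last one may be shorter).
struct TransmitSizes {
    size: usize,
    segment_size: Option<usize>,
}

/// Number of UDP datagrams a transmit will turn into on the wire.
fn datagram_count(t: &TransmitSizes) -> usize {
    match t.segment_size {
        // Ceiling division: a trailing partial segment is its own datagram.
        Some(seg) if seg > 0 => (t.size + seg - 1) / seg,
        _ => 1,
    }
}

/// Running counters to log periodically while the benchmark runs.
#[derive(Default)]
struct GsoStats {
    batched: usize,   // transmits with segment_size == Some(_)
    unbatched: usize, // transmits with segment_size == None
    datagrams: usize, // total datagrams represented
}

impl GsoStats {
    fn record(&mut self, t: &TransmitSizes) {
        if t.segment_size.is_some() {
            self.batched += 1;
        } else {
            self.unbatched += 1;
        }
        self.datagrams += datagram_count(t);
    }
}
```

If `batched` stays at zero under load, GSO is not being exercised. Note that a 10ms ping/pong rarely has more than one packet queued at a time, so few batched transmits would be unsurprising in that benchmark. The same kind of counter on the receive side (bytes received per completion divided by the reported segment size) would show whether GRO is coalescing.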
> especially since io_uring does not support `libc::SYS_sendmmsg`

This should be fine; `sendmmsg` is just submitting multiple operations in one syscall, which io_uring already enables on its own.

> Not yet, I just finished the implementation.
I'll be interested to hear what you find!
Closing as this seems to be stale, but feel free to open a new issue if there's something further to discuss.
I've just implemented QuinnProto on top of IoUring, ran several benchmarks, and compared the results with IoUring+TCP+TLS(Rustls)+WS. The benchmark is a simple ping/pong (every 10ms) for WS, and two Uni channels for QUIC carrying ping/pong-like data.
In general, after connecting, the client sends a ping with a timestamp, the server responds with the same timestamp, and the client compares the received timestamp with the current time.
The WebSocket implementation shows slightly better results. Is that expected or not? P.S.: IoUring's IoPoll was not tested.
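For readers who want to reproduce the measurement method, here is a minimal sketch of the timestamp-echo loop described above. It uses plain blocking UDP sockets from the standard library over loopback, not io_uring, QUIC, or WebSocket, so it only illustrates how the round-trip time is derived. `measure_ping_pong` and its parameters are names assumed for this example.

```rust
use std::io::{Error, ErrorKind};
use std::net::UdpSocket;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `rounds` ping/pong exchanges over loopback UDP, pausing `interval`
/// between pings, and returns the round-trip time of each exchange.
fn measure_ping_pong(rounds: usize, interval: Duration) -> std::io::Result<Vec<Duration>> {
    let server = UdpSocket::bind("127.0.0.1:0")?;
    server.set_read_timeout(Some(Duration::from_secs(1)))?;
    let client = UdpSocket::bind("127.0.0.1:0")?;
    client.connect(server.local_addr()?)?;
    client.set_read_timeout(Some(Duration::from_secs(1)))?;

    // Server side: echo every datagram back to its sender unchanged.
    let echo = thread::spawn(move || -> std::io::Result<()> {
        let mut buf = [0u8; 16];
        for _ in 0..rounds {
            let (len, peer) = server.recv_from(&mut buf)?;
            server.send_to(&buf[..len], peer)?;
        }
        Ok(())
    });

    // Client side: send a monotonic timestamp, then compare the echoed
    // timestamp with the current time.
    let epoch = Instant::now();
    let mut rtts = Vec::with_capacity(rounds);
    let mut buf = [0u8; 16];
    for _ in 0..rounds {
        let sent_ns = epoch.elapsed().as_nanos() as u64;
        client.send(&sent_ns.to_be_bytes())?;
        let len = client.recv(&mut buf)?;
        if len != 8 {
            return Err(Error::new(ErrorKind::InvalidData, "unexpected pong size"));
        }
        let mut ts = [0u8; 8];
        ts.copy_from_slice(&buf[..8]);
        let echoed_ns = u64::from_be_bytes(ts);
        let now_ns = epoch.elapsed().as_nanos() as u64;
        rtts.push(Duration::from_nanos(now_ns - echoed_ns));
        thread::sleep(interval);
    }
    echo.join().expect("echo thread panicked")?;
    Ok(rtts)
}
```

A call such as `measure_ping_pong(100, Duration::from_millis(10))` mirrors the 10ms cadence from the post. Because client and server share one process and one monotonic clock, the result is a round-trip time that includes thread wake-up latency on both sides, which is part of why numbers at this scale are so sensitive to implementation details.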