Closed: jwhited closed this issue 1 year ago
Confirming that increasing CXPLAT_MAX_IO_BATCH_SIZE
to a value equal to or larger than the number of datagrams seen in a single coalesced group resolves the regression:
diff --git a/src/platform/datapath_epoll.c b/src/platform/datapath_epoll.c
index eacd5504..58ae1cc5 100644
--- a/src/platform/datapath_epoll.c
+++ b/src/platform/datapath_epoll.c
@@ -40,8 +40,8 @@ CXPLAT_STATIC_ASSERT((SIZEOF_STRUCT_MEMBER(QUIC_BUFFER, Buffer) == sizeof(void*)
// This is calculated base on the number of the smallest possible single
// packet/datagram payloads (i.e. IPv6) that can fit in the large buffer.
//
-const uint16_t CXPLAT_MAX_IO_BATCH_SIZE =
- (CXPLAT_LARGE_IO_BUFFER_SIZE / (CXPLAT_MAX_MTU - CXPLAT_MIN_IPV6_HEADER_SIZE - CXPLAT_UDP_HEADER_SIZE));
+const uint16_t CXPLAT_MAX_IO_BATCH_SIZE = 53;
+// (CXPLAT_LARGE_IO_BUFFER_SIZE / (CXPLAT_MAX_MTU - CXPLAT_MIN_IPV6_HEADER_SIZE - CXPLAT_UDP_HEADER_SIZE));
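For reference, the original expression works out to 45 by my reading of the constants. Below is a rough standalone sketch of that arithmetic; the values I plug in for CXPLAT_LARGE_IO_BUFFER_SIZE and CXPLAT_MAX_MTU are assumptions from my reading of the platform headers, not something I have re-verified:

/* Standalone sketch of the existing batch-size arithmetic. The two "ASSUMED"
 * values below are my guesses at the real constants and may not match the
 * tree exactly. */
#include <stdint.h>
#include <stdio.h>

#define ASSUMED_LARGE_IO_BUFFER_SIZE 0xFFFF /* assumed CXPLAT_LARGE_IO_BUFFER_SIZE */
#define ASSUMED_MAX_MTU              1500   /* assumed CXPLAT_MAX_MTU */
#define MIN_IPV6_HEADER_SIZE         40
#define UDP_HEADER_SIZE              8

int main(void) {
    /* The smallest payload is derived from the *largest* MTU, so the batch
     * size lands at 0xFFFF / 1452 = 45. */
    uint16_t batch = ASSUMED_LARGE_IO_BUFFER_SIZE /
        (ASSUMED_MAX_MTU - MIN_IPV6_HEADER_SIZE - UDP_HEADER_SIZE);
    printf("max IO batch size = %u\n", (unsigned)batch);
    return 0;
}

With the constant bumped to 53 and the interface MTU dropped to 1499, the throughput run looks healthy again: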
jwhited@i5-12400-1:~$ sudo ip link set enp1s0f0np0 mtu 1499
jwhited@i5-12400-1:~/msquic/artifacts/bin/linux/x64_Release_openssl3$ ./secnetperf -sstats:1 -stats:1 -test:tput -exec-maxtput -target:10.0.0.20 -timed:1 -download:10000 -encrypt:0 -iosize:131072
Started!
Flow blocked timing:
SCHEDULING: 0 us
PACING: 0 us
AMPLIFICATION_PROT: 0 us
CONGESTION_CONTROL: 0 us
CONN_FLOW_CONTROL: 0 us
STREAM_ID_FLOW_CONTROL: 1930 us
STREAM_FLOW_CONTROL: 1929 us
APP: 1 us
[conn][0x55c9459a6a20] STATS: EcnCapable=0 RTT=2960 us SendTotalPackets=188331 SendSuspectedLostPackets=3 SendSpuriousLostPackets=0 SendCongestionCount=0 SendEcnCongestionCount=0 RecvTotalPackets=19008786 RecvReorderedPackets=0 RecvDroppedPackets=0 RecvDuplicatePackets=0 RecvDecryptionFailures=0
Result: 26509161525 bytes @ 21200913 kbps (10003.026 ms).
I haven't looked closely at all the constants involved in the original arithmetic, but going by their names we probably want to replace CXPLAT_MAX_MTU with CXPLAT_MIN_MTU (or an equivalent). Alternatively, Linux enforces a limit of 64 datagrams per GSO/GRO batch; if we simply want to match that, see https://github.com/torvalds/linux/blob/v6.2/net/ipv4/udp_offload.c#L456 and https://github.com/torvalds/linux/blob/v6.2/include/linux/udp.h#L98
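A minimal sketch of what either option might look like (CXPLAT_MIN_MTU is my guess at the relevant constant name and may not exist with that exact spelling; this is illustrative, not a tested patch):

/* Option 1 (hypothetical): size the batch from the smallest MTU we expect,
 * so that any coalesced group still fits in one receive batch. */
const uint16_t CXPLAT_MAX_IO_BATCH_SIZE =
    (CXPLAT_LARGE_IO_BUFFER_SIZE /
     (CXPLAT_MIN_MTU - CXPLAT_MIN_IPV6_HEADER_SIZE - CXPLAT_UDP_HEADER_SIZE));

/* Option 2 (hypothetical): simply mirror the kernel's per-batch cap of 64
 * segments (UDP_MAX_SEGMENTS in include/linux/udp.h). */
// const uint16_t CXPLAT_MAX_IO_BATCH_SIZE = 64;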
Describe the bug
Testing throughput with secnetperf between two Ubuntu 23.04 systems results in very low throughput when the MTU is set below 1500 (e.g. 1499) on the connected interfaces. Analyzing pcaps, I see the following:
MTU 1500 = 64768 / 1472 gso = max 44 datagrams coalesced
MTU 1499 = 64952 / 1412 gso = max 46 datagrams coalesced
PMTU seems to be doing the right thing; the difference is smaller datagrams, but more of them per coalesced group.
Discussion in Discord led to looking at CXPLAT_MAX_IO_BATCH_SIZE, which currently resolves to a value of 45, suggesting this is a bug around that magic number. Since this lives in CXPLAT, it may affect more than Linux, but I have only run through this reduced-MTU test on Linux thus far.
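Spelling out the mismatch (the byte counts come straight from the pcaps above; this is only illustrative):

#include <stdio.h>

int main(void) {
    /* Coalesced group sizes observed in the pcaps. */
    unsigned mtu1500 = 64768 / 1472; /* 44 datagrams per group at MTU 1500 */
    unsigned mtu1499 = 64952 / 1412; /* 46 datagrams per group at MTU 1499 */
    printf("MTU 1500: %u datagrams per coalesced group\n", mtu1500);
    printf("MTU 1499: %u datagrams per coalesced group\n", mtu1499);
    /* CXPLAT_MAX_IO_BATCH_SIZE currently resolves to 45: enough for the
     * 44-datagram groups at MTU 1500, one short for the 46-datagram groups
     * at MTU 1499. */
    return 0;
}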
Affected OS
Linux
Additional OS information
Ubuntu 23.04
MsQuic version
main
Steps taken to reproduce bug
sudo ip link set <dev> mtu 1499
./secnetperf -sstats:1 -stats:1 -test:tput -exec-maxtput -target:<target> -timed:1 -download:10000 -encrypt:0 -iosize:131072
Expected behavior
Throughput should be roughly comparable @ MTU 1500 vs MTU 1499
Actual outcome
Throughput decreases 100x @ MTU 1499 vs MTU 1500.
Additional details
No response