Consider more aggressive GSO batching

Pacing is applied in Connection::poll_transmit, before assembling a new packet:

https://github.com/quinn-rs/quinn/blob/af30cbdc4bc77e8582dae6887d2c099795135282/quinn-proto/src/connection/mod.rs#L645-L659

Here bytes_to_send accounts for at most two packets: any previous packet in the current GSO batch, and the potential next packet. If the transmit rate is pacing-limited, we therefore wake up every time there's capacity for 1-2 additional packets. That severely limits GSO batch size and significantly increases CPU overhead.
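To make the wake-up behavior concrete, here is a minimal sketch of a token-bucket pacing check. It is not quinn's actual code: the `Pacer` struct, its fields, and the `bytes_to_send` helper are hypothetical names chosen for illustration, and the real pacer takes more inputs (RTT, MTU, congestion window).

```rust
use std::time::Duration;

/// Hypothetical token-bucket pacer (illustrative only, not quinn's type).
struct Pacer {
    tokens: u64,   // bytes that may be sent right now
    capacity: u64, // maximum burst size in bytes
    rate: u64,     // refill rate in bytes per second
}

impl Pacer {
    /// Returns `None` if `bytes_to_send` may go out now, otherwise how long
    /// to sleep until the bucket holds enough tokens.
    fn delay(&self, bytes_to_send: u64) -> Option<Duration> {
        // A request larger than the bucket can never be fully covered,
        // so clamp it to the burst capacity.
        let needed = bytes_to_send.min(self.capacity);
        if self.tokens >= needed {
            return None;
        }
        let deficit = u128::from(needed - self.tokens);
        let rate = u128::from(self.rate.max(1)); // avoid division by zero
        // Round up so we never wake before the tokens are available.
        let nanos = (deficit * 1_000_000_000 + rate - 1) / rate;
        Some(Duration::from_nanos(nanos as u64))
    }
}

/// Bytes the pacing check asks about: at most one previous packet in the
/// current GSO batch plus the potential next packet (assumed equal sizes).
fn bytes_to_send(segment_size: u64, batch_has_packet: bool) -> u64 {
    let previous = if batch_has_packet { segment_size } else { 0 };
    previous + segment_size
}
```

Because the request never exceeds two segments, the computed sleep only ever covers a deficit of one or two packets, so a pacing-limited sender wakes up as soon as that small amount of capacity exists.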

It's not obvious how often we're pacing-limited. If we're sending at the path's full capacity, then the only time we're not congestion-limited is after receiving an ACK that frees up some congestion window space. Because the pacing token bucket refills slightly faster than one cwnd per RTT, we should expect that ACKs, on average, free up less cwnd space than the pacer has made available since the last ACK, except when the pacer's accumulated capacity is capped by the maximum pacing burst size. If ACKs are sufficiently infrequent, then we should expect to observe frequent batches of min(burst size, GSO batch size) packets, followed by a trickle of 1-2 packet batches until the cwnd is refilled or the next ACK is received.
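The expected pattern can be illustrated with a toy model of one ACK-free interval. This is a sketch under strong assumptions, not a measurement: everything is counted in whole segments, the bucket starts full, and the sender is assumed to wake as soon as the pacer has earned a single segment (the issue describes 1-2). The function name and parameters are hypothetical.

```rust
/// Toy model: batch sizes sent between two ACKs, in segments.
/// Assumes the pacer bucket starts full and the sender wakes as soon as
/// one segment's worth of tokens is available.
fn simulate_batches(burst_segments: u64, max_gso_segments: u64, cwnd_segments: u64) -> Vec<u64> {
    let max_gso = max_gso_segments.max(1); // a batch holds at least one segment
    let mut tokens = burst_segments;
    let mut cwnd_left = cwnd_segments;
    let mut batches = Vec::new();
    while cwnd_left > 0 {
        if tokens == 0 {
            // Sleep until the pacer has earned exactly one segment.
            tokens = 1;
        }
        // Send as much as pacing, GSO, and the congestion window allow.
        let batch = tokens.min(max_gso).min(cwnd_left);
        tokens -= batch;
        cwnd_left -= batch;
        batches.push(batch);
    }
    batches
}
```

With a burst of 8 segments, a GSO limit of 10, and 12 segments of free cwnd, the model produces one batch of 8 followed by four single-segment batches, which is the "large batch, then trickle" shape described above.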

On the other hand, if ACKs are delivered frequently, the congestion window might prevent us from forming large GSO batches regardless of pacing. We should explore how much batching we see in practice, and consider delaying transmits until pacing and congestion control permit a larger GSO batch size.
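One possible shape for such a delay policy is sketched below. It is only an illustration of the idea, not a proposed patch: `Decision`, `decide`, and the `min_batch` threshold are hypothetical, and the sketch only waits on the pacer, since waiting cannot enlarge a batch that is limited by the congestion window until another ACK arrives.

```rust
/// Outcome of the hypothetical batching policy.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Send this many segments as one GSO batch now.
    Send(u64),
    /// Hold off and let the pacer accumulate more tokens.
    Wait,
}

/// Decide whether to transmit now or wait for a larger batch.
/// All byte quantities are in bytes; `min_batch` is the batch size
/// (in segments) we would like to reach before sending.
fn decide(
    pacing_tokens: u64,
    cwnd_available: u64,
    burst_capacity: u64,
    segment_size: u64,
    queued_segments: u64,
    max_gso_segments: u64,
    min_batch: u64,
) -> Decision {
    if segment_size == 0 || queued_segments == 0 {
        return Decision::Wait;
    }
    // Segments allowed right now by both pacing and congestion control.
    let permitted = (pacing_tokens.min(cwnd_available) / segment_size)
        .min(max_gso_segments)
        .min(queued_segments);
    // Largest batch that waiting for the pacer alone could ever allow.
    let ceiling = (burst_capacity.min(cwnd_available) / segment_size)
        .min(max_gso_segments)
        .min(queued_segments);
    // Never wait for a batch size that cannot be reached.
    let target = min_batch.min(ceiling);
    if permitted > 0 && permitted >= target {
        Decision::Send(permitted)
    } else {
        Decision::Wait
    }
}
```

Clamping the target to the reachable ceiling keeps the policy from stalling when the congestion window, the burst size, or the send queue is the real limit. The right value for `min_batch` would need to come from the measurements suggested above, since a larger threshold trades latency for fewer wake-ups.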

Consider more aggressive GSO batching #1835