Ralith opened 3 years ago
When implementing https://github.com/quinn-rs/quinn/pull/1127 I noticed that cwnd growth depends heavily on the value of the new MAX_TRANSMIT_DATAGRAMS constant. When it is configured lower, CWND is more likely to grow towards infinity, since a single poll will never yield all available datagrams, which leads to app_limited being set to false and allows CWND to grow. If we increase the value and all datagrams are polled, then app_limited might get set to true, which stops CWND from growing. However, that also means higher latencies.
Some examples:
```
MAX_TRANSMIT_DATAGRAMS = 20:
path: PathStats {
    rtt: 389.489µs,
    cwnd: 536692490,

MAX_TRANSMIT_DATAGRAMS = 30:
path: PathStats {
    rtt: 478.052µs,
    cwnd: 2821703,

MAX_TRANSMIT_DATAGRAMS = 50:
path: PathStats {
    rtt: 646.804µs,
    cwnd: 1481010,

MAX_TRANSMIT_DATAGRAMS = 100:
path: PathStats {
    rtt: 533.493µs,
    cwnd: 1476074,
```
While this benchmark makes 30 look like a better value than 20, I would point out that I'm running this on a desktop with extremely high single-threaded performance; pretty much anything else would have a harder time producing 30 packets per poll and would see higher latencies.
Overall I think the way app_limited is set at the moment is not optimal, and the strategy needs improvement.
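To illustrate, here is a minimal sketch of the pattern as I understand it (not quinn's actual code; `Connection`, `Transmit`, and the drain logic are stand-ins):

```rust
const MAX_TRANSMIT_DATAGRAMS: usize = 20; // the knob discussed above

struct Transmit; // stand-in for an outgoing datagram

struct Connection {
    queued: Vec<Transmit>, // datagrams ready to be sent
    app_limited: bool,
}

impl Connection {
    fn poll_transmit(&mut self) -> Vec<Transmit> {
        let take = self.queued.len().min(MAX_TRANSMIT_DATAGRAMS);
        let out: Vec<Transmit> = self.queued.drain(..take).collect();
        // If we stopped at the cap with data still queued, the sender
        // looks "not app-limited" and cwnd keeps growing, even though
        // the real bottleneck is how fast we can produce packets (CPU).
        self.app_limited = self.queued.is_empty();
        out
    }
}
```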
What if we maintain a separate cpu_limited flag, set it whenever we stop calling poll_transmit early, and unset it whenever poll_transmit is fully drained?
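Very roughly, reusing the stand-in types from the sketch above (again purely hypothetical, not a patch):

```rust
struct Connection {
    queued: Vec<Transmit>,
    app_limited: bool,
    cpu_limited: bool, // hypothetical new flag
}

impl Connection {
    fn poll_transmit(&mut self) -> Vec<Transmit> {
        let take = self.queued.len().min(MAX_TRANSMIT_DATAGRAMS);
        let out: Vec<Transmit> = self.queued.drain(..take).collect();
        if self.queued.is_empty() {
            // Fully drained: the application really is the bottleneck.
            self.cpu_limited = false;
            self.app_limited = true;
        } else {
            // Stopped early at the per-poll cap: we are CPU bound, so
            // this must not look like link headroom. The congestion
            // controller would treat cpu_limited like app_limited when
            // deciding whether to grow the window.
            self.cpu_limited = true;
            self.app_limited = false;
        }
        out
    }
}
```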
Interesting thought. But it kind of relies on the application actually calling poll_transmit again as early as possible, and not delaying it for other reasons - right? The quinn wrapper does, but it would be some kind of implicit contract (similar to the read/EAGAIN/epoll ones).
The other question is whether this would already be improved if #1131 gets addressed by merging tasks. I'm not sure; I haven't thought too deeply about this.
> Interesting thought. But it kind of relies on the application actually calling poll_transmit again as early as possible, and not delaying it for other reasons - right?
So long as the polling pattern is consistent, I think the behavior would be correct with respect to the amount of CPU time being given. If polling is arbitrarily erratic, I don't think we can hope to guarantee good behavior regardless.
> The other question is whether this would already be improved if #1131 gets addressed by merging tasks
I don't think so; the fundamental problem is that the current construction of app_limited assumes we'll always be able to drain the connection's logical queues in a timely fashion, and when we're CPU bound we by definition can't, regardless of how work is distributed. Performance might get better, but the design flaw stands.
If throughput is limited by CPU resources, then every non-app-limited transmit will, once ACKed, cause the congestion window to grow. If CPU resources abruptly increase (e.g. due to reduced contention from other tasks and/or processes) then we risk flooding the link. The congestion controller interface/implementation should be modified such that success with N in-flight bytes is interpreted as license to grow the congestion window by at most a constant multiple of N.
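Something along these lines, perhaps (a sketch only; `GROWTH_FACTOR`, `on_ack`, and its signature are all made up for illustration):

```rust
/// Made-up multiplier purely for illustration; choosing it well is the
/// actual design question.
const GROWTH_FACTOR: u64 = 2;

/// Hypothetical congestion-controller hook: `acked` bytes were just
/// acknowledged, and `in_flight` is the number of bytes that were
/// outstanding when they were sent.
fn on_ack(cwnd: &mut u64, acked: u64, in_flight: u64) {
    // Ordinary growth on successful delivery...
    *cwnd += acked;
    // ...but success with N in-flight bytes only licenses a window of
    // at most GROWTH_FACTOR * N, so a sudden increase in available CPU
    // cannot translate into flooding the link.
    *cwnd = (*cwnd).min(GROWTH_FACTOR * in_flight);
}
```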