Among the ugly issues here is that inner connection QUIC ACKs might be congestion controlled by the outer connection, and/or scheduled behind potentially large amounts of data on other streams.
The way I think about this is that any router on the Internet implements a congestion controller. If we focus on a single direction of traffic, a router has an input interface and an output interface; if more packets arrive on the input interface than the output interface has capacity for, the router has to manage congestion on that output link, and its signaling mechanism is to drop some packets.
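For concreteness, here is a minimal sketch of that behavior as a drop-tail output queue (all names are hypothetical, purely for illustration):

```python
from collections import deque

class OutputLink:
    """Hypothetical drop-tail queue: the router's only 'congestion
    controller' is its local queue occupancy."""
    def __init__(self, max_packets: int):
        self.queue = deque()
        self.max_packets = max_packets

    def enqueue(self, packet) -> bool:
        # Once the queue is full, input rate exceeds output capacity;
        # the congestion signal is simply a dropped packet.
        if len(self.queue) >= self.max_packets:
            return False  # drop: this is the signal endpoints react to
        self.queue.append(packet)
        return True
```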
With that in mind, any TCP or QUIC connection going over the Internet has to deal with nested congestion controllers, and things work pretty decently. Going back to @martinduke's example from https://github.com/ietf-wg-masque/draft-ietf-masque-connect-udp/issues/12#issuecomment-706406176, any router can drop QUIC ACKs, and QUIC can handle that just fine.
My question is: how is a CONNECT-UDP proxy which can drop DATAGRAMs because of its congestion controller any different from a regular router that can drop packets due to its output link being saturated?
To clarify, I'm not suggesting we close this issue with no action - adding some notes explaining caveats sounds reasonable, but I don't see a need to develop an explicit solution to this problem.
I don't think this is a question of correctness (it doesn't create infinite loops or deadlocks), but one of performance.
While in some respects routers have "congestion controllers", they are generally not probing the path bandwidth or measuring latency, just looking at their local queue occupancy. Maybe your answer is to simply drop all datagrams for which there isn't immediately available cwnd; that may work, though we'd have to think about whether that's better than queueing them.
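As a sketch of that drop-rather-than-queue policy, assuming hypothetical accessor names for the outer connection's congestion state (this is not any real QUIC library's API):

```python
from dataclasses import dataclass

@dataclass
class OuterConnection:
    """Stub standing in for the proxy's outer QUIC connection; the
    attribute names are assumptions, not a real library API."""
    cwnd: int = 12_000
    bytes_in_flight: int = 0

    def send_datagram(self, datagram: bytes) -> None:
        self.bytes_in_flight += len(datagram)  # a real stack decrements on ACK

def forward_datagram(conn: OuterConnection, datagram: bytes) -> bool:
    # Forward only if the congestion window has room right now;
    # otherwise drop rather than queue, so the end-to-end connection
    # sees the loss promptly instead of a delay spike.
    if conn.bytes_in_flight + len(datagram) > conn.cwnd:
        return False  # drop: behaves like any other congested router
    conn.send_datagram(datagram)
    return True
```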
In QUIC we specifically exempted pure ACKs from congestion control because we were concerned about deadlocks, as congestion control could block sending the ACKs that would unblock the connection. I don't think that's an issue here. But bad choices in the proxy scheduler could result in large delay spikes on these ACKs.
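For reference, that ACK exemption amounts to something like the following check on the sender (a sketch; `Packet` and its fields are illustrative, not a real implementation):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size: int
    is_ack_only: bool  # contains only ACK (and padding) frames

def can_send(packet: Packet, bytes_in_flight: int, cwnd: int) -> bool:
    # Packets containing only ACK frames bypass the congestion window
    # check (as in RFC 9002), so congestion control can never block the
    # ACKs that would shrink bytes_in_flight and unblock the sender.
    if packet.is_ack_only:
        return True
    return bytes_in_flight + packet.size <= cwnd
```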
I'll spend some time thinking about corner cases, but fundamentally I don't have a problem with your proposal to simply add some caveats.
The problem isn't that the MASQUE proxy might drop packets; that's fine, since it's a congestion signal like any other from a router, as David said (though this is usually called AQM rather than congestion control). The problem is that there are two actual congestion controllers which react to the same input signal (loss) and try to control the same output signal (rate), but operating on different time scales. In TCP, the outer connection would conceal losses but add delay; the inner connection would not get the loss signal but would see changes in delay that can interact badly with its control loop, since that's not the scenario it expects. In QUIC, when DATAGRAMs are used, losses are not concealed, and both controllers react to the same input signal. However, if the outer tunnel QUIC connection between the client and the proxy operates on a much shorter time scale, it reacts faster and should be the limiting factor. Still, I think more work is needed here, plus some text in the draft to acknowledge this.
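To make the time-scale point concrete, here is a toy AIMD model in which both controllers see the same loss but recover at different speeds because of their RTTs (entirely illustrative numbers, not a real QUIC controller):

```python
class ToyAIMD:
    """Toy loss-based controller: additive increase each RTT,
    multiplicative decrease on loss. Illustrative only."""
    def __init__(self, rtt: float, cwnd: float = 100.0):
        self.rtt = rtt    # the RTT sets the control loop's time scale
        self.cwnd = cwnd

    def on_loss(self):
        self.cwnd /= 2    # both loops react to the same drop

    def grow(self, seconds: float):
        self.cwnd += seconds / self.rtt  # +1 packet per RTT

outer = ToyAIMD(rtt=0.01)  # client<->proxy: short RTT, fast loop
inner = ToyAIMD(rtt=0.10)  # end-to-end: long RTT, slow loop

for cc in (outer, inner):
    cc.on_loss()           # a single drop halves both windows
outer.grow(1.0)
inner.grow(1.0)
print(outer.cwnd, inner.cwnd)  # the outer loop recovers ~10x faster
```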
I would say that David's point is valid for 3GPP radio, where an e/gNB buffers data and regulates the transmission rate based on a multitude of factors. In the case of RLC Acknowledged Mode there is also nested loss recovery, though in this case losses are concealed, similarly to what Mirja describes for the TCP case.
I agree that some discussion of this topic is warranted, but I would be wary of making overly prescriptive recommendations such as "drop packets when CWND-limited".
This should be fixed in today's update to the drafts.
This is a placeholder to discuss the effects/mitigations for nested congestion control, as required in the charter.