Bufferbloat is now just as much a concern as congestion collapse

huitema commented 1 year ago

I do remember observing congestion collapse with pre-VJ (pre-Van Jacobson) TCP interacting very poorly with low-bandwidth X.25 links. So, yes, don't do that. But bufferbloat, or more specifically long queuing delays, is probably now just as much a concern as congestion collapse. Today, nodes that generate traffic without paying attention to the impact on delay are arguably causing "bufferbloat collapse", i.e., rendering a shared path unusable for delay-sensitive traffic. Algorithms like Reno or Cubic are major offenders: they only slow down when packets start getting lost, and thus effectively fill up all available buffers.
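To make the "fill up all available buffers" point concrete, here is a toy per-RTT model of a Reno-like AIMD sender at a single drop-tail bottleneck. Everything in it is an illustrative assumption: a 100-packet BDP, a 200-packet buffer, no slow start, one window halving per overflow, and the hypothetical `simulate_reno` helper. It is not a model of any real TCP stack.

```python
def simulate_reno(bdp: int = 100, buffer: int = 200,
                  rounds: int = 2000, cwnd: float = 10.0) -> list[float]:
    """Toy per-RTT model of a Reno-like AIMD flow at one drop-tail bottleneck.

    Returns the standing queue (in packets) at the end of each round.
    Queuing delay in units of the base RTT is queue / bdp.
    """
    queues = []
    for _ in range(rounds):
        # Whatever the window holds beyond the path BDP has to sit in the buffer.
        queue = max(0.0, cwnd - bdp)
        if queue > buffer:
            # Buffer overflow: packet loss is the only signal a loss-based
            # sender reacts to, so it halves its window (Reno's beta = 0.5).
            cwnd /= 2
            queue = max(0.0, cwnd - bdp)
        else:
            # No loss: additive increase of one packet per round trip.
            cwnd += 1
        queues.append(queue)
    return queues
```

With a buffer larger than the BDP, halving the window never drains the queue: in this model the standing queue oscillates between roughly 50 and 200 packets, i.e. between about 0.5 and 2 base RTTs of added delay, for as long as the flow runs.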

VMatrix1900 commented 1 year ago

L4S isolates the buffer (queue) used by low-latency flows.
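A minimal sketch of what that isolation means at enqueue time, loosely following the DualQ Coupled AQM idea (RFC 9332): packets carrying the L4S identifier, ECT(1), and by default CE-marked packets, go to a separate shallow queue, so they do not wait behind the standing queue built by classic flows. The `DualQueue` class is hypothetical, and the coupled AQM and the scheduler between the two queues are omitted.

```python
from collections import deque

# ECN field values (the two low-order bits of the IP TOS / Traffic Class byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def is_l4s(ecn_bits: int) -> bool:
    """ECT(1) identifies L4S traffic; CE-marked packets are classified L4S too."""
    return ecn_bits in (ECT1, CE)


class DualQueue:
    """Hypothetical two-queue classifier; AQM coupling and scheduling omitted."""

    def __init__(self) -> None:
        self.l_queue: deque = deque()  # shallow queue for L4S traffic
        self.c_queue: deque = deque()  # classic queue (Reno/Cubic-style traffic)

    def enqueue(self, packet_id: str, ecn_bits: int) -> None:
        # Classification uses only the ECN field, so no per-flow state is needed.
        target = self.l_queue if is_l4s(ecn_bits) else self.c_queue
        target.append(packet_id)
```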

gorryfair commented 1 year ago

I am not at all convinced that "Bufferbloat is now just as much a concern as congestion collapse"; that seems like a very bold statement. Starving flows of capacity, and even starving control flows, is still something that we really ought to be concerned about. That said, this does not apply everywhere, and in most places our existing strategies have been working well.

Latency is also important!

martinduke commented 1 year ago

Will add some text about the importance of latency/bufferbloat.

A difficult question: how good is good enough? Cubic is pretty bad; is a proposal that is no worse than Cubic adequate, given the status quo in the Internet?

huitema commented 1 year ago

Cubic and Reno are both pretty bad, and I would very much like to see them replaced by something better. I would like to say something like "Reno and Cubic cause bufferbloat, new algorithms such as BBR do not, and the new algorithms should be preferred." I would also have liked to recommend L4S/Prague, but if the network does not provide ECN feedback, L4S falls back to Reno, and that's not good. And to answer the direct question from @martinduke: "no worse than Cubic" is not an acceptable bar.

I can see different congestion control algorithms reacting to three kinds of feedback: packet losses, delay increases, and ECN marks. IMHO, modern algorithms should react to all three. The tricky part is that all three signals are subject to some amount of noise: we observe non-congestion losses and delay jitter on some wireless links, and with L4S we will observe some ECN marks before congestion is significant. That's why we probably need to speak about frequency, and state that congestion control algorithms should back off if they observe a significant increase in the frequency of packet losses, an increase in the RTT, or an increase in the frequency of ECN marks.
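One way to read "react to frequency, not to individual events" is to compare the rates seen over a recent window with a longer baseline, and back off only when a rate clearly rises. The sketch below is hypothetical throughout: the `RoundStats` type, the `should_back_off` helper, the factors (2x for losses and ECN marks, 1.25x for the minimum RTT), and the rate floor are illustrative choices, not values from any specification.

```python
from dataclasses import dataclass


@dataclass
class RoundStats:
    """Hypothetical per-window counters kept by a sender."""
    packets: int
    losses: int
    ecn_marks: int
    min_rtt_ms: float  # minimum RTT seen in the window, less sensitive to jitter


def rate(events: int, packets: int) -> float:
    return events / packets if packets else 0.0


def should_back_off(baseline: RoundStats, recent: RoundStats,
                    loss_factor: float = 2.0, ecn_factor: float = 2.0,
                    rtt_factor: float = 1.25, floor: float = 0.001) -> bool:
    """Back off if any of the three signals rose significantly over baseline."""
    # The floor keeps a single stray loss or early mark from looking like a
    # large relative increase when the baseline rate is near zero.
    base_loss = max(rate(baseline.losses, baseline.packets), floor)
    base_ecn = max(rate(baseline.ecn_marks, baseline.packets), floor)
    return (rate(recent.losses, recent.packets) > loss_factor * base_loss
            or rate(recent.ecn_marks, recent.packets) > ecn_factor * base_ecn
            or recent.min_rtt_ms > rtt_factor * baseline.min_rtt_ms)
```

Using the window's minimum RTT rather than individual samples, and a floor on the baseline rates, are two simple ways to keep isolated wireless losses, jitter, or early L4S marks from triggering a back-off on their own.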

The bufferbloat issue is particularly evident during slow start or startup. The good news there is that HyStart++ does a pretty good job of backing off before causing actual congestion. BBRv2 is almost there too.
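For reference, the delay-increase test at the heart of HyStart++ can be sketched as below. The constants are those given in RFC 9406, to the best of my reading; the function name is hypothetical, and the rest of the algorithm (round tracking, Conservative Slow Start, and the loss/ECN exits) is omitted. A positive result moves the sender into Conservative Slow Start rather than straight into congestion avoidance.

```python
import math

# Constants as recommended in RFC 9406 (HyStart++).
MIN_RTT_THRESH_MS = 4.0
MAX_RTT_THRESH_MS = 16.0
MIN_RTT_DIVISOR = 8
N_RTT_SAMPLE = 8


def delay_increase_detected(last_round_min_rtt_ms: float,
                            current_round_min_rtt_ms: float,
                            rtt_sample_count: int) -> bool:
    """Return True if slow start should end (HyStart++ would then enter CSS)."""
    # Require enough RTT samples this round and a valid minimum for both rounds.
    if rtt_sample_count < N_RTT_SAMPLE:
        return False
    if math.isinf(last_round_min_rtt_ms) or math.isinf(current_round_min_rtt_ms):
        return False
    # Threshold is 1/8 of the previous round's min RTT, clamped to [4 ms, 16 ms].
    rtt_thresh = max(MIN_RTT_THRESH_MS,
                     min(last_round_min_rtt_ms / MIN_RTT_DIVISOR, MAX_RTT_THRESH_MS))
    return current_round_min_rtt_ms >= last_round_min_rtt_ms + rtt_thresh
```

Because the test compares per-round minimum RTTs, the sender exits once the queue it is building starts to add measurable delay, well before the buffer overflows and packets are lost.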

ietf-wg-ccwg / rfc5033bis, issue #8