ietf-wg-ccwg / rfc5033bis


there should be a "Protection Against High Packet Loss" subsection of "Single Algorithm Behavior" #101

Closed: nealcardwell closed this issue 5 months ago

nealcardwell commented 5 months ago

The "Single Algorithm Behavior" section has a "Protection Against Bufferbloat" section. That's great. That's basically saying CCs should try to keep congestive delay at reasonable levels. But I would imagine there's a consensus that we would also say that CCs should try to keep congestive packet loss rates at reasonable levels. So IMHO there should be a "Protection Against High Packet Loss" subsection of "Single Algorithm Behavior".

This is important for "Single Algorithm Behavior" because even if a congestion control algorithm can tolerate high loss, many types of short request/response traffic do not want to pay the loss recovery performance penalty induced by high loss rates.

I can propose a pull request if there are no other volunteers.

cc: @martinduke @gorryfair

huitema commented 5 months ago

Yes, fair. The classic offender there is the "slow start" process of Reno (or CUBIC). The last time I did measurements, it was fairly clear that this process was responsible for the bulk of the packet losses experienced by short connections, and that it also caused parallel connections to experience significant packet losses. In general, all CC algorithms that rely on packet losses to detect congestion have this issue. But we have to be a bit careful not to come across as "dissing the algorithm that everybody uses".
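For concreteness, here is a minimal sketch of the overshoot described above. The numbers and the model are my own hypothetical simplification, not measurements from this thread: classic slow start doubles cwnd every RTT, so by the time the first loss shows up the window can be well past what the pipe plus the bottleneck buffer can hold.

```python
# A simplified, hypothetical model (not from this thread): classic slow start
# doubles cwnd every RTT, so the window can blow well past what the pipe plus
# the bottleneck buffer can hold before the first loss is even detected.
def slow_start_overshoot(bdp_pkts, buffer_pkts, initial_cwnd=10):
    """Return (cwnd in the first lossy round, packets beyond path capacity)."""
    capacity = bdp_pkts + buffer_pkts   # packets the path can hold without drops
    cwnd = initial_cwnd
    while cwnd <= capacity:             # no loss yet: keep doubling each RTT
        cwnd *= 2
    return cwnd, cwnd - capacity

# e.g. a 100-packet BDP with a 25-packet buffer: slow start reaches cwnd=160,
# 35 packets beyond path capacity, all of them candidates for loss in one RTT.
print(slow_start_overshoot(bdp_pkts=100, buffer_pkts=25))
```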

nealcardwell commented 5 months ago

FWIW I was not thinking of losses from the initial slow-start overshoot, but mainly of losses in steady state. AFAIK, with sufficiently shallow buffers, all widely deployed current algorithms (Reno, CUBIC, BBR) will overshoot in slow start and cause high loss. IMHO that's currently just the BCP trade-off: favoring rapid utilization of available bandwidth when the path properties are unknown.

I just wanted to have some text about wanting CC algorithms to avoid generating high packet loss in steady state. So I meant to implicitly categorize Reno/CUBIC behavior in most scenarios as already meeting this requirement (except perhaps in scenarios where the fair-share cwnd is low; e.g., when the fair-share cwnd is 2-3 packets, Reno/CUBIC do generate very high packet loss rates, around 20%: send 2 packets, 2 packets ACKed; send 3 packets, 2 packets ACKed and 1 packet lost; repeat; the loss rate is 1 packet lost / 5 packets sent = 20%).
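Spelling out that arithmetic as a tiny sketch (same numbers as in the parenthetical above, nothing new):

```python
# The steady-state cycle from the comment above, at a fair-share cwnd of 2-3
# packets: send 2, all ACKed; send 3, 2 ACKed and 1 lost; repeat.
sent_per_cycle = 2 + 3
lost_per_cycle = 1
loss_rate = lost_per_cycle / sent_per_cycle
print(f"steady-state loss rate = {loss_rate:.0%}")   # -> 20%
```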

cc: @mattmathis

huitema commented 5 months ago

Do you have an example of deployed systems that generate high packet losses, e.g., higher than Reno or CUBIC? I am only aware of toy systems, like sending a file by blasting a stream of packets at maximum speed, then repeatedly resending all the unacked packets at maximum speed. I think some people call that the "Hilfinger protocol", but I could not find a reference, and I believe nobody actually ships that, let alone tries to get it through the IETF.

nealcardwell commented 5 months ago

Do you have an example of deployed systems that generate high packet losses, e.g., higher than Reno or CUBIC?

Sure. For example, BBRv1 will generate high packet losses when multiple BBRv1 flows share a bottleneck whose buffer is less than roughly 1.5*BDP, as has been discussed at the IETF and in papers (e.g., "Experimental evaluation of BBR congestion control": https://ieeexplore.ieee.org/document/8117540).
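As a very rough illustration of the mechanism (my own simplification, not text from the paper or this thread): with several BBRv1 flows, the per-flow bandwidth estimates tend to be optimistic, so the aggregate in-flight data is limited mainly by BBRv1's cap of roughly 2x the estimated BDP, and the excess beyond the real BDP has to fit in the bottleneck buffer or be dropped.

```python
# A deliberately simplified, hypothetical model (not from the cited paper):
# assume the aggregate of several BBRv1 flows sits near the inflight cap of
# about 2 * BDP; whatever exceeds pipe + buffer capacity is dropped.
def shallow_buffer_excess(bdp_pkts, buffer_pkts, inflight_cap_gain=2.0):
    """Packets that cannot fit in the pipe plus buffer under this model."""
    aggregate_inflight = inflight_cap_gain * bdp_pkts
    capacity = bdp_pkts + buffer_pkts
    return max(0.0, aggregate_inflight - capacity)

# e.g. BDP = 100 packets: a 0.5*BDP buffer leaves ~50 packets with nowhere to
# go, while a deep (>= 1*BDP) buffer absorbs the excess in this toy model;
# the measurements discussed above put the practical threshold nearer 1.5*BDP.
print(shallow_buffer_excess(100, 50), shallow_buffer_excess(100, 150))
```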

huitema commented 5 months ago

Thanks. That reference will help with writing a short section.

martinduke commented 5 months ago

@nealcardwell @huitema you both hint that you'd like to write a PR. Who's going to do it?

huitema commented 5 months ago

I am willing to write a PR, but could we land #94 first? I am worried about merge conflicts if we don't.

huitema commented 5 months ago

Is #106 sufficient to close this issue?

nealcardwell commented 5 months ago

Yes, IMHO #106 is sufficient to close this issue. Thanks!