jlivingood / IETF-L4S-Deployment

IETF L4S Deployment Design Recommendations

ID: Sebastian Moeller #36

Closed jlivingood closed 1 year ago

jlivingood commented 1 year ago

...it seems to contain imprecise descriptions that I would prefer not to find in any RFC independent of the document track:

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "The Introduction says "Furthermore, unlike with bandwidth priority on a highly/fully utilized link, low latency using these new approaches is not a zero sum game - everyone can potentially have lower latency at no one else's expense." But this bears a bit more discussion to understand more fully."

[SM] This seems to misrepresent the mechanism L4S is built on: the DualQ Coupled AQM, in a sense the reference L4S AQM, is described as a "conditional priority scheduler" and configured with a rate share for the non-LL:LL queues between 1:10 (current Linux implementation, IIRC) and 1:16 (RFC 9332). This priority scheduler is combined with a heuristic that aims at making it rare for the priority scheduling to actually become visible. But "rare" is not never, and the resource distribution problem of selecting a packet for immediate transmission is a zero-sum game, L4S or not.

[SM] Here is a Wikipedia explanation of a zero-sum game:

"Zero-sum game is a mathematical representation in game theory and economic theory of a situation which involves two sides, where the result is an advantage for one side and an equivalent loss for the other.[1] In other words, player one's gain is equivalent to player two's loss, therefore the net improvement in benefit of the game is zero."

[SM] For any given "transmit time" you can pick a packet from either queue (or introduce a stall, but L4S does not do that); this is very much a zero-sum game by the definition above.
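To make the zero-sum point concrete, here is a minimal sketch of a two-queue conditional-priority scheduler (a hypothetical illustration, not the DOCSIS Inter-SF Scheduler or the Linux dualpi2 code): every transmit opportunity dequeues exactly one packet, so a slot given to the LL queue is by definition a slot not given to the non-LL queue.

```python
from collections import deque

# Toy conditional-priority scheduler; illustration only.
ll_queue = deque()       # L4S / low-latency queue
classic_queue = deque()  # non-LL (classic) queue

WEIGHT_LL = 10  # hypothetical 1:10 non-LL:LL back-stop share
credit = 0      # consecutive LL services since the last classic one

def dequeue():
    """Pick ONE packet per transmit opportunity: a zero-sum choice."""
    global credit
    if ll_queue and (not classic_queue or credit < WEIGHT_LL):
        credit += 1
        return ll_queue.popleft()       # the slot goes to LL ...
    if classic_queue:
        credit = 0
        return classic_queue.popleft()  # ... or to classic, never both
    return None                         # both queues empty: link idles
```

The `WEIGHT_LL` back-stop only engages when both queues are backlogged, which matches the "rare but not never" observation above.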

[SM] Let's move on:

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "L4S does not provide low latency in the same way as previous technologies like DiffServ (QoS). That prior QoS approach used packet prioritization, where it was possible to assign a higher relative priority to certain application traffic, such as Voice over IP (VoIP) telephony."

[SM] Again, given the right conditions, L4S will do exactly what is claimed here it would not do, e.g. when a short-RTT L4S flow shares a link with a long-RTT conventional-TCP flow.

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "This approach could provide consistent and relatively low latency by assigning high priority to a partition of the capacity of a link, and then policing the rate of packets using that partition. For example, on a 10 Mbps link, a high QoS priority could be assigned to VoIP with a dedicated capacity of 1 Mbps of the 10 Mbps link capacity. The other 9 Mbps would be available to lower QoS priority, such as best effort general Internet traffic that was not VoIP."

[SM] You realize that priority schedulers nowadays offer things like "rate borrowing", where unused capacity of a higher-priority class can be used by lower classes if the higher class does not use up its allotment? So in this example the traditional QoS priority hierarchy offers exactly the same performance as the L4S approach: the single VoIP stream sees minimal delay, the remaining traffic gets ~9.9 Mbps, and there are no transmission stalls.
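As a toy model of the rate-borrowing described here (numbers taken from the draft's 10 Mbps example; real schedulers such as Linux HTB do this packet by packet with token buckets, not in a single pass):

```python
def allocate(link_mbps, demands, guarantees):
    """Toy one-shot rate-borrowing allocation (HTB-like, ceil = link rate).

    Each class first gets min(demand, guarantee); capacity left unused by
    one class is then lent to classes that still have unmet demand.
    """
    alloc = {c: min(demands[c], guarantees[c]) for c in demands}
    spare = link_mbps - sum(alloc.values())
    for c in demands:  # single borrowing pass, in priority order
        extra = min(spare, demands[c] - alloc[c])
        alloc[c] += extra
        spare -= extra
    return alloc

# VoIP is guaranteed 1 Mbps but only sends ~0.1 Mbps; best effort
# borrows the unused 0.9 Mbps on top of its 9 Mbps.
print(allocate(10.0,
               demands={"voip": 0.1, "best_effort": 99.0},
               guarantees={"voip": 1.0, "best_effort": 9.0}))
# -> {'voip': 0.1, 'best_effort': 9.9}
```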

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "But even when QoS was used in this manner, the latency may have been relatively good but it was not ultra low latency of the sort that low latency networking (NQB and L4S) can deliver. As well, that QoS approach is to some extent predicated on an idea that network capacity is very limited and that links are often highly utilized. But in today's Internet, it is increasingly the case that there is an abundance of capacity to end users, which makes QoS approaches ineffective in delivering ever-lower latency."

[SM] This example is counter-intuitive and potentially misleading: a VoIP flow of typically ~100 Kbps of well-paced packets, scheduled as the sole member of the priority class in that 1:9 Mbps weighted hierarchy, will see pretty much only the delay of the currently transmitted packet, and if the link technology supports pre-emption, it might not even see that delay.
[SM] This technically is "as low as queueing delay can go", so piping that VoIP flow through an L4S scheduler/AQM instead can offer no advantage at all; once delay is minimal it is minimal, and hence "ultra-low latency". (And I add that this VoIP flow is also unlikely to respond to L4S-style marking, making it risky to sort it into the LL-queue.)
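A quick back-of-envelope check of that claim, assuming a ~200-byte VoIP packet and a 1500-byte MTU (both packet sizes are assumptions, not from the draft):

```python
# Serialization delays on the draft's 10 Mbps link (assumed packet sizes).
LINK_BPS = 10_000_000      # 10 Mbps
VOIP_PACKET_BYTES = 200    # typical small RTP packet (assumption)
MTU_BYTES = 1500           # one full-size packet ahead of us

voip_tx = VOIP_PACKET_BYTES * 8 / LINK_BPS  # 160 microseconds
worst_wait = MTU_BYTES * 8 / LINK_BPS       # 1.2 milliseconds

print(f"VoIP packet serialization: {voip_tx * 1e6:.0f} us")
print(f"Worst-case wait behind one MTU packet: {worst_wait * 1e3:.1f} ms")
```

So with priority scheduling the VoIP packet waits at most one in-flight packet's serialization time, on the order of a millisecond on this link, leaving essentially no room for L4S to improve upon.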

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "The result, as noted in the prior section, has been the role of dual queue networking. With these approaches, the new low latency packet processing queue is introduced on one or more links on the end-to-end path. The internal L4S queuing may still use a sort of internal prioritization, but this is not QoS in the typical sense because this is happening on an extremely short timescale - sub-round trip time (so microseconds or a few milliseconds)."

[SM] This does not seem to be a logical argument: prioritization really just means deciding "what to do next" and hence is not defined as requiring a minimal timescale. Just because L4S has short queues does not mean it uses anything other than prioritization when giving L4S traffic a higher probability of shorter sojourn times.

https://datatracker.ietf.org/doc/html/draft-livingood-low-latency-deployment-01#NewThinking "A more important and impactful force at play is the rapid congestion signals that are exchanged that will cause a sender to dynamically yeild to other traffic (as if the other traffic had no QoS priority, which it does not) - which can be thought of as back pressure to signal the sender to backoff prior to packetloss occuring."

[SM] This seems to be at least irrelevant for the VoIP example from the same section. Also, in road traffic, if I yield to other traffic at an intersection I stop immediately, but an L4S flow still requires a full RTT worth of time before it can react and the changed load hits the bottleneck. And yes, that non-LL traffic has 1:10 or 1:16 priority; L4S just does a decent job of not engaging that part of its design under some (typical?) conditions (as long as users do not start mixing flows with large differences in RTT, or there is not too much under-responsive but paced traffic in the LL-queue, think 90 parallel VoIP flows of 100 Kbps each in the 10 Mbps link example above).
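A rough calculation for this example (the 100 ms RTT is an assumed value, not from the draft):

```python
# Back-of-envelope for the 90-parallel-VoIP-flows example.
LINK_MBPS = 10.0
FLOWS, FLOW_KBPS = 90, 100.0
RTT_S = 0.1  # assume 100 ms RTT for the flow that must yield

ll_load_mbps = FLOWS * FLOW_KBPS / 1000.0  # unresponsive LL-queue load
print(f"LL-queue load: {ll_load_mbps} of {LINK_MBPS} Mbps")  # 9.0 Mbps

# A sender that "yields" keeps transmitting at its old rate for a full
# RTT before the reduced load reaches the bottleneck.
in_flight_mbit = LINK_MBPS * RTT_S
print(f"Data still arriving during one RTT: up to {in_flight_mbit:.1f} Mbit")
```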

[SM] Nitpick: yield instead of yeild

Again, this is targeted as Informational and intended to document Comcast's recommendations, which may well be built on what the text describes, yet it would be helpful to make sure that the technology is described in objective language that is free from bias.

P.S.: The L4S RFCs clearly state that L4S is built upon conditional priority scheduling between its two queues, so it seems rather surprising to claim that it does not. It is not hard to describe its mechanism in this draft in a way that is correct and still shows how this design has the potential for higher utility than a strawman of a fixed priority assignment, but it will require an example where L4S actually delivers over the traditional hierarchical prioritization method.

jlivingood commented 1 year ago

[SM] The Introduction says "Furthermore, unlike with bandwidth priority on a highly/fully utilized link, low latency using these new approaches is not a zero sum game - everyone can potentially have lower latency at no one else's expense." This seems to misrepresent the mechanism L4S is built on: the DualQ Coupled AQM, in a sense the reference L4S AQM, is described as a "conditional priority scheduler" and configured with a rate share for the non-LL:LL queues between 1:10 (current Linux implementation, IIRC) and 1:16 (RFC 9332).

[JL] I'm not focusing on the current OS implementation to which you refer - I am addressing how it is implemented in an ISP network, such as in a CMTS. Feel free to debate OS implementation with the devs of those operating systems or perhaps the authors of the main L4S and NQB documents. ;-) I will ponder a re-wording that does not say zero-sum game.

[SM] "L4S does not provide low latency in the same way as previous technologies like DiffServ (QoS). That prior QoS approach used packet prioritization, where it was possible to assign a higher relative priority to certain application traffic, such as Voice over IP (VoIP) telephony." Again, given the right conditions, L4S will do exactly what is claimed here it would do not, e.g. when a short RTT L4S flow shares a link with a long RTT conventional-TCP flow.

[JL] I'll consider a re-wording but I am trying to contrast this with layer-3 network prioritization, where IMO this clearly differs. I know you likely disagree and I am not sure either of us will ever convince the other, but appreciate your view nonetheless.

[SM] Nitpick: yield instead of yeild

[JL] Good catch - thx

jlivingood commented 1 year ago

[SM2] Oh, the term "conditional priority" is part of the DOCSIS specifications (e.g. CM-SP-MULPIv4.0-I05-220328.pdf):

"7.7.3.2

the Dual Queue structure that provides latency separation for non-queue-building flows from queue- building flows.

the coupling between the AQMs that ensures that the capacity of the aggregate service flow is used roughly equally by traffic flows across both queues, e.g., three capacity seeking traffic flows would get approximately one-third of the bandwidth each, regardless of which queue each flow utilizes.

Inter-SF Scheduler

As the Dual Queue Coupled AQM architecture provides only one-way coupling from the Classic Service Flow to the Low Latency Service Flow, it relies on the Inter-SF Scheduler to balance this by ensuring that conditional priority is given to the Low Latency Service Flow within the ASF. "Conditional priority" means that traffic of the Low Latency Service Flow will be serviced with a priority, yet without the Classic Service Flow being starved. Weighted Round Robin (WRR) is a simple scheduler that achieves the desired results, and is recommended in [draft-ietf-tsvwg-aqm-dualq-coupled].

For Upstream ASFs, the CMTS MUST implement a weighted scheduler between the Low Latency Service Flow and the Classic Service Flow within the Aggregate Service Flow. Since the WRR algorithm acts on variable-length packets, and the CMTS schedules Upstream Service Flows in terms of minislots, this specification requires a simple "Weighted" scheduler for upstream that assigns minislots for the two Service Flows according to the configured weight.

For Downstream ASFs, the CMTS SHOULD implement a WRR scheduler between the Low Latency Service Flow and the Classic Service Flow within the Aggregate Service Flow.

As discussed in Section 7.7.4.4, the Traffic Priority values for the Classic Service Flow and Low Latency Service Flow do not influence the Inter-SF Scheduler."
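As a rough illustration of the WRR the quoted text recommends, here is a byte-based (deficit-style) weighted round robin between the two service flows; the weights and structure are hypothetical, not the CMTS implementation:

```python
from collections import deque

QUANTUM = {"ll": 2340, "classic": 234}  # ~10:1 byte share per round (assumed)
queues = {"ll": deque(), "classic": deque()}
deficit = {"ll": 0, "classic": 0}

def one_round():
    """One WRR round: each flow may send up to its byte quantum."""
    for flow in ("ll", "classic"):          # LL is visited first each round
        deficit[flow] += QUANTUM[flow]
        while queues[flow] and queues[flow][0] <= deficit[flow]:
            size = queues[flow].popleft()   # queues hold packet sizes
            deficit[flow] -= size
            yield flow, size
        if not queues[flow]:
            deficit[flow] = 0               # no credit hoarding when idle

# With both queues backlogged, LL gets ~10x the bytes but classic is
# never starved: the "conditional priority" the spec describes.
queues["ll"].extend([1200] * 20)
queues["classic"].extend([1500] * 20)
for _ in range(8):
    for flow, size in one_round():
        print(flow, size)
```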

[SM2] This seems to very much apply to the CMTS as well, no? The problem is really: to give lower delay to a traffic class you need to prioritize its packets over packets from the other class, as that is what prioritization means, deciding on the temporal sequence in which you do things. So to achieve its promises L4S needs to prioritize packets in the LL-queue over packets in the non-LL-queue; to reliably not starve the non-LL-queue there needs to be a back-stop weighted priority for the non-LL-queue (which, if set to 50% instead of 10%, would be more useful in silencing my argument) as well as a method/heuristic to make it rare for the back-stop to ever engage. However, data shows that the current design has known challenges with this last component and hence ends up easily having to rely on the backstop mechanism.
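For reference, a sketch of the RFC 9332 coupling arithmetic behind the "roughly equal capacity" claim quoted above (constants simplified; see RFC 9332 for the exact rate equations):

```python
import math

# A base probability p' drives both queues: the classic queue drops at
# p_C = p'^2, the LL queue is marked at p_CL = k * p'. A Reno-like rate
# scales as 1/sqrt(p_C) and a scalable (DCTCP-like) rate as 1/p_CL, so
# both scale as 1/p' (up to constants absorbed into k), which is how the
# coupling aims at roughly equal per-flow throughput across the queues.
k = 2.0  # default coupling factor
for p_base in (0.01, 0.05, 0.10):
    p_C, p_CL = p_base ** 2, k * p_base
    print(f"p'={p_base:.2f}  p_C={p_C:.4f}  p_CL={p_CL:.2f}  "
          f"reno~1/sqrt(p_C)={1 / math.sqrt(p_C):.0f}  "
          f"scalable~1/p_CL={1 / p_CL:.0f}")
```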

I will ponder a re-wording that does not say zero-sum game.

[SM2] Thank you very much.

[SM] "L4S does not provide low latency in the same way as previous technologies like DiffServ (QoS). That prior QoS approach used packet prioritization, where it was possible to assign a higher relative priority to certain application traffic, such as Voice over IP (VoIP) telephony." Again, given the right conditions, L4S will do exactly what is claimed here it would do not, e.g. when a short RTT L4S flow shares a link with a long RTT conventional-TCP flow.

[JL] I'll consider a re-wording but I am trying to contrast this with layer-3 network prioritization, where IMO this clearly differs.

[SM] Fair enough. There are differences from an operator's perspective, I would guess mainly in where and how to configure and monitor this and which toggles are actually available.

I know you likely disagree and I am not sure either of us will ever convince the other, but appreciate your view nonetheless.

[SM] I know, but the way the IETF process works is that I can try to give feedback, and you can decide what to do with that information. I also understand that this draft describes Comcast's recommendations, and I am in no position to question them as such; if you guys recommend that, all I can do is nod. However, I still try to discuss the issues I see.

Regards, Sebastian

jlivingood commented 1 year ago

Changed the zero sum game bit to: "unlike with bandwidth priority on a highly/fully utilized link, low latency networking can better balance the needs of different types of best effort flows."

jlivingood commented 1 year ago

Removed use case example "For example, on a 10 Mbps link, a high QoS priority could be assigned to VoIP with a dedicated capacity of 1 Mbps of the 10 Mbps link capacity. The other 9 Mbps would be available to lower QoS priority, such as best effort general Internet traffic that was not VoIP."

jlivingood commented 1 year ago

Removed a bunch of text in 'new approaches' that was problematic

jlivingood commented 1 year ago

https://github.com/jlivingood/IETF-L4S-Deployment/pull/40