gwhiteCL / NQBdraft

IETF draft on Non-Queue-Building Per-Hop-Behavior

Should traffic protection be mandatory to implement? #48

Closed. gwhiteCL closed this issue 1 month ago.

gwhiteCL commented 2 months ago

DB raised an issue (in https://mailarchive.ietf.org/arch/msg/tsvwg/OMOc-jHjik2_p3GWZBt4cPZbkHc/):

Section 5 states: "Malicious behavior is not necessarily based on rational self-interest, so incentive alignment is not a sufficient defense, but the large majority of users do not act out of malice. Protection against malicious attacks (and accidents) is addressed in Section 5.2 and summarized in Section 10."

An important implication is that traffic protection is the countermeasure to malicious use, which is confirmed by section 10: "To preserve low latency performance for NQB traffic, networks that support the NQB PHB will need to ensure that mechanisms are in place to prevent malicious traffic marked with the NQB DSCP from causing excessive queue delays. Section 5.2 recommends the implementation of a traffic protection mechanism to achieve this goal but recognizes that other options might be more desirable in certain situations."

[...] the usual IETF requirement for crucial security countermeasures such as this (traffic protection) is that they be mandatory to implement so that they are available for use if/as needed.

and

Both the incentive framework and security of NQB have a fundamental dependency on traffic protection - absent "certain situations", neither works without traffic protection. Nonetheless, the requirement for traffic protection in the second paragraph of Section 5.2 is a SHOULD: "... network elements that support the NQB PHB SHOULD support a "traffic protection" function ...". That's completely inadequate - based on the incentives framework and security considerations, the appropriate requirement is "... network elements that support the NQB PHB MUST support and SHOULD use a "traffic protection" function ...".

Turning to "certain situations" - these would initially be exceptions to "SHOULD use" and perhaps equipment that is only used in such exceptional situations could be an exception to "MUST support". Unfortunately, the paragraph in Section 5.2 on these exceptional situations is a serious hand-wave: "There are some situations where traffic protection is potentially not necessary. One example could be a network element designed for use in controlled environments (e.g., enterprise LAN) where a network administrator is expected to manage the usage of DSCPs. Another example could be highly aggregated links (links designed to carry a large number of simultaneous microflows), where individual microflow burstiness is averaged out and thus is unlikely to cause much actual delay." That's nowhere near good enough.

For "SHOULD use", quoting from RFC 2119's definition of "SHOULD": "... there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." The "full implications" that "must be understood and carefully weighed" in this case are the presence of incentives to mismark and the absence of protection against malicious use. Omission of these concerns is a major flaw in the Section 5.2 paragraph on exceptional situations. OTOH, there will be certainly be some situations in which network operators have effective controls outside of the NQB forwarding implementation that prevent mismarking and malicious use, and it would be good to describe at least one such situation - an extreme example would be an air-gapped network with complete controls on application deployment and network traffic origination, including traffic marking..

Exceptions to "MUST support" are a taller order, although one possibility could be implementations that are only usable in networks that have "valid reasons to ignore" the "SHOULD use" could be one possibility - in essence the implementer has to be certain that mismarking and malicious use are impossible in networks that use her implementation. In order to agree to any text describing exceptions to "MUST support", I want to first understand the specific network examples that motivate the exception(s), including their mechanisms for prevention of mismarking and malicious use, since traffic protection will not be available for those purposes.

GF provided a further comment (in https://mailarchive.ietf.org/arch/msg/tsvwg/-rOK_naHqiJtNV4TrcQlGyZhCNY/):

GF: I wonder if there are several options here, only one of which we might like: (a) A WG could require implementation - some implementations of course might still choose to ignore, as they will, the IETF can’t control that. (b) A WG can make this optional, in which case I would expect it to explain how this will be managed and clearly indicate the implications, and that needs work. (c) A WG can constrain this use to a limited domain and define the scoping - or we can leave that for others to work out, as they will. More precision here would seem very helpful at this stage. I'm thinking that the last is not the way to proceed.

gwhiteCL commented 1 month ago

@dlb237 I continue to disagree that traffic protection needs to be made mandatory to implement, and I have some suggestions on a way forward that provides a compromise. Here are some of the reasons why I disagree:

  1. Necessity: NQB is a shallow-buffered best-effort service. It is understood that performance is not guaranteed for any best-effort service. For example, the IETF doesn’t mandate that implementations of the Default PHB provide mechanisms to police/prevent applications from inducing delay and/or loss.

  2. Incentives: As I wrote in https://github.com/gwhiteCL/NQBdraft/issues/47#issuecomment-2215318283, even without traffic protection, if the NQB queue is configured as specified (i.e. with a shallow buffer), there is a disincentive for QB applications to mis-mark their traffic because they will see excessive packet drops. So, I disagree with your assertion that the incentives framework fundamentally depends on the presence of traffic protection. Traffic protection as defined in DOCSIS Queue Protection arguably provides less of a disincentive for inappropriate marking than would be the case in the absence of QP, because it results in significantly less packet loss for the offending application.

  3. Incentives: Incentives apply more broadly than on a hop-by-hop basis, and also generally apply more broadly than on a path-by-path basis. In other words, a QB application developer would (generally) need to make a decision as to whether to mark their packets as NQB without specific knowledge whether the traffic would be subjected to traffic protection or not. So, again, I disagree with the assertion that the incentives framework fundamentally depends on the presence of traffic protection.

  4. Security: The incentives above don’t address malicious sources. While traffic protection is the remedy for this, some network environments have other ways to address malicious sources (e.g. only approved applications are deployed in the network, or traffic conditioning is performed at the network edge).

I definitely agree that traffic protection is the preferred implementation, but I disagree that it needs to be made mandatory to implement.

As a compromise, I'd like to suggest that we strengthen the recommendation around the implementation of traffic protection, and eliminate some of the language that seems to offer rationales to ignore that recommendation. Further, I'd like to suggest that we mandate some mechanism that a network operator can use to detect and avoid abuse.

Specifically, the suggestion is that we address your concern about abuse of the code point by adding a mandatory requirement that NQB PHB implementations provide statistics that can be used by the network operator to detect whether abuse is occurring. These statistics could be as simple as packet and drop counters. This requirement would ensure that operators who configure the NQB PHB have the ability to track the amount of packet drop that is occurring due to traffic overrunning the shallow buffer, and then take action if they feel as though the PHB is causing more issues than it is solving in their environment. Those actions could include disabling the PHB, identifying and dealing with the sources of malicious traffic directly, or pursuing a feature request with the equipment manufacturer to add a traffic protection function.
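As a concrete illustration of the kind of statistics being proposed, here is a minimal sketch of an operator polling per-queue packet and drop counters and flagging a sustained NQB drop rate. The counter names, the polling model, and the 1% threshold are invented for illustration; the proposal above only calls for packet and drop counters, not any particular interface.

```python
# Hypothetical sketch: polling NQB-queue counters to detect possible abuse.
# Counter names, the polling interface, and the alert threshold are all
# assumptions for illustration; the proposal only asks for packet and drop
# counters, not any specific mechanism.

from dataclasses import dataclass

@dataclass
class QueueCounters:
    packets_enqueued: int   # packets accepted into the NQB queue
    packets_dropped: int    # packets dropped because the shallow buffer overflowed

def nqb_drop_ratio(prev: QueueCounters, curr: QueueCounters) -> float:
    """Drop ratio over one polling interval, computed from counter deltas."""
    sent = curr.packets_enqueued - prev.packets_enqueued
    dropped = curr.packets_dropped - prev.packets_dropped
    total = sent + dropped
    return dropped / total if total else 0.0

# Example: an operator might alarm if NQB drops stay above, say, 1% for
# several consecutive intervals (threshold chosen arbitrarily here).
prev = QueueCounters(packets_enqueued=1_000_000, packets_dropped=100)
curr = QueueCounters(packets_enqueued=1_050_000, packets_dropped=2_600)
if nqb_drop_ratio(prev, curr) > 0.01:
    print("NQB drop rate elevated: possible mis-marked or malicious traffic")
```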

In addition, I think we can delete the words in section 10: "but recognizes that other options might be more desirable in certain situations." so that the recommendation to implement traffic protection isn't watered down.

Regarding the paragraph in 5.2 discussing situations where traffic protection is potentially not needed, we could rework the paragraph to emphasize that the decision by an implementer to not implement traffic protection might limit the deployment/usage of their NQB PHB implementation to a small subset of potential situations, and it would put the onus on the operator to monitor usage and take remediations manually rather than automatically dealing with misbehaving traffic. We can also add text to more fully specify the implications of ignoring the recommendation. That, I think, would strengthen the SHOULD as opposed to offering rationales for ignoring it.

dlb237 commented 1 month ago

[+tsvwg list]

I continue to disagree that traffic protection needs to be made mandatory to implement, and I have some suggestions on a way forward that provides a compromise.

This overall direction looks promising, but the suggested compromise is not (yet) good enough. Significant work on the draft will be needed, specifically on items 1 and 4:

  1. Necessity: NQB is a shallow-buffered best-effort service. It is understood that performance is not guaranteed for any best-effort service.

I understand the overall intent, and I'm fine with that as a high-level goal/direction. The problem is that in the -24 version of the draft, "shallow-buffered" is an all-but-undefined term.

To do better, the draft needs to provide a concrete specification of "shallow-buffered" and require that NQB implementations use shallow buffers. If this specification of "shallow-buffered" requirements is done well, it should lead to corresponding (hopefully minor) revisions of the incentives framework discussion that result in an acceptable resolution to points 2 and 3 on Incentives.

OTOH, the comment that "performance is not guaranteed for any best-effort service" appears to have missed the point. I definitely agree that the draft is not guaranteeing any performance for NQB traffic, but this line of reasoning is claiming to guarantee non-performance(!) for QB traffic that uses (abuses) the NQB service. Specifically, the claim is being made that a shallow-buffered NQB service provides a sufficient non-performance guarantee to ensure that QB traffic has nothing to gain (and quite a bit to lose) by using (abusing) the shallow-buffered NQB service. The detailed requirements for sufficiently shallow buffers that realize that non-performance guarantee need to be specified and mandated, e.g., in Section 5.1 of the draft.

  4. Security: The incentives above don’t address malicious sources. While traffic protection is the remedy for this, some network environments have other ways to address malicious sources (e.g. only approved applications are deployed in the network, or traffic conditioning is performed at the network edge).

Proceeding in this direction ... if traffic protection is not mandatory to implement, then the draft will need to restrict NQB implementation and usage (using "MUST" and "MUST NOT" or equivalent RFC 2119 keywords) to network environments that have "other ways to address malicious sources." The nature and/or results of those "other ways" will need to be specified in a sufficiently concrete fashion that a network operator can readily determine whether or not her network has sufficient "other ways to address malicious sources."

Turning to the suggested compromise:

Specifically, the suggestion is that we address your concern about abuse of the code point by adding a mandatory requirement that NQB PHB implementations provide statistics that can be used by the network operator to detect whether abuse is occurring. These statistics could be as simple as packet and drop counters.

That could work in combination with a solution to the "4. Security" problem suggested above. By themselves, requiring collection/provision of statistics is not sufficient to resolve the security problem.

Regarding the paragraph in 5.2 discussing situations where traffic protection is potentially not needed, we could rework the paragraph ...

That would help ... after the security problem (4) is resolved (see above).

The bottom line is that items 1 (e.g., What is the concrete specification of "shallow-buffered" ?) and 4 (e.g., What are other ways that are sufficient to address malicious sources?) need to be addressed.

Thanks, --David

dlb237 commented 1 month ago

This comment is from Michael Overcash; GitHub mislabeled it.

I don't think you've really fully addressed Greg's main point here.

"if the NQB queue is configured as specified (i.e. with a shallow buffer), there is a disincentive for QB applications to mis-mark their traffic because they will see excessive packet drops."

Traditional QoS/Priority approaches created an incentive to cheat by creating a "fast lane" for latency sensitive services. This is emphatically not how L4S and other similar AQM based methods work. The shallow-buffer queue is not a fast lane and will only improve latency performance for endpoints that implement the appropriate algorithms. An endpoint that tries to "cheat" will just end up policed and will experience worse performance. Why would anyone go out of their way to use the shallow-buffer queue to get worse performance?

I don't think it is productive to rigorously define "shallow buffered" here. The exact buffer depth is a function of the algorithm and vendor implementation.

I also don't think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here. On the contrary, there is no buffer so large that a malicious actor cannot easily fill it.

Just my two cents...

Michael Overcash, Principal Architect, Cox Communications

gwhiteCL commented 1 month ago

FYI, for readability I'm going to try pruning some of the extraneous text from the comments above in the GitHub interface. The way GitHub handles emailed responses isn't ideal.

dlb237 commented 1 month ago

This comment is from Greg White; GitHub mislabeled it.

[From David:]

I understand the overall intent, and I'm fine with that as a high-level goal/direction. The problem is that in the -24 version of the draft, "shallow-buffered" is an all-but-undefined term. …. The detailed requirements for sufficiently shallow buffers that realize that non-performance guarantee need to be specified and mandated, e.g., in Section 5.1 of the draft.

Ok, I think this is solvable. Here is a proposal: The NQB queue MUST have a buffer size that is significantly smaller than the buffer provided for Default traffic. It is RECOMMENDED to configure an NQB buffer size less than or equal to 10 ms at the shared NQB/Default egress rate.
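To give a sense of what the time-based recommendation works out to in practice, here is a minimal sketch of the arithmetic; the rates and the 1500-byte MTU are illustrative assumptions, not values from the proposal.

```python
# Sketch: what a 10 ms NQB buffer means in bytes/packets at various egress
# rates. The rates and the 1500-byte MTU are illustrative assumptions.

MTU_BYTES = 1500
TARGET_MS = 10

for rate_mbps in (5, 50, 100, 1000):
    buffer_bytes = rate_mbps * 1e6 * (TARGET_MS / 1000) / 8
    print(f"{rate_mbps:>5} Mbit/s -> {buffer_bytes:>9.0f} B "
          f"(~{buffer_bytes / MTU_BYTES:.0f} x 1500-B packets)")
```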

[From David:]

Proceeding in this direction ... if traffic protection is not mandatory to implement, then the draft will need to restrict NQB implementation and usage (using "MUST" and "MUST NOT" or equivalent RFC 2119 keywords) to network environments that have "other ways to address malicious sources."

[From Michael:]

I also don’t think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here. On the contrary, there is no buffer so large that a malicious actor cannot easily fill it.

I agree with Michael’s viewpoint. Similar to my previous argument, the IETF doesn't restrict implementations of the Default PHB to only being deployed in network environments that have "other ways to address malicious sources." What would be the rationale to do so here? It is IMO definitely ok to include guidance to network operators saying that NQB implementations that lack traffic protection are as vulnerable to malicious traffic as other queues, and so the operator should follow existing best practices to protect their NQB queues from malice.

dlb237 commented 1 month ago

I don't think you've really fully addressed Greg's main point here.

"if the NQB queue is configured as specified (i.e. with a shallow buffer), there is a disincentive for QB applications to mis-mark their traffic because they will see excessive packet drops."

There's a reason for that - I agree in principle (or at least I don't disagree) with that point. The problem that I have with the draft is that it needs to provide the details of what "configured as specified (i.e. with a shallow buffer)" means. Unfortunately, this is an example of how not to do that:

I don't think it is productive to rigorously define "shallow buffered" here. The exact buffer depth is a function of the algorithm and vendor implementation.

In other words, it's up to the implementers to figure out what to do. That doesn't specify much of anything, and it's a lousy foundation for the strong claims being made about the incentives framework.

I also don't think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here.

That's the wrong class of malicious actor. Theft of service is a different attack (with different malicious actor behavior) from denial of service. The draft's incentives framework is making strong claims that theft of service attempts are sufficiently counterproductive for the thief so as to make other countermeasures (e.g., traffic protection) unnecessary. The fact that all the buffers, e.g., both best effort and NQB, can be overwhelmed by a sufficiently large denial of service attack has almost no relevance to that theft of service concern.

Thanks, --David

dlb237 commented 1 month ago

See [SM] below...

On 23 July 2024 21:52:11 CEST, "Overcash, Michael (CCI-Atlanta)" wrote:

I don't think you've really fully addressed Greg's main point here.

"if the NQB queue is configured as specified (i.e. with a shallow buffer), there is a disincentive for QB applications to mis-mark their traffic because they will see excessive packet drops."

Traditional QoS/Priority approaches created an incentive to cheat by creating a "fast lane" for latency sensitive services. This is emphatically not how L4S and other similar AQM based methods work.

[SM] Both DualQ and the low latency DOCSIS scheduler it was based upon are at their core (conditional) priority schedulers. This is pretty much the same technology that in traditional QoS approaches is used to implement higher priority fast lanes. L4S adds a few heuristics to ameliorate this (like the coupling between the queues) but for these to work traffic in the L queue needs to respond properly to CE marks. So if we think about reasonably well-paced mischievous traffic that happens to be application limited to under the default 80 to 90% capacity share of the L-queue and that ignores CE marks, this will pretty much get its way without suffering adverse effects. I predict that if you deploy a non-policed priority scheduler into the wild, people will find ways to abuse it. I wonder, what makes you believe that L4S is so special that abuse will not happen?

The shallow-buffer queue is not a fast lane

[SM] Indeed, it is not the shallow buffer but the underlying priority scheduler, but IMHO that distinction is not all that important. The gist is that L4S attempts to deploy a priority scheduler into the wild where the main admission control is whether a flow set the ECT(1) ECN codepoint. This is a rather risky proposition, and IMHO not helped by arguing that the priority scheduler itself is an implementation and not an architectural feature of L4S... (L4S really needs a priority scheduler, explicit or implicit, as that is exactly what it promises to do: prioritise ECT(1) packets over other packets and treat them to lower queuing delay, but I understand that I appear to be in the rough with this analysis).

and will only improve latency performance for endpoints that implement the appropriate algorithms. An endpoint that tries to "cheat" will just end up policed and will experience worse performance.

[SM] How? And what if that flow is well paced and stays below the L-queue capacity share, how can you assert that this flow will reliably get targeted by the policer? Keep in mind that queue protection has no concept of relative throughput of flows, but only looks at the queuing a flow causes. The likely goal of an attacker, getting an unfair throughput advantage, is only policed indirectly. This is not what I would consider robust and reliable engineering...
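For readers following along, the following is a much-simplified sketch of the per-flow "queuing score" idea under discussion. It is not the DOCSIS Queue Protection algorithm (see draft-briscoe-docsis-q-protection for that), and the class name, units, decay rate, and threshold are invented for illustration. It shows the property [SM] points to: scoring keys on the queuing a flow causes, not on its throughput share.

```python
# Much-simplified illustration of per-flow queuing-score accounting.
# NOT the DOCSIS Queue Protection algorithm; names/thresholds are invented.
import time

class FlowScore:
    def __init__(self, decay_per_sec=1.0, sanction_threshold=0.005):
        self.scores = {}            # flow_id -> accumulated score (seconds of queuing caused)
        self.last_update = {}       # flow_id -> timestamp of last update
        self.decay = decay_per_sec  # how quickly past queuing is "forgiven"
        self.threshold = sanction_threshold

    def on_enqueue(self, flow_id, current_queue_delay_s, now=None):
        """Charge the flow for the queuing delay present when its packet arrives.
        Returns True if the packet should be sanctioned (e.g., redirected to
        the Default queue or dropped)."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last_update.get(flow_id, now)
        score = max(0.0, self.scores.get(flow_id, 0.0) - self.decay * elapsed)
        score += current_queue_delay_s      # charged for queuing, not for throughput
        self.scores[flow_id] = score
        self.last_update[flow_id] = now
        return score > self.threshold

# A well-paced flow that never finds a standing queue accumulates no score,
# regardless of how much of the link it uses -- which is [SM]'s point above.
```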

Why would anyone go out of their way to use the shallow-buffer queue to get worse performance?

[SM] Again, what makes you so certain an attacker would get worse performance?

I don't think it is productive to rigorously define "shallow buffered" here. The exact buffer depth is a function of the algorithm and vendor implementation.

I also don't think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here. On the contrary, there is no buffer so large that a malicious actor cannot easily fill it.

[SM] I gently disagree; you can always opt to drop packets even before putting them into a queue.



dlb237 commented 1 month ago

I wonder, what makes you believe that L4S is so special that abuse will not happen?

Abuse needs an incentive. Can anyone think of a way to abuse L4S that provides a benefit to the abuser? (Other than the intended benefit of improved latency of course.)

Michael Overcash, Principal Architect, CPE Premises Engineering, Cox Communications

dlb237 commented 1 month ago

On 24 July 2024 14:49:26 CEST, "Overcash, Michael (CCI-Atlanta)" wrote:

I wonder, what makes you believe that L4S is so special that abuse will not happen?

Abuse needs an incentive. Can anyone think of a way to abuse L4S that provides a benefit to the abuser? (Other than the intended benefit of improved latency of course.)

[SM] For a single flow getting access to (by default) 80% of link capacity as reward for ignoring CE marks is IMHO a pretty clear benefit... Really, the idea that packet scheduling with a work conserving scheduler can be anything but a zero sum game requires a very good explanation (that so far L4S/NQB proponents have not offered)... But a zero sum game really implies the advantage of one class comes at the expense of the other. L4S's 'solution' to this challenge is to rely on all flows to play nice and pretend there is no incentive to not play nice. You might consider this a robust and reliably engineered solution....

I wonder what the Nash equilibria of the L4S game actually are and whether they support the hypothesis that there is no incentive to abuse... (I am not volunteering a game theoretic analysis, but I note this would be a way to massively support the incentive argument for both NQB and L4S.)

Regards Sebastian


dlb237 commented 1 month ago

[From David:]

I understand the overall intent, and I'm fine with that as a high-level goal/direction. The problem is that in the -24 version of the draft, "shallow-buffered" is an all-but-undefined term. …. The detailed requirements for sufficiently shallow buffers that realize that non-performance guarantee need to be specified and mandated, e.g., in Section 5.1 of the draft.

[From Greg:]

Ok, I think this is solvable. Here is a proposal: The NQB queue MUST have a buffer size that is significantly smaller than the buffer provided for Default traffic. It is RECOMMENDED to configure an NQB buffer size less than or equal to 10 ms at the shared NQB/Default egress rate.

That's an improvement, but "significantly smaller" is almost as undefined as "shallow-buffered" and the size in octets of a 10ms queue varies dramatically by egress rate.

Trying a different perspective ... A simple way to attempt abuse of an NQB queue is to point a TCP connection at the queue and set TCP loose. If that TCP is using IW10, then drops in that initial window would be a good initial disincentive for that sort of abuse (aside: this is all but assuming that the IW10 packets are not paced, which may not be a good assumption). Attempting some quick back of the envelope math for my 50 Mbit home service, 15kB (10 x 1.5kB typical MTU) x 8 = 120 kbits. 120k/50M = 2.4ms at a 50Mbit line rate, which is less than 10ms. If the NQB egress rate were capped at 10% of the line rate, then 15kB takes 24ms, so a 10ms bottleneck buffer size ought to cause multiple drops which will get that TCP's attention ;-). OTOH, if the service rate is increased to 1 Gbit (available from my ISP), then that 24ms gets divided by 20, resulting in 1.2ms for 15kB which easily fits in a 10ms buffer.
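The back-of-the-envelope arithmetic above can be reproduced with a quick sketch; the unpaced IW10 burst, 1500-byte MTU, 10 ms buffer, and 10% NQB cap are the assumptions already stated in the text.

```python
# Reproduce the back-of-envelope arithmetic above: serialization time of an
# unpaced IW10 burst (10 x 1500-byte packets) vs. a 10 ms NQB buffer.

IW10_BITS = 10 * 1500 * 8            # 120,000 bits
BUFFER_MS = 10

cases = [
    ("50 Mbit/s line rate",             50e6),
    ("10% NQB cap on 50 Mbit/s line",    5e6),
    ("10% NQB cap on 1 Gbit/s line",   100e6),
]
for label, rate_bps in cases:
    t_ms = IW10_BITS / rate_bps * 1000
    verdict = "overflows" if t_ms > BUFFER_MS else "fits within"
    print(f"{label:<32}: IW10 takes {t_ms:5.1f} ms -> {verdict} a 10 ms buffer")
```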

At a minimum, I hope this illustrates that a fixed time period is not a great way to size NQB queues because the link/egress rates involved vary by a number of orders of magnitude. There's more to be done from here to get to a complete solution.

Greg – two questions (with more doubtless to come):

Thanks, --David

dlb237 commented 1 month ago

Abuse needs an incentive. Can anyone think of a way to abuse L4S that provides a benefit to the abuser? (Other than the intended benefit of improved latency of course.)

[SM] For a single flow getting access to (by default) 80% of link capacity as reward for ignoring CE marks is IMHO a pretty clear benefit...

[JL] Wouldn't this application behaviour be readily identifiable? Also, if the app in question is queue building, ISTM that marking for NQB with a shallow buffer would not improve your app QoE.

dlb237 commented 1 month ago

Trying a different perspective ... A simple way to attempt abuse of an NQB queue is to point a TCP connection at the queue and set TCP loose.

[JL] How does this differ from – on today’s internet – pointing a TCP connection at a user and flooding it with a high volume of packets? ISTM we have an issue of DoS and DDoS today with regular best effort flows.

JL

dlb237 commented 1 month ago

I also don’t think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here.

That's the wrong class of malicious actor. Theft of service is a different attack (with different malicious actor behavior) from denial of service. The draft's incentives framework is making strong claims that theft of service attempts are sufficiently counterproductive for the thief so as to make other countermeasures (e.g., traffic protection) unnecessary. The fact that all the buffers, e.g., both best effort and NQB, can be overwhelmed by a sufficiently large denial of service attack has almost no relevance to that theft of service concern.

[JL] So “theft of service” in this use case would be some software on the home network trying to achieve high throughput on an upstream (outbound) basis to the internet? There are already ways that users can sort of do similar things – such as using their own home router and using QoS prioritization to advantage certain LAN clients (e.g., game console) and to give certain devices or users more bandwidth than others. This is the user making decisions for how to use their connection.

[JL] So ISTM there are parallels in this use case to what many customers do today. They are provided a certain bandwidth (and may have volumetric usage policies) – shouldn’t it be up to the user to decide how to use that? Wouldn’t the user just be ‘stealing’ from themselves?

dlb237 commented 1 month ago

In addition, for the access device the overall rate limiting is independent of the queue ... it is an overall aggregate limit. (And the 80% sounds backwards ... at least for Cox our initial weighting will favor Classic flows because they are the flows generally used for bulk transfer.)

Michael Overcash, Principal Architect, CPE Premises Engineering

dlb237 commented 1 month ago

[JL] So “theft of service” in this use case would be some software on the home network trying to achieve high throughput on an upstream (outbound) basis to the internet?

That's an example, but it's neither the only one nor a particularly relevant one for this discussion.

A better example would be an NQB queue in ISP access network equipment a few hops away from the subscriber where the traffic in the NQB queue is from multiple users.

Thanks, --David

gwhiteCL commented 1 month ago

@dlb237 wrote:

At a minimum, I hope this illustrates that a fixed time period is not a great way to size NQB queues because the link/egress rates involved vary by a number of orders of magnitude. There's more to be done from here to get to a complete solution.

No, it doesn't illustrate that at all! Look, there is nothing to "steal" in this "theft of service" fantasy that you've created. The sum of NQB + Default gives a fixed service rate. One queue has a shallow buffer and the other has a deep buffer. If an operator or a vendor wants to configure those as 10 ms and 200 ms, or 5 ms and 250 ms, or 30 ms and 300 ms, nothing gets broken! The TCP flow you are concerned about isn't "stealing" anything! Classic TCP protocols (and many other bursty senders) need the bottleneck to provide a deep buffer in order to get good performance - often it needs to be equivalent to the base RTT, which is in time units. Other applications don't need a deep buffer and would get better performance if they weren't subjected to the latency/jitter caused by the QB applications.

I'm afraid you are going to have to come up with much more convincing arguments than this to sway me that what I've proposed doesn't fully and completely resolve your WGLC comment.

gwhiteCL commented 1 month ago

[JL] So “theft of service” in this use case would be some software on the home network trying to achieve high throughput on an upstream (outbound) basis to the internet?

That's an example, but it's neither the only one nor a particularly relevant one for this discussion. A better example would be an NQB queue in ISP access network equipment a few hops away from the subscriber where the traffic in the NQB queue is from multiple users. Thanks, --David @dlb237

How is that not relevant?! How is it not the most relevant example?! The scenario Jason is referring to describes 100% of the current worldwide deployment of this PHB, and the vast majority of access network architectures.

You are confusing access network and core network!

This conversation is getting bonkers.

dlb237 commented 1 month ago

Are you willing to restrict (MUST) the use and deployment of this PHB to single-user equipment?

Thanks, --David

dlb237 commented 1 month ago

A better example would be an NQB queue in ISP access network equipment a few hops away from the subscriber where the traffic in the NQB queue is from multiple users.

[JL] First hop access gear would typically still have per user/device/home flow boundaries – for example limiting home-1 to 300 Mbps per their service tier and home-2 to 1 Gbps, home-3 to 500 Mbps, and so on. Those users cannot exceed their service plan limits and take bandwidth from other users. If they consistently have high utilization that causes congestion at the aggregation point, the capacity monitoring systems would automatically trigger a normal capacity augmentation process. The second hop, just after that access aggregator would typically have extraordinary capacity and never experience congestion unless there was something extraordinary like one or more fiber cuts in a regional network.

[JL] In any case, the user/device/home cannot exceed the bandwidth they have been provisioned. And if they do use a lot of their provisioned bandwidth, that seems fine – as it is what they have paid for.
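A rough sketch of the kind of per-subscriber boundary being described, using a generic token-bucket limiter; the tier rates and burst size are invented and this is not any specific vendor's implementation. The point is that the aggregate limit applies regardless of which queue, NQB or Default, a packet lands in.

```python
# Generic per-subscriber token-bucket rate limit, sketching the
# "per user/device/home flow boundaries" described above. Tier rates and
# burst sizes are invented for illustration.

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8          # refill rate in bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, pkt_bytes, now):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True                   # within the subscriber's provisioned rate
        return False                      # exceeds provisioned rate, regardless of DSCP

# Each home gets its own bucket at its service tier; NQB marking does not
# change the aggregate limit the subscriber can consume.
subscribers = {
    "home-1": TokenBucket(300e6, 64_000),   # 300 Mbps tier
    "home-2": TokenBucket(1e9,   64_000),   # 1 Gbps tier
    "home-3": TokenBucket(500e6, 64_000),   # 500 Mbps tier
}
```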

dlb237 commented 1 month ago

On 24 July 2024 17:14:54 CEST, "Livingood, Jason" wrote:

Abuse needs an incentive. Can anyone think of a way to abuse L4S that provides a benefit to the abuser? (Other than the intended benefit of improved latency of course.)

[SM] For a single flow getting access to (by default) 80% of link capacity as reward for ignoring CE marks is IMHO a pretty clear benefit...

[JL] Wouldn't this application behaviour be readily identifiable?

[SM] Only if the identifying entity maintains per-flow throughput information... (or per-flow queues and is willing to search and drop/mark from the fullest queue). Queue protection maintains a per-flow queuing score, but will not search/mark from the fullest microflow... (A rough sketch of the queuing-score idea appears at the end of this comment.)

Also, if the app in question is queue building, ISTM that marking for NQB with a shallow buffer would not improve your app QoE.

[SM] Well, a reasonably well-paced application that is application-limited to <= the L-queue priority share is going to do quite well with a shallow buffer... Worst case, add a FEC layer like UDPspeeder and just gloss over occasional losses/reorderings... but that level of sophistication seems not to be necessary to exploit an unpoliced priority scheduler.

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.
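
For readers not familiar with the "per-flow queuing score" idea mentioned above, here is a heavily simplified sketch of the general concept: a flow's score grows with the queuing delay its packets contribute, drains over time, and packets from flows above a threshold are redirected to the Default queue. The constants and scoring details are invented for illustration; this is not the normative DOCSIS Queue Protection algorithm.

```python
import time
from collections import defaultdict

# Simplified queue-protection-style per-flow "queuing score" sketch.
# NOT the normative DOCSIS Queue Protection algorithm; constants are assumed.

DRAIN_PER_S = 0.05   # assumed score drain rate (seconds of score per second)
THRESHOLD_S = 0.02   # assumed sanction threshold (accumulated queuing delay)

class QueueProtection:
    def __init__(self):
        self.score = defaultdict(float)           # flow-id -> score (seconds)
        self.last = defaultdict(time.monotonic)   # flow-id -> last update time

    def classify(self, flow_id, pkt_bytes, qdelay_s, link_rate_bps):
        now = time.monotonic()
        # Drain the score for the time elapsed since the flow was last seen.
        self.score[flow_id] = max(0.0, self.score[flow_id] - (now - self.last[flow_id]) * DRAIN_PER_S)
        self.last[flow_id] = now
        # Charge the flow for the queuing delay it sees plus its own serialization time.
        self.score[flow_id] += qdelay_s + pkt_bytes * 8 / link_rate_bps
        # Sparse, well-paced flows stay below the threshold; flows that keep
        # building the queue get redirected to the deep Default queue.
        return "NQB" if self.score[flow_id] < THRESHOLD_S else "DEFAULT"
```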

dlb237 commented 1 month ago

[JL] First hop access gear would typically still have per user/device/home flow boundaries

Would it be reasonable to specify those boundaries in a bit more detail and limit usage of this PHB to equipment that enforces such boundaries?

If that were to be done, then ...

[JL] In any case, the user/device/home cannot exceed the bandwidth they have been provisioned.

... would be a fine "abuser can only shoot itself in the foot" conclusion to most of the security concerns because the boundaries prevent damage to others.

Thanks, --David

dlb237 commented 1 month ago

At a minimum, I hope this illustrates that a fixed time period is not a great way to size NQB queues because the link/egress rates involved vary by a number of orders of magnitude. There's more to be done from here to get to a complete solution.

No, it doesn't illustrate that at all! Look, there is nothing to "steal" in this "theft of service" fantasy that you've created. The sum of NQB + Default gives a fixed service rate. One queue has a shallow buffer and the other has a deep buffer. If an operator or a vendor wants to configure those as 10 ms and 200 ms, or 5 ms and 250 ms, or 30 ms and 300 ms, nothing gets broken! The TCP flow you are concerned about isn't "stealing" anything! Classic TCP protocols (and many other bursty senders) need the bottleneck to provide a deep buffer in order to get good performance - often it needs to be equivalent to the base RTT, which is in time units. Other applications don't need a deep buffer and would get better performance if they weren't subjected to the latency/jitter caused by the QB applications.

I'm afraid you are going to have to come up with much more convincing arguments than this to sway me that what I've proposed doesn't fully and completely resolve your WGLC comment.

I think this is mostly covered by my response to Jason on the list on limiting use of this PHB to equipment that can enforce service rate boundaries, at least between users.

Thanks, --David

dlb237 commented 1 month ago

[JL] First hop access gear would typically still have per user/device/home flow boundaries

Would it be reasonable to specify those boundaries in a bit more detail and limit usage of this PHB to equipment that enforces such boundaries?

If that were to be done, then ...

[JL] In any case, the user/device/home cannot exceed the bandwidth they have been provisioned.

... would be a fine "abuser can only shoot itself in the foot" conclusion to most of the security concerns because the boundaries prevent damage to others.

[JL] Would that be best in this draft or a more generalized anti-DoS-resilience sort of draft, since this behavior can occur irrespective of whether NQB is used? It could be a very short BCP, basically saying it is best practice in an access network to enforce per-user/device/home bandwidth limits so that one user cannot exceed their provisioned bandwidth and negatively affect other users. On the other hand, it seems obvious that if you have been provisioned X bandwidth, it would be impossible to exceed that provisioned bandwidth.

dlb237 commented 1 month ago

David,

I think you are missing the point.

In links that today provide a single, deep-buffered queue for best-effort Default traffic, there may or may not exist protection mechanisms that prevent individual application instances (or individual users) from consuming too much bandwidth, or causing too much delay, or causing too much packet loss to the detriment of other application instances (or other users). These mechanisms might be good to have in place in certain cases, but they aren’t always necessary (and they certainly aren’t mandated by the IETF).

NQB doesn’t change this.

Greg
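
One concrete example of the class of mechanisms being referred to here is hash-based per-flow fair queuing, where each flow hashes to its own sub-queue and service rotates among the non-empty sub-queues, so a single greedy flow cannot monopolize the link. The sketch below is a bare-bones illustration of that idea (in the spirit of fq_codel-style flow isolation, but not that algorithm):

```python
from collections import deque

# Bare-bones hash-based per-flow fair queuing sketch (illustrative only).
NUM_BUCKETS = 1024

class FlowQueue:
    def __init__(self):
        self.buckets = [deque() for _ in range(NUM_BUCKETS)]
        self.rr = 0                                  # round-robin pointer

    def enqueue(self, five_tuple, pkt):
        self.buckets[hash(five_tuple) % NUM_BUCKETS].append(pkt)

    def dequeue(self):
        for _ in range(NUM_BUCKETS):                 # find the next non-empty bucket
            q = self.buckets[self.rr]
            self.rr = (self.rr + 1) % NUM_BUCKETS
            if q:
                return q.popleft()
        return None                                  # all buckets empty
```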

dlb237 commented 1 month ago

[JL] In any case, the user/device/home cannot exceed the bandwidth they have been provisioned. ... would be a fine "abuser can only shoot itself in the foot" conclusion to most of the security concerns because the boundaries prevent damage to others.

[JL] Would that be best in this draft or a more generalized anti-DoS-resilience sort of draft, since this behavior can occur irrespective of whether NQB is used?

Something would need to be stated in this draft in order to use the presence of this class of provisioning and boundary enforcement functionality in support of the "abuser can only shoot itself in the foot" conclusion to most of the security concerns. OTOH, a complete explanation of the possible forms of this functionality and discussion of access network architecture is likely not necessary.

[JL] On the other hand, it seems obvious that if you have been provisioned X bandwidth, it would be impossible to exceed that provisioned bandwidth.

What is not obvious is the limitation of this PHB to equipment that can do that class of bandwidth provisioning and enforcement at user/device/home granularity – that would need to be stated.

Thanks, --David

dlb237 commented 1 month ago

In links that today provide a single, deep-buffered queue for best-effort Default traffic, there may or may not exist protection mechanisms that prevent individual application instances (or individual users) from consuming too much bandwidth, or causing too much delay, or causing too much packet loss to the detriment of other application instances (or other users). These mechanisms might be good to have in place in certain cases, but they aren’t always necessary (and they certainly aren’t mandated by the IETF).

NQB doesn’t change this.

If NQB is effectively an alternate name for Default, then why does the draft contain extensive discussion of and strong assertions about the absence of incentives to mismark Default QB traffic as NQB?

Thanks, --David

dlb237 commented 1 month ago

On 24 July 2024 17:24:22 CEST, "Livingood, Jason" wrote:

I also don’t think it is necessary or helpful to try to solve for malicious actors here. Any malicious actor can fill up queues and crowd out other traffic simply by sending high rate UDP. Shallow buffers are not uniquely vulnerable here.

That's the wrong class of malicious actor. Theft of service is a different attack (with different malicious actor behavior) from denial of service. The draft's incentives framework is making strong claims that theft of service attempts are sufficiently counterproductive for the thief so as to make other countermeasures (e.g., traffic protection) unnecessary. The fact that all the buffers, e.g., both best effort and NQB, can be overwhelmed by a sufficiently large denial of service attack has almost no relevance to that theft of service concern.

[JL] So “theft of service” in this use case would be some software on the home network trying to achieve high throughput on an upstream (outbound) basis to the internet? There are already ways that users can sort of do similar things – such as using their own home router and using QoS prioritization to advantage certain LAN clients (e.g., game console) and to give certain devices or users more bandwidth than others. This is the user making decisions for how to use their connection.

[SM] In the case of a mischievous application, however, that tries to achieve more than its equitable share, the user is not really in control of the decision, unlike when actively configuring priority for an application the user deems important... So I do not think these are analogous situations, as user intent matters.

[JL] So ISTM there are parallels in this use case to what many customers do today. They are provided a certain bandwidth (and may have volumetric usage policies) – shouldn't it be up to the user to decide how to use that? Wouldn't the user just be 'stealing' from themselves?

[SM] But by that token, an application that uses NQB or L4S without user intervention is different in that it does not let the user decide.

Given the lax consideration of security, I start to wonder whether the best policy for networks under my control is not simply to drop all packets carrying NQB and ECT(1), and then patiently wait for things to get fixed (or not) from a purely passive spectator position instead of from a victim's view...

Regards Sebastian

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

dlb237 commented 1 month ago

Hi Michael,

On 24 July 2024 17:36:02 CEST, "Overcash, Michael (CCI-Atlanta)" wrote:

In addition, for the access device the overall rate limiting is independent of the queue ... it is an overall aggregate limit. (And the 80% sounds backwards ... at least for Cox our initial weighting will favor Classic flows because they are the flows generally used for bulk transfer.)

[SM] So are you at liberty to choose which priority share you are going to assign to the two classes in your CMTSs? I assume you will use the Low Latency DOCSIS framework/tools to do that. Will you only use NQB, or also L4S? If both, it would be quite interesting to learn what effect a smaller priority share will have on L4S traffic.

-- Michael Overcash Principal Architect, CPE Premises Engineering

-----Original Message----- From: Livingood, Jason; Sent: Wednesday, July 24, 2024 11:15 AM; To: Sebastian Moeller; Overcash, Michael (CCI-Atlanta); Black, David; gwhiteCL/NQBdraft; Subject: Re: [tsvwg] Re: [EXTERNAL] Re: [gwhiteCL/NQBdraft] Should traffic protection be mandatory to implement? (Issue #48)

Abuse needs an incentive. Can anyone think of a way to abuse L4S that provides a benefit to the abuser? (Other than the intended benefit of improved latency, of course.)

[SM] For a single flow getting access to (by default) 80% of link capacity as reward for ignoring CE marks is IMHO a pretty clear benefit...

[JL] Wouldn't this application behaviour be readily identifiable? Also, if the app in question is queue building, ISTM that marking for NQB with a shallow buffer would not improve your app QoE.

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

dlb237 commented 1 month ago

[SM] In the case of a mischievous application, however, that tries to achieve more than its equitable share, the user is not really in control of the decision, unlike when actively configuring priority for an application the user deems important...

The user can uninstall the software. If I install some app and it suddenly makes my computer or network perform poorly, I will uninstall it. This is the case before and after NQB, as we have been saying.

Scenarios I have heard:

  1. Client software is malicious and tries to flood the LL queue.

    • Unique to LL? No: client software today could create a disruptive flood of traffic.
  2. User tries to maliciously flood the LL queue in their access network connection.

    • Unique to LL? No: users can maliciously flood their connection today.
  3. ?

JL

dlb237 commented 1 month ago

Hi Jason,

On 24 July 2024 23:34:33 CEST, "Livingood, Jason" wrote:

[SM] In the case of a mischievous application, however, that tries to achieve more than its equitable share, the user is not really in control of the decision, unlike when actively configuring priority for an application the user deems important...

The user can uninstall the software. If I install some app and it suddenly makes my computer or network perform poorly, I will uninstall it.

[SM] I am not doubting that this is what the typical IETFer can and will do, but I am sure that most of, e.g., my wider family will not be able to root-cause this.

This is the case before and after NQB, as we have been saying.

[SM] Not exactly: NQB/L4S will make this much simpler, exactly because at their core they offer an unpoliced priority scheduler... while pretending there is nothing to gain from abusing that...

Scenarios I have heard:

  1. Client software is malicious and tries to flood the LL queue.

[SM] No, not flooding indiscriminately, the trick is to stay below the priority capacity share of the NQB/L-queue...

  • Unique to LL? No: client software today could create a disruptive flood of traffic.

[SM] Flooding is denial of service, but that is not the issue here.

  2. User tries to maliciously flood the LL queue in their access network connection.
    • Unique to LL? No: users can maliciously flood their connection today.

[SM] Again, flooding is not the key to the castle here. This is about the claim that it is in the best interest of each application to follow the NQB/L4S recommendations in order to avoid self-harm. This is IMHO so far an unproven hypothesis, to phrase it positively.

  3. ?

[SM] An application tries to hog more than its user-intended/expected share of capacity for whatever purpose (from just expediting its own bulk down/uploads, to running a tunneled P2P instance at the user's expense, to whatever other scheme for abusing access capacity one can dream up). Some users might even be okay with that behavior (e.g., the gamer who really wants that 100 GB update ASAP), but what about the other users who expect their reasonable capacity share for, e.g., home office/remote work/video conferences? Fast lanes are really only okay if opted into by the respective network admin, and priority scheduling creates a "fast lane" whether we call it that or not.

JL

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

dlb237 commented 1 month ago

DB wrote:

If NQB is effectively an alternate name for Default, then why does the draft contain extensive discussion of and strong assertions about the absence of incentives to mismark Default QB traffic as NQB?

Precisely because of that property. Since there is no priority elevation or reserved bandwidth, but only a shallow-buffered queue and otherwise Default treatment, there isn't an incentive for applications that need a deep buffer to try to get anything "special" by picking the wrong codepoint. Also, to be clear, the draft (particularly with the recent proposed changes*) talks about minimizing incentives, rather than making strong assertions about their absence.

-Greg

gwhiteCL commented 1 month ago

@dlb237 you've not provided a convincing argument that more is needed to resolve this issue than what has been proposed so far (plus the proposed resolutions to #46 and #47).

Summarizing the current proposed resolution to this issue:

  1. Traffic protection remains a SHOULD, and remains the preferred implementation.
  2. We delete the words in section 10, "but recognizes that other options might be more desirable in certain situations", so that the recommendation to implement traffic protection isn't watered down.
  3. We modify the paragraph in 5.2 that discusses situations where traffic protection is potentially not needed, to instead frame it such that the decision by an implementer to not implement traffic protection might limit the deployment/usage of their NQB PHB implementation to a smaller set of use cases, and it would put the onus on the operator to monitor usage and take remediations manually rather than automatically dealing with misbehaving traffic. In this, we include language to more fully describe the implications of ignoring the recommendation to implement traffic protection.
  4. We add a mandatory requirement that NQB PHB implementations provide statistics that can be used by the network operator to detect whether abuse is occurring (e.g. packet and drop counters). This requirement would ensure that operators who configure the NQB PHB have the ability to track the amount of packet drop that is occurring due to traffic overrunning the shallow buffer, and then take action if they feel as though the PHB is causing more issues than it is solving in their environment. Those actions could include disabling the PHB, identifying and dealing with the sources of malicious traffic directly, enabling traffic protection if it is available, or pursuing a feature request with the equipment manufacturer to add a traffic protection function if it isn't currently available. (A rough sketch of the kind of check these counters would enable follows this list.)
  5. We change this SHOULD to a MUST: "The NQB queue SHOULD have a buffer size that is significantly smaller than the buffer provided for Default traffic."
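
As a rough illustration of the operator-side check that the counters in item 4 would enable, here is a hypothetical sketch; the counter names and the 1% drop-ratio threshold are invented for this example and are not specified in the draft:

```python
# Hypothetical operator-side check built on per-queue counters of the kind
# item 4 above would mandate. Counter names and the 1% alarm threshold are
# invented for illustration; the draft does not define them.

def check_nqb_queue(stats: dict, drop_ratio_alarm: float = 0.01) -> str:
    """stats is assumed to hold cumulative 'nqb_packets' and 'nqb_drops'."""
    pkts = stats.get("nqb_packets", 0)
    drops = stats.get("nqb_drops", 0)
    if pkts == 0:
        return "no NQB traffic observed"
    ratio = drops / pkts
    if ratio > drop_ratio_alarm:
        # Possible mismarked or abusive QB traffic overrunning the shallow buffer:
        # per item 4, remediations include disabling the PHB, enabling traffic
        # protection, or tracking down the offending sources.
        return f"ALERT: NQB drop ratio {ratio:.2%} exceeds {drop_ratio_alarm:.0%}"
    return f"OK: NQB drop ratio {ratio:.2%}"

print(check_nqb_queue({"nqb_packets": 1_000_000, "nqb_drops": 25_000}))
```
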
dlb237 commented 1 month ago

On 25 Jul 2024, at 8:28 pm, Greg White wrote:

Since there is no priority elevation or reserved bandwidth, but only a shallow-buffered queue and otherwise Default treatment, there isn’t an incentive for applications that need a deep buffer to try to get anything “special” by picking the wrong codepoint.

But that is not true in all existing implementations of NQB - notably the DualQ also used for L4S. Which is what Sebastian has been talking about in this thread.

dlb237 commented 1 month ago

Precisely because of that property. Since there is no priority elevation or reserved bandwidth, but only a shallow-buffered queue and otherwise Default treatment, there isn’t an incentive for applications that need a deep buffer to try to get anything “special” by picking the wrong codepoint.

Provided that the shallow-buffered queue is "configured as specified" ... where the draft still doesn't contain a solid specification of queue configuration.

Also, to be clear, the draft (particularly with the recent proposed changes*) talks about minimizing incentives, rather than making strong assertions about their absence.

I suppose that's an improvement overall, but for existing WiFi in the absence of traffic protection, it appears to be a distinction without a difference.

Thanks, --David

dlb237 commented 1 month ago

Hi Greg,

On 25 Jul 2024, at 19:28, Greg White wrote:

DB wrote:

If NQB is effectively an alternate name for Default, then why does the draft contain extensive discussion of and strong assertions about the absence of incentives to mismark Default QB traffic as NQB?

Precisely because of that property. Since there is no priority elevation

[SM] Respectfully, "citation needed". I really want to see your system that effectively prioritises one class of traffic over another that is not elevating the priority of the prioritised class over the non-prioritised class. This is by the by, a question to the whole WG, what operational theory about how NQB delivers lower latency over QB do you have that make Greg's claim above true in your views?

or reserved bandwidth, but only a shallow-buffered queue and otherwise Default treatment, there isn’t an incentive for applications that need a deep buffer

[SM] But no application really needs a big buffer per se; some applications might, however, appreciate getting a larger capacity share... hence my harping on an attacker that paces its packets and is application-limited to below the priority queue's capacity share.

to try to get anything “special” by picking the wrong codepoint.

[SM] But then NQB will do nothing... Really, you need priority scheduling of the shallow queue over the deep queue; otherwise the shallow queue will just produce more drops without any latency advantage, and it is the latency advantage that you are trying to sell here. As I explained in the past, even the naive approach of giving the NQB queue a 50% scheduling weight, because that sounds fair, is flawed (I agree that it will make sharing between the classes fair): as long as the NQB traffic share is below 50%, each NQB packet will get faster service than a QB packet, which results in NQB packets having priority over QB packets. And since priority is priority (there is no distinction between latency priority and throughput priority; that is not how packet scheduling works in a work-conserving scheduler), NQB will require an attractive priority queue that invites abuse, or NQB will not work at all. Please pick one of those alternatives. (A toy simulation of this effect appears at the end of this comment.)

Also, to be clear, the draft (particularly with the recent proposed changes*) talks about minimizing incentives, rather than making strong assertions about their absence.

[SM] This is IMHO still arguable: you cannot deliver on NQB's promises without some sort of prioritisation. So what IMHO you need to do is put in effective policing against abuse, as the underlying design invites abuse.

[SM] Respectfully, this is not a formal proof; it is more a set of talking points.

Sebastian

-Greg
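
To illustrate the point Sebastian makes above about a 50% scheduling weight effectively acting as priority when the NQB queue is underutilized, here is a toy simulation. The link rate, packet size, and offered loads are arbitrary assumptions, and the scheduler is a simplified two-queue round robin, not a model of any real implementation:

```python
import random
from collections import deque

# Toy simulation of a work-conserving 50/50 two-queue round robin.
# When the NQB queue's offered load is well below its 50% share, its packets
# see near-zero queuing delay, i.e. it behaves like a priority queue for that
# traffic. All parameters below are arbitrary assumptions.

LINK_BPS = 100e6              # assumed 100 Mbit/s bottleneck
PKT_BITS = 1500 * 8           # fixed 1500-byte packets
SIM_S = 2.0                   # arrival window in seconds
NQB_LOAD, QB_LOAD = 0.2, 0.9  # offered loads as fractions of the link rate

def poisson_arrivals(load):
    rate = load * LINK_BPS / PKT_BITS       # packets per second
    t, out = 0.0, []
    while True:
        t += random.expovariate(rate)
        if t >= SIM_S:
            return deque(out)
        out.append(t)

random.seed(1)
queues = {"NQB": poisson_arrivals(NQB_LOAD), "QB": poisson_arrivals(QB_LOAD)}
delays = {"NQB": [], "QB": []}
order = ["NQB", "QB"]
now, turn = 0.0, 0

while queues["NQB"] or queues["QB"]:
    ready = [q for q in order if queues[q] and queues[q][0] <= now]
    if not ready:
        now = min(queues[q][0] for q in order if queues[q])   # scheduler idle
        continue
    if len(ready) == 2:
        q = order[turn % 2]   # both backlogged: strict 50/50 alternation
        turn += 1
    else:
        q = ready[0]          # only one backlogged: work-conserving, it gets the link
    arrival = queues[q].popleft()
    delays[q].append(now - arrival)         # queuing delay before service starts
    now += PKT_BITS / LINK_BPS              # transmission time

for q in order:
    avg_ms = 1000 * sum(delays[q]) / len(delays[q])
    print(f"{q}: mean queuing delay {avg_ms:6.2f} ms over {len(delays[q])} packets")
```

With these (assumed) loads the lightly loaded NQB queue reports a mean queuing delay well under a millisecond, while the overloaded Default queue builds up delay that is orders of magnitude larger; that is the sense in which an under-utilized weighted share behaves like a priority queue.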

gwhiteCL commented 1 month ago

The conclusion of the discussion at TSVWG on July 26 was that this issue has 3 sub-topics, described by @dlb237 as:

  1. incentives
  2. security
  3. wi-fi

@dlb237 commented that we probably won't come up with a perfect solution to the wi-fi issue, and we should focus on getting the other two resolved.

I've created two new issues #50 and #51 to work these two issues.

gwhiteCL commented 1 month ago

Closing this thread. Refer to #50 and #51.