ietf-wg-scone / scone

Merged Internet-Draft based on the TRAIN and SCONE proposals

Client-initiated signal #5

Open smishra1200 opened 2 months ago

smishra1200 commented 2 months ago
  1. Section 2 of the draft shows "QUIC Sender", i.e., the server initiates the TRONE signal toward the client. This implies that the TRONE signal is going to be server-initiated. As a result, the "network element" has to listen to all IP flows in the downlink direction, which is CPU intensive
  2. The authors should consider a design that is independent of SNI-based flow detection
  3. The authors should reconsider this design and allow for CLIENT initiated TRONE signaling (uplink direction) with implicit support for flow detection

As a point of reference, section 6 of draft-mishra-scone-usecase-00 added the following requirement: the SCONE (aka TRONE) signal MUST be client-application endpoint initiated, to assist the network element (UPF/5G or PGW/4G) with implicit flow detection. Please see https://www.ietf.org/archive/id/draft-mishra-scone-usecase-00.html#section-6-1.1.1

ihlar commented 2 months ago

TRONE packets are sent by each endpoint individually, so there is opportunity for the client to send an early TRONE packet such that a network element that keeps some flow state can use the presence of a TRONE packet as an indication to not do further DPI and SNI parsing etc. The problem in the current draft version is that endpoints need to indicate support using transport parameters before sending TRONE packets. The authors have discussed an approach where an opportunistic TRONE "indication" can be appended to a QUIC initial packet. The network element can use this information as an indication that the client is willing to receive rate signals. As opposed to "regular" TRONE packets, this indication would have to be appended to the end of a QUIC Initial packet.

kazuho commented 1 month ago

We have a lot of good discussion going on in #23, but maybe an issue is better to explain how people view the problem. So here are my thoughts:

As I understand, there are two types of flows with different characteristics:

  1. flows rate-limited using CC
  2. flows for which short bursts are allowed, with the assumption that the long-term bitrate is below the policy

Because SCONE signals are purely advisory, and endpoints often cannot adhere to them (see #29), network elements have to monitor the long-term bitrate of type 2 connections, and if they are exceeding the limit, demote them to type 1. This would be ordinary business for any SCONE-compatible network element with the capability to enforce bitrates.
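
A minimal sketch of the monitor-and-demote step described above, assuming a smoothed average stands in for the "long-term bitrate". The class name, the one-second sampling interval, and the smoothing factor are illustrative assumptions, not anything the draft specifies.

```python
from dataclasses import dataclass


@dataclass
class FlowRateMonitor:
    """Tracks a smoothed bitrate for one flow and demotes it if it exceeds policy."""
    policy_bps: float            # long-term bitrate allowed by policy (assumed input)
    interval_s: float = 1.0      # hypothetical sampling interval, in seconds
    alpha: float = 0.05          # hypothetical smoothing factor; smaller = longer memory
    avg_bps: float = 0.0         # exponentially weighted average bitrate
    demoted: bool = False        # True once the flow is treated as "type 1"

    def sample(self, bytes_in_interval: int) -> bool:
        """Feed the byte count seen in the last interval; return the demotion state."""
        inst_bps = bytes_in_interval * 8 / self.interval_s
        self.avg_bps += self.alpha * (inst_bps - self.avg_bps)
        if self.avg_bps > self.policy_bps:
            self.demoted = True  # demotion is sticky in this sketch
        return self.demoted
```

With these numbers, a three-second burst at 10 Mbps against a 2 Mbps policy stays "type 2", while a flow that keeps sending at that rate is demoted after a few more samples. The per-flow state and per-interval accounting are the cost that the rest of this comment refers to.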

If network elements want to demote flows without paying the cost of monitoring the long-term bitrate, they can consult the QUIC Version field of the first few packets sent by the server. As stated, there will be false positives; hence this is an optimization.

The question regarding client-driven indication is, IMO, if we want to have another knob for optimization, providing the capability to demote flows just by looking at the first packet from the client. Note we would have even more false positives than the first knob, because clients might be sending Indications to servers that do not support SCONE.

While I’m not necessarily opposed to having Indications, I am still wondering why some think it is insufficient to only have the first optimization knob (i.e., see if the server sends SCONE packets early).

The heavyweight task that the network elements cannot avoid is actually measuring the long-term flow rate and demoting the connections. Compared to that, the benefit of having the 2nd knob (Indications) seems marginal, if any, even though it makes things much more complex.

smishra1200 commented 1 month ago

Hi @kazuho

Thanks for your thoughts on this. Following up on the discussion, here's my perspective:

From a network element standpoint that uses deep packet inspection (DPI) for rate limiting, the process involves identifying potential video sessions and then applying rate limits based on the subscriber's data plan (e.g., 2 Mbps, 4 Mbps). This results in traffic being passed to the radio network at the specified bit rate, and any excess traffic arriving at a faster rate is discarded.

Considering this within the SCONE working group, it seems beneficial if a client application on the user equipment (UE) initiating adaptive bitrate (ABR) video playback could signal to the rate-limiting network element: "I can send packets at a rate you can handle, so you don't need to drop them." This uplink indication would provide the network element with information to consider.

Upon receiving such a signal, the network element might forgo DPI and potentially wait for a corresponding signal on the downlink for the same 4-tuple (within a reasonable timeout period) before applying the allowable rate limit. While inspecting downlink traffic can be resource-intensive, minimizing this inspection could be a desirable optimization.
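
A rough sketch of the "wait for a corresponding downlink signal on the same 4-tuple" idea, under the assumption that the network element keeps a small table of pending uplink indications. `IndicationTracker`, its method names, and the two-second timeout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (client_ip, client_port, server_ip, server_port)
FourTuple = Tuple[str, int, str, int]


class IndicationTracker:
    """Pairs an uplink indication with a later downlink SCONE packet."""

    def __init__(self, timeout_s: float = 2.0) -> None:
        self.timeout_s = timeout_s                   # hypothetical "reasonable timeout"
        self._pending: Dict[FourTuple, float] = {}   # flow -> deadline
        self._confirmed: Set[FourTuple] = set()

    def on_uplink_indication(self, flow: FourTuple, now: float) -> None:
        """Record that the client signalled; DPI could be skipped from here on."""
        if flow not in self._confirmed:
            self._pending[flow] = now + self.timeout_s

    def on_downlink_scone(self, flow: FourTuple, now: float) -> bool:
        """Return True if the flow is (or already was) confirmed as SCONE."""
        deadline = self._pending.pop(flow, None)
        if deadline is not None and now <= deadline:
            self._confirmed.add(flow)
        return flow in self._confirmed

    def expire(self, now: float) -> List[FourTuple]:
        """Drop and return flows whose downlink signal never arrived in time."""
        expired = [f for f, d in self._pending.items() if d < now]
        for f in expired:
            del self._pending[f]
        return expired
```

Flows returned by `expire` are the awkward case: the first packet has already gone by, so whatever policy applies to them has to be chosen without DPI on that packet.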

Conversely, without this client-side notification, a network element relying on DPI will likely start enforcing rate limits immediately at the beginning of a video session, as it has no initial way of knowing if the client application is SCONE-capable.

For SCONE deployment, this early signaling mechanism distinguishes SCONE flows from non-SCONE flows. It's also important to note that SCONE usage is separate from the congestion control mechanisms within the mobile packet core and the RAN, and therefore outside the scope of our SCONE discussions.

Thanks, Sanjay

martinthomson commented 1 month ago

From a network element standpoint that uses deep packet inspection (DPI) for rate limiting, the process involves identifying potential video sessions and then applying rate limits based on the subscriber's data plan (e.g., 2 Mbps, 4 Mbps). This results in traffic being passed to the radio network at the specified bit rate, and any excess traffic arriving at a faster rate is discarded.

Obviously, if your condition is that you do DPI, then you do DPI. But if your goal is to ensure that video does not exceed a given limit, is DPI really necessary?

My understanding is that the main concern is high network consumption over long periods of time, something that video is uniquely "good" at. Right now, you use DPI to ensure that you do not limit non-video flows. But in a world where you have the potential for SCONE to be used on those flows that you are interested in, isn't there a different approach?

At the start of a flow, you do not limit it, but set a timer. If that timer pops without seeing SCONE, you switch the rate limit on with a relatively small window. However, if you see a SCONE signal before then, you switch the timer on with a looser timer. The rate limit would be the same in both cases, but the SCONE flow would have a deeper token bucket, so would be able to perform better. There's no DPI involved. The cost being that video flows will potentially observe huge throughput for the path and might get overenthusiastic in their choice of quality, which can degrade the overall experience.
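
One possible reading of this timer-based approach, sketched as a token bucket whose depth depends on whether a SCONE signal was seen before the timer fired. The class, the five-second delay, and both bucket depths are assumptions made for illustration; only the "same rate, deeper bucket for SCONE" shape comes from the comment.

```python
from typing import Optional


class DelayedPolicer:
    """No limit at flow start; arms a token bucket later, deeper if SCONE was seen."""

    def __init__(self, start: float, rate_bps: float,
                 decision_delay_s: float = 5.0,   # hypothetical timer
                 shallow_bytes: int = 50_000,     # hypothetical depth without SCONE
                 deep_bytes: int = 2_000_000):    # hypothetical depth with SCONE
        self.rate_bytes_per_s = rate_bps / 8      # same rate limit in both cases
        self.deadline = start + decision_delay_s
        self.shallow_bytes = shallow_bytes
        self.deep_bytes = deep_bytes
        self.depth: Optional[int] = None          # None means "not limiting yet"
        self.tokens = 0.0
        self.last = start

    def _arm(self, now: float, depth: int) -> None:
        self.depth = depth
        self.tokens = float(depth)                # start with a full bucket
        self.last = now

    def on_scone_signal(self, now: float) -> None:
        """A SCONE packet seen before the timer fires selects the deep bucket."""
        if self.depth is None and now < self.deadline:
            self._arm(now, self.deep_bytes)

    def allow(self, now: float, size: int) -> bool:
        """Return True if a packet of `size` bytes may pass at time `now`."""
        if self.depth is None:
            if now < self.deadline:
                return True                       # undecided: forward unthrottled
            self._arm(self.deadline, self.shallow_bytes)  # timer fired without SCONE
        refill = (now - self.last) * self.rate_bytes_per_s
        self.tokens = min(float(self.depth), self.tokens + refill)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

Nothing here parses packet contents beyond recognising a SCONE packet. A signal that arrives after the timer has fired is ignored in this sketch; whether it should upgrade the bucket is a policy choice the comment leaves open.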

Now, with an indication, you are in much the same situation. A signal might cause the DPI to be skipped, but then you still have two eventual end states: one where the signal never manifests and one where you see a SCONE signal. I can't imagine that the outcome is any different.

I say that because when you say:

Upon receiving such a signal, the network element might forgo DPI and potentially wait for a corresponding signal on the downlink for the same 4-tuple (within a reasonable timeout period) before applying the allowable rate limit.

That implies that you are exposing yourself to flows opting out of DPI. And you can't go back in time and do DPI for that flow if you do that. So if the signal never arrives -- something that seems likely -- you have lost the opportunity to "classify" the flow according to your DPI-based logic.

For me, the only thing that is potentially relevant here is this transitionary period where only some clients and servers support SCONE. There, having an indication might be useful, but I don't see the game theory of this working out. I can't see a network that currently uses DPI for classification giving that up; so unless you want to promise that on behalf of your employer and find other network operators willing to make similar commitments, this all seems pretty speculative to me.

ihlar commented 1 month ago

For me, the only thing that is potentially relevant here is this transitionary period where only some clients and servers support SCONE. There, having an indication might be useful, but I don't see the game theory of this working out. I can't see a network that currently uses DPI for classification giving that up; so unless you want to promise that on behalf of your employer and find other network operators willing to make similar commitments, this all seems pretty speculative to me.

I am not in a position to make any promises or name individual companies for that matter. But what I can say is that, in several conversations with CSPs, the potential for reducing the load caused by DPI in the user plane functions is one of the most attractive aspects of SCONE-like solutions. Initially, they would not disable DPI; rather, they would introduce SCONE into their DPI signatures. Packet filters matching "the SCONE application" would have higher priority than the more complex ones targeting individual applications based on SNI and similar. This approach allows a gradual shift to SCONE-like policies (longer enforcement windows) for traffic that adopts SCONE early, while retaining existing throttling policies (shorter enforcement windows) for other traffic.
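
To make the filter ordering concrete, here is a small sketch of a priority-ordered filter table in which a SCONE match wins over SNI-based signatures. The filter names, the `.video.example` suffix, and the window lengths are invented for illustration and do not describe any real DPI product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FlowInfo:
    has_scone: bool          # a SCONE packet or indication was observed
    sni: Optional[str]       # server name recovered by DPI, if any


@dataclass(frozen=True)
class PacketFilter:
    name: str
    priority: int                         # lower value is evaluated first
    matches: Callable[[FlowInfo], bool]
    enforcement_window_s: float           # longer window = looser policing


FILTERS = sorted(
    [
        # Hypothetical legacy signature: throttle on a known video hostname suffix.
        PacketFilter("video-sni", 10,
                     lambda f: f.sni is not None and f.sni.endswith(".video.example"),
                     1.0),
        # Hypothetical "SCONE application" signature with a longer window.
        PacketFilter("scone", 0, lambda f: f.has_scone, 30.0),
    ],
    key=lambda pf: pf.priority,
)


def select_filter(flow: FlowInfo) -> Optional[PacketFilter]:
    """Return the highest-priority filter that matches, or None."""
    for pf in FILTERS:
        if pf.matches(flow):
            return pf
    return None
```

In a real element the SNI would presumably be extracted lazily, so that a flow matching the SCONE filter never pays for the more expensive signatures; this sketch takes it as an input to stay short.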

This, of course, only matters if you're not using your DPI for other purposes as well. However, there aren't many use cases beyond this type of policy enforcement where DPI must run in real-time on all flows (unless you're in the business of censorship or similar). Often, DPI is used primarily for collecting statistics or business intelligence; in these scenarios, performing DPI on a statistically significant subset of flows is possibly sufficient.

At the start of a flow, you do not limit it, but set a timer. If that timer pops without seeing SCONE, you switch the rate limit on with a relatively small window. However, if you see a SCONE signal before then, you switch the timer on with a looser timer. The rate limit would be the same in both cases, but the SCONE flow would have a deeper token bucket, so would be able to perform better. There's no DPI involved. The cost being that video flows will potentially observe huge throughput for the path and might get overenthusiastic in their choice of quality, which can degrade the overall experience.

I would hope that we end up with something like this. My worry is that the leap from the current situation to something like this approach, without an intermediate step, will be too significant for many CSPs, since it involves rethinking policy enforcement to some extent, which is more than a technical problem. The options would then be to either keep relying on DPI for classification, which reduces the incentive to deploy SCONE, or to apply some phased approach e.g., as described above, and gradually assess the benefits.

Allowing for, optional, early SCONE indications is a way to help adoption of the technology, and is imo a small price to pay in terms of both protocol and specification complexity.

huitema commented 1 month ago

Color me just as skeptical as Martin. Implementations will send SCONE packets if they expect to get an advantage. We have to first explain what that advantage is. "Getting rid of DPI" is an advantage for the network. Application developers are not going to care much about that -- if they want to get rid of DPI, they will simply use ECH to scramble the data in the "initial" packets of a QUIC connection, hiding the data that DPI could find today.

And sure, this is an arms race. If application developers better encrypt the data, the middle boxes will probably deploy some fingerprinting technique based on size and timing of the encrypted packets. This will probably succeed in identifying flows, but it requires observing enough packets to find the pattern. If the decision can wait that long, networks could just as well apply the "delayed decision" strategy that Martin is describing.

mjoras commented 1 month ago

Color me just as skeptical as Martin. Implementations will send SCONE packets if they expect to get an advantage. We have to first explain what that advantage is. "Getting rid of DPI" is an advantage for the network. Application developers are not going to care much about that -- if they want to get rid of DPI, they will simply use ECH to scramble the data in the "initial" packets of a QUIC connection, hiding the data that DPI could find today.

@huitema Application developers are not the only ones needed to enable SCONE adoption. There has to be an incentive for the network as well. SCONE, especially with an early indication, is one way we realistically see to avoid some of this "arms race" you describe.

You may be skeptical but we have CSPs and those that vendor equipment to CSPs saying that this would meaningfully help with SCONE deployment. What more would it take to make you not skeptical?

huitema commented 1 month ago

You may be skeptical but we have CSPs and those that vendor equipment to CSPs saying that this would meaningfully help with SCONE deployment. What more would it take to make you not skeptical?

I don't like to deal in cliches, but you want a "win win" strategy. There has to be something gained by the application, otherwise it will just not bother. Plausible gains:

1) Allow the application to implement some long-term tuning to network capacity (as opposed to just following short-term congestion feedback). This should result in lower latency because the application could forgo "capacity probing" strategies that build queues, and possibly lower power consumption (don't bother firing up a hi-def codec if you know that will not work).

2) Make the traffic more "regular" -- the natural consequence of application tuning. The shorter queues in the network improve its overall efficiency, which should be visible in measurement of latency and packet loss.

None of that actually requires "policing per flow" or special-casing flows that use SCONE. Networks will still need to implement forms of active queue management at the edges, and should really work to implement ECN for real time feedback. It is just that flows that actually keep within the envelope advertised by SCONE will experience many fewer congestion events, ECN/CE marks, or AQM induced latency.

mjoras commented 1 month ago

I don't like to deal in cliches, but you want a "win win" strategy. There has to be something gained by the application, otherwise it will just not bother.

@huitema as someone from an application perspective, I think you are somewhat over-complicating the win-win. There is quite a simple win-win that would happen today in many networks if we had SCONE. These are networks where:

  1. DPI is utilized per-flow to identify our application as video (even when it's not) and aggressively police it for a variety of reasons and in a variety of situations.
  2. The policing is heavy handed and quite difficult to deal with both from an application and transport layer perspective.
  3. Deploying ECH is not viable since ultimately "networks are going to do what networks are going to do" and the obvious thing to do is to be even more heavy handed since the "arms race" is extremely slow for them.

If we could wave a magic wand and have SCONE + the indication, there are networks where they would absolutely:

  1. Utilize SCONE to signal their advice.
  2. Disable 100% DPI for those flows since it has a significant cost for flow setup.
  3. Disable policing as long as the aggregate behavior of an application matches the signaled advice.
  4. Free us to deploy ECH on these flows.

Item 3 in particular would also give them a competitive edge over their peers who still rely on policing, creating urgency for other CSPs to adopt the same practices.

We could achieve this by having SCONE + an indication and setting it on all flows. I don't know what else to say to assure you this isn't hypothetical. It's a win for applications and for the operators. Frankly, the party I am least convinced gets a big "win" in this situation is the vendor, but at the end of the day, if their customers are asking for it, they will provide it.

kazuho commented 1 month ago

If I’m reading correctly, the argument for client‑driven Indications is to give CSPs a way to lighten their DPI burden. @ihlar’s comment nicely summarizes those benefits.

But aren't there risks as well?

Specifically, I have one question: what will prevent CSPs from simply mapping these new signals onto their existing DPI‑based video policy? Today, any flow marked as “video” typically receives a far lower throughput allowance than generic traffic—even when it’s bursty. If providers treat client-driven SCONE Indications as just another way to label traffic “video,” endpoints will effectively be barred from sending any non‑video data over SCONE.

As we discuss in #29, that outcome would sabotage browser adoption—browsers can’t predict in advance how a connection will be used, so they can’t safely opt in to SCONE if it means risking severe throttling. In the end, the only services that could realistically benefit from deploying SCONE would be those already classified as video. For everyone else, the incentive could turn out to be to avoid SCONE entirely, since advertising it would trigger the very low‑throughput “video” treatment.

Of course, CSPs remain free to throttle as they see fit—but there’s a world of difference between “they can throttle” and “we’re giving them a mechanism to reclassify potentially non‑video traffic as video just to throttle it.”

How do we prevent that second scenario? What mechanism ensures CSPs can’t just recycle their existing video‑throttling rules against SCONE‑enabled traffic?

ihlar commented 1 month ago

You make a very good point @kazuho. I think the response to this is that sending early indications should be optional and that endpoints such as browsers should be recommended to not use them. Or at least there should be sufficient guidance around both the risks and benefits of using early indications. For native video applications the benefits are significant, as discussed above.

ianswett commented 1 month ago

I wonder if we're trying to do two things with one mechanism and it might make sense to split them a bit?

For example, if we don't want people doing DPI, we can put the SNI in cleartext in a hypothetical MUFFIN packet in the same UDP packet as the ClientHello. To be clear, I am not at all sure this is a good idea, but at least there's no DPI and it aligns with my general principle of "If I wanted the network to know something, I'd tell it explicitly."

I'd also be fine having a packet type that says "Put the reverse path bandwidth in this" that we can send in the first flight if we're doing 0-RTT and with the ClientFinished if we're not. That would prevent it ever being sent in cases it hadn't been negotiated.

For throttling, I think the implementation is fairly easy: Keep doing what you're doing until you see a SCONE packet. My understanding is that most networks have some initial token-bucket model to ensure web pages load quickly, so they might let the first 100 packets through unthrottled even today.

As a person who writes congestion controllers and runs QUIC on servers, having the receiver and not the sender receive this signal is potentially problematic. It's fine for apps, but are we going to expose this value to browser clients so they can do ABR or communicate it to the server so it can do ABR?

Paths are commonly not symmetric, so I can appreciate the benefits of sending the SCONE packet in the relevant data path, but I do really want to have this information sender-side.

kazuho commented 1 month ago

@ihlar

I think the response to this is that sending early indications should be optional and that endpoints such as browsers should be recommended to not use them. Or at least there should be sufficient guidance around both the risks and benefits of using early indications. For native video applications the benefits are significant, as discussed above.

I’m not sure client‑driven Indications will even help native video apps. If you’re already rate‑limited, you’ve got nothing to lose by signaling early. But if you’re not rate‑limited today, you have every incentive to avoid SCONE—being tagged “video” means worse throughput and user experience.

Put another way, the real danger is that only the apps CSPs already classify as video will adopt SCONE (since their performance can’t get any worse), while everyone else steers clear. CSPs could then simply continue throttling based on those Indications, just as they do with DPI today.

That outcome would be a defeat: we’d have built a mechanism that helps CSPs maintain—or even strengthen—their rate‑limiting, by providing an explicit signal, precisely the opposite of what this WG is chartered to achieve.

mjoras commented 1 month ago

@kazuho I don't understand how any of this changes significantly with or without the indication. Without the indication, the exact same thing would happen, just later in the connection when an endpoint sends an actual SCONE packet. Indications change nothing about what information is visible to a network element, so how do they change the decision calculus? In fact, without ECH the exact same information is available at the same time (the client's ability to receive SCONE packets), by parsing the CH and looking for the SCONE transport parameter. It's just relatively costly to do so but since the existing practice requires decrypting the Initial packet anyway, it's perfectly viable to do.

An endpoint cannot control what the network does or does not decide based on SCONE or an indication or (as is common today) the contents of SNI or IP addresses. It is a similar problem to ECH, where taking away information can lead to the outcome of degrading all traffic.

We are falling into the trap of re-litigating the utility and benefit of SCONE in general, rather than focusing on the indication itself. Having an indication or not does not change the information available to a network element about a flow, it is a detail that changes how accessible and when it is accessible. Clients are free to utilize or not utilize SCONE as they please, and network elements are free to utilize it or not utilize it. The exact same would be true for an indication.

kazuho commented 1 month ago

@mjoras “Significance” is a subjective word.

Let me explain how I understand the incentives of different types of network elements, how they might use Indications, and what outcomes may result.

Consider three types of network elements capable of applying different policies to flows:

For type A elements, Indications could become a new tool to identify the type of flow and apply CC accordingly. This behavior wouldn’t negatively affect flows already throttled by SNI inspection. However, it would hinder SCONE adoption among applications that currently aren't throttled—being (mis)classified as video flows would lead to lower throughput.

For type B elements, Indications might be beneficial during SCONE’s early adoption phase. These elements can distinguish between potentially SCONE flows and definitely non-SCONE flows just by inspecting the first client packet. They could treat the small number of SCONE flows differently while continuing to apply DPI + CC-based throttling to the non-SCONE majority.

But as SCONE adoption grows and the number of SCONE flows approaches or exceeds the capacity of these network elements, what happens then? As far as I can tell, their only practical option is to revert to applying CC-based throttling to SCONE flows. That’s the only viable path forward.

For such elements, it's possible they would behave similarly even without Indications. While inspecting the first packet is easier, they can still analyze later packets if needed.

However, the key point is this: network elements that behave in this manner—like type A—would deter SCONE adoption at the endpoint, or cause users to disable SCONE, because using SCONE would result in degraded performance.

For type C elements, Indications are unnecessary. These elements can assume all flows are SCONE by default and only apply CC-based throttling if no SCONE packets are observed for a certain duration, or if long-term throughput exceeds a threshold.

In summary:

People may have different visions for SCONE, but in my view, the best-case scenario is:

From this perspective, misuse by type A elements is purely detrimental. Moreover, type A elements are the most commonly deployed today. If they begin applying CC-based limiting based on Indications, there would be little incentive for new applications to adopt SCONE. This, in turn, would remove any reason for future network elements to properly support SCONE, stalling its adoption altogether.

Type B elements are landmines best avoided, and we should refrain from providing incentives that would facilitate their development.

smishra1200 commented 1 month ago

@kazuho Scope of SCONE (if I understand it correctly) is for the UE to signal to the mobile network that it can "self-regulate its bit rate" provided the mobile network tells it, "what the 'self regulate bit-rate' can be". This is completely outside of the scope of how mobile networks manage congestion in the network (mobile packet core, the radio network and the access network). SCONE is meant for adaptive bit-rate applications such as video.

The SCONE signal, if adopted by content publishers + mobile operators, would only apply to QUIC flows, so even if SCONE adoption grows, SCONE under its current charter excludes apps that do not use QUIC, for example, TCP/IP flows.

Lastly, mobile networks do not regulate non-video flows, so any non-video application does not gain any advantage (or disadvantage) for wanting to use SCONE signal.

kazuho commented 4 weeks ago

@smishra1200

Scope of SCONE (if I understand it correctly) is for the UE to signal to the mobile network that it can "self-regulate its bit rate"

I’m not sure. As the draft is currently written, Section 3.5 notes that “the fact that an endpoint requests bitrate signals does not necessarily mean that it will adhere to them; in some cases, the endpoint cannot” (see also issue #29). In other words, SCONE does not promise self-regulation; it merely conveys rate-limit advice.

Moreover, even if SCONE packets did reliably indicate self-regulation, an Indication in the client's very first packet cannot be trusted to mean the same thing: few clients have out-of-band knowledge that the server supports SCONE.

Hence an Indication is, at best, a weak hint, and because it looks so similar to SNI, it carries the risk of being misused as a replacement for SNI in triggering congestion-control-based throttling via existing mechanisms.

SCONE signal … would only apply to QUIC flows …

That is correct. Sorry if I have confused you by stating “all internet traffic becomes SCONE”; of course I meant “all internet traffic using QUIC.” With that clarification, the concerns above about reliability and potential misuse remain unchanged.

huitema commented 4 weeks ago

I agree with @kazuho. SCONE is defined as a way for the network to provide information to the host -- not the other way around. It only provides limited information about the host, i.e., that the host is capable of formatting and parsing SCONE packets.

Take the example of a network that really wants to provide a different service to a subset of the connections. It will probably determine some kind of pacing rate based on the information in the UDP header, possibly augmented by inspection of the initial packet. The network element could document that rate in the SCONE packets, so the application can be parameterized accordingly.

But that DPI-and-pacing network is just an example. A home router may simply document the maximum data rate of the local connection, without any kind of DPI or per-connection state.

If SCONE is successful, yes, pretty much every QUIC connection will be using it, because they find the information provided by the network useful. That's the definition of success.

smishra1200 commented 4 weeks ago

@kazuho, agreed on section 3.5, which you point to above. My main point is that the intent of SCONE is to not have the NE [capable of rate-limiting adaptive bit-rate video applications] throttle ABR video traffic, and instead to send an advisory throughput signal to the endpoint's video application. The expectation is that this change can result in a better "end-user" experience and reduce packet retransmission. Of course, if the endpoint chooses not to use the throughput advisory signal, then SCONE won't be applied to that QUIC 4-tuple.

So, the question is, is any indication to the NE helpful here? Does the indicator tell the NE that there is a willing app interested in self-regulating traffic, so that the NE should look to send the throughput advisory signal toward the endpoints in both directions at the first opportunity?

smishra1200 commented 4 weeks ago

@huitema I'm trying to understand the incentive for networks to adopt SCONE. You mentioned that SCONE is designed for the network to provide information to the host, offering limited information about the host's capability to handle SCONE packets.

Given that networks already rate-limit ABR video applications without using SCONE, what would be the key reason for these networks to prefer SCONE as their tool of choice in the future? What specific advantages does SCONE offer to the networks to make the switch in this context?

kazuho commented 4 weeks ago

@smishra1200

So, the question is, is any indication to the NE helpful here?

My view is that the answer is no, assuming we measure success using the criteria stated concisely by @huitema, with which I fully agree. In a world where pretty much all QUIC connections use SCONE, there is little benefit—if any—in identifying at an early moment the few connections that do not.

Does the indicator tell the NE that there is a willing app interested in self-regulating traffic ...

SCONE, as currently drafted, does not provide such a signal. Privacy concerns aside, the practical problem is that endpoints often do not know how a connection will be used at establishment time. Video-only clients talking to a dedicated server might be able to assert that, but they represent only a fraction of all Internet traffic.

Given that networks already rate-limit ABR video applications without using SCONE, what would be the key reason for these networks to prefer SCONE as their tool of choice in the future? What specific advantages does SCONE offer to the networks to make the switch in this context?

For networks, the issue with the status quo is that they do not have the capability to rate-limit all long-term traffic, and that CC-based rate-limiting is sub-optimal.

Status quo = lose-lose:

SCONE = win-win:

martinthomson commented 4 weeks ago

I agree with @kazuho.

There's also a leap of faith involved. Networks might be trusting clients more when they disengage or loosen policers and shapers in favour of SCONE. Similarly, clients are trusting networks more when they engage rate limits in applications, rather than pushing usage up to the limits the network tolerates. If neither is willing to extend that trust, then this whole thing fails. But we're seeing that it can deliver that win-win outcome, so it's worth a shot.

mjoras commented 4 weeks ago

@kazuho I am trying to understand your different classification of network elements. The network elements that exist today which are relevant to the indication discussion basically do the following:

  1. Observe all new flows through them, with state being created per flow as they are initiated from the client direction.
  2. They try to classify these flows. They do this by applying what we are calling "DPI", which more concretely means being able to parse the entire packet.
  3. The classification as ABR video is often done by trying to find the SNI of known ABR video hostnames.
  4. If they classify a flow as ABR video, they apply what we're calling "CC throttling" or similar in this thread: token-bucket queueing that buffers packets, drops them, or does some combination thereof.
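Step 4 above can be illustrated with a minimal token-bucket policer. This is only a sketch under stated assumptions: the class name `TokenBucket`, its parameters, and the drop-only behaviour are hypothetical, and real network elements may buffer packets instead of, or in addition to, dropping them.

```python
class TokenBucket:
    """Minimal token-bucket policer (hypothetical sketch, drop-only)."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.capacity = burst_bytes  # maximum bucket depth in bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the last update, in seconds

    def allow(self, size_bytes: int, now: float) -> bool:
        """Return True to forward the packet, False to drop it."""
        # Refill tokens for the time elapsed since the last packet,
        # capped at the bucket depth.
        elapsed = max(0.0, now - self.last)
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        # Forward only if enough tokens remain to cover this packet.
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

A sender that exceeds the configured rate sees this as loss, which is why the thread calls it "CC throttling": the congestion controller, not the application, is what ends up reacting.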

The indication basically exists to obviate the need for doing 2-4 while we work to make SCONE ubiquitous. This is the "trust" on the part of the network, and in exchange they get improved video performance for the humans and lower cost for the network elements per flow.

How exactly that "trust but verify" plays out will probably vary (is it active and online or is it aggregates and offline, etc.). But that's not what the indication is trying to solve.

I honestly think we are spending far too much time on hypotheticals here when the systems we are trying to change are extant. We are also getting sidetracked debating the "philosophy of SCONE", which is largely irrelevant to whether we spend time trying to develop a reasonable indication optimization for the scenario described above.

kazuho commented 4 weeks ago

@mjoras

I am trying to understand your different classification of network elements….

Let’s focus on one type of network element for the moment.

The indication basically exists to obviate the need for doing 2-4 while we work to make SCONE ubiquitous.

I think the word “while” might be the crux of the problem.

If I understand correctly, the premise behind Indication is that—during the initial deployment phase—operators might want to deploy network elements that can handle only a limited number of SCONE-enabled QUIC flows. If they could handle QUIC flows all with SCONE attached, there would be no need for Indications at all.

But what would these network elements start doing once SCONE does become ubiquitous?

The most pragmatic option for them will be to keep classifying flows by the presence of Indication and apply CC-based throttling, as they do not have the capacity to handle 100% SCONE flows.

In other words, the network elements meant to smooth the path for early deployment would end up backfiring once SCONE is widespread, turning Indication into nothing more than another “throttle flag.”

I don’t think we should encourage deployment of network elements that would undermine SCONE’s objectives the moment the protocol becomes a success.

Instead, we should focus on encouraging the development of network elements that can handle 100% SCONE flows.

ihlar commented 3 weeks ago

If they could handle QUIC flows all with SCONE attached, there would be no need for Indications at all.

This depends. An indication can be interpreted as an endpoint opting in to the strictest possible ceiling. Flows that do not include the indication might still benefit from SCONE signals, even if the communicated ceiling is less strict.

Instead, we should focus on encouraging the development of network elements that can handle 100% SCONE flows.

These network elements can work with 100% SCONE flows. As an example, a 5G UPF can apply multiple sets of policies at different scopes:

Often, the flow/application-based policy is stricter than the session AMBR. A flow that opts in to the strictest ceiling using an indication will get the lower of the two policies. Flows that do not explicitly opt in to the strictest ceiling will get the AMBR value.
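The selection rule described above can be sketched as follows. The function name and parameters are hypothetical and only illustrate the rule as stated in this comment; they do not describe how any particular UPF is implemented.

```python
from typing import Optional


def effective_ceiling_bps(session_ambr_bps: int,
                          flow_policy_bps: Optional[int],
                          has_indication: bool) -> int:
    """Pick the rate ceiling for a flow (hypothetical sketch).

    A flow that carries the indication gets the lower of the session AMBR
    and the flow/application policy. Any other flow gets the session AMBR.
    """
    if has_indication and flow_policy_bps is not None:
        return min(session_ambr_bps, flow_policy_bps)
    return session_ambr_bps
```

For example, with a 50 Mbit/s session AMBR and a 2.5 Mbit/s flow policy (both made-up numbers), a flow carrying the indication would be advised 2.5 Mbit/s and a flow without it 50 Mbit/s.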

It would be great if a fully SCONE-capable world changes policy enforcement such that there is no need for differentiation on application type or use of indications (and I do understand that the indication and its meaning as a policy toggle is not pretty). But I think there is a long way to go to get to that world, and if we design a solution for an idealized world we run the risk of getting little to no deployment at all.

Therefore, given the current reality, a strictly optional indication with clear guidance text that endpoints SHOULD only send it for flows they control (e.g., don't use this if you're a browser) and that networks MUST NOT penalize its absence seems to me a pragmatic way to increase SCONE deployment incentives.

And fwiw I'd be totally ok to define an optional MUFFIN packet as @ianswett suggests further up in the thread, though perhaps not copying the SNI, but simply being an indication of opt-in to the strictest ceiling regardless of whether it's enforced by SCONE or CC-based throttling.

zaheduzzaman commented 3 weeks ago

It is very hard to understand where we are on this issue. Do we have a decision here?

@smishra1200, as a network operator do you plan to give the throughput advice to any QUIC connection that goes through your UPF/NE? Or is this really based on subscriber policy, meaning different subscribers get different advice or do not get anything at all?

smishra1200 commented 3 weeks ago

@zaheduzzaman

It is very hard to understand where we are on this issue. Do we have a decision here?

@smishra1200, as a network operator do you plan to give the throughput advice to any QUIC connection that goes through your UPF/NE?

No. Traffic policing is not universally applied across our network. Specifically, video policing is designed for video-playing applications utilizing adaptive bitrate video, and it does not extend to non-video traffic. Responding with throughput advice to every QUIC connection would be impractical for two key reasons: first, within the context of SCONE, its application would be limited solely to ABR video; second, it would introduce an unwelcome and unpredictable processing load on the network.

Or is this really based on subscriber policy, meaning different subscribers get different advice or do not get anything at all?

Determining the appropriate video policy for a session does involve assessing both the subscriber's data plan and the radio access technology being used (4G/5G). Throughput advice for SCONE will need to consider the subscriber's plan and the network they are connected to. For example, subscribers with a premium 5G plan will experience less strict video policing compared to those on other plans within the 5G network and on a 4G network.
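As a rough illustration of the plan-and-RAT dependency described above, the lookup could be sketched like this. The plan names, the table, and every rate in it are invented for the example and do not reflect any operator's actual policy.

```python
from typing import Optional

# Hypothetical video throughput advice in bits per second, keyed by
# (subscriber plan, radio access technology). All values are made up.
VIDEO_ADVICE_BPS = {
    ("premium", "5G"): 8_000_000,
    ("standard", "5G"): 4_000_000,
    ("premium", "4G"): 2_500_000,
    ("standard", "4G"): 1_500_000,
}


def throughput_advice_bps(plan: str, rat: str) -> Optional[int]:
    """Return the advice for a session, or None if no video policy applies."""
    return VIDEO_ADVICE_BPS.get((plan, rat))
```

In this sketch a premium 5G subscriber gets the least strict advice, matching the example in the comment; a session with no matching entry gets no advice at all.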

huitema commented 3 weeks ago

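Assume that we have defined a "scone capable" indication and that all connections are using it, whether or not they are sending videos. The connections will then send SCONE packets at intervals. Do I understand correctly that:

  • if your network has recognized the application as "doing video", the SCONE packet will be updated to indicate the target throughput, which will depend on the user's subscription.
  • otherwise, the SCONE packet will not be modified, and the throughput parameter will be left to a default value?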

smishra1200 commented 3 weeks ago

Assume that we have defined a "scone capable" indication and that all connections are using it, whether or not they are sending videos. The connections will then send SCONE packets at intervals. Do I understand correctly that:

  • if your network has recognized the application as "doing video", the SCONE packet will be updated to indicate the target throughput, which will depend on the user's subscription.
  • otherwise, the SCONE packet will not be modified, and the throughput parameter will be left to a default value?

Use of SNI in combination with subscriber policy and RAT type guides the ABR video optimization. If the "scone capable" indication is applied only to ABR-video QUIC connections, the SCONE protocol would not only eliminate the need for SNI in flow detection but also provide a stronger rationale for its deployment.

However, applying the "scone capable" indication to all QUIC connections would mean our network would need to inspect every QUIC connection, regardless of whether it's a video flow. We would still need to rely on SNI to determine if a specific QUIC connection is a video flow to decide whether to populate the throughput parameter in the SCONE packet.

If my understanding is correct, it seems that implementing SCONE in this way would add to CPU processing, and the benefit compared to our current process is unclear to me. Additionally, SCONE's applicability is limited to QUIC, and TCP connections will still exist.

Could you please clarify if my understanding is correct?

ihlar commented 2 weeks ago

Trying to summarize the issue, based on the discussion here and in the recent interim meeting. There is broad agreement that giving networks an incentive to deploy SCONE is valuable. Early indications have been proposed as one such incentive, because they could reduce DPI load while SCONE adoption ramps up.

The main disagreement is what an indication should imply.

A suggested compromise is to allow indications but encourage all flows, video and non-video, to set them. That reduces the fingerprinting concern and makes the bit less attractive as a throttle toggle. It does, however, leave operators with a choice:

huitema commented 2 weeks ago

We may also mention that "Keep DPI to detect video and apply differentiated policies" fails in the presence of ECH or migration.

smishra1200 commented 1 week ago

@ihlar, I understand you are only summarizing the conversation and greatly appreciate you doing this. I'd appreciate a bit more clarity on a few points in the summary:

Replace video-specific policies with generic “SCONE policies” where throughput advice is communicated, and long-term throughput is measured and potentially enforced on all SCONE flows.

Could you please clarify what constitutes a "SCONE" flow? Is it defined as "applications capable of adjusting their bit-rate based on network conditions," or is the definition different?

Indications are helpful during the transition when legacy and SCONE rules coexist.

It would be helpful if these indications were limited to "SCONE" flows. For example, a large file download (or any non-ABR content) that cannot adjust its bit-rate might not benefit from an indication, and the resulting throughput advice may not be useful since the download might not be subject to rate limits.

Keep DPI to detect video and apply differentiated policies, possibly using SCONE to signal these policies.

I'm wondering about the value of SCONE over existing methods if it still relies on DPI. If SCONE's functionality is contingent on DPI, similar to the current approach, what would be the future role of SCONE, especially with the emergence of ECH?

martinthomson commented 1 week ago

SCONE is very much not dependent on DPI. The question at hand is whether networks would change how they condition their actions on DPI. That is, if they currently rely on DPI for policing/shaping, would SCONE change that?

We have some answers to that question and the answers are (unsurprisingly) not consistent.

smishra1200 commented 1 week ago

A mechanism that replaces SNI when used for video flow optimization is a change in the right direction. SCONE can give networks an alternative to current methods, insofar as identifying video flows is concerned.

afrind commented 1 week ago

Policies that shape or police an entire connection based on the presence of ABR video flows on that connection are counterproductive to QoE. We've found that coalescing video traffic with photos and dynamic requests generally gives us better control of response prioritization and corresponding QoE wins overall, except in networks that throttle, where non-video traffic experiences collateral damage.

We'd prefer that networks offer SCONE throughput advice (and remove shaping) regardless of the SNI, "video/non-video" or any other properties of the traffic. Tell us what bitrate is sustainable and we will deliver the best possible QoE within that limit.