httpwg / http-extensions

HTTP Extensions in progress
https://httpwg.org/http-extensions/

Priorities: new parameters #1733

Closed kixelated closed 3 years ago

kixelated commented 3 years ago

One of the goals of this document is to define an extensible way of adding new priority parameters, including a registry. I've started on a draft that would define a new priority parameter order.

However, I'm running into some issues when order and urgency are used together. This seems inevitable because the server does not advertise its capabilities, so my application is forced to send both if it wants some form of prioritization.

There are a few options (a rough sketch of the fallback logic follows the list):

  1. Define a backwards compatible way of supporting urgency alongside order. This is very difficult because they can cause wildly different behavior.
  2. Ignore urgency when order is used. This would mean that new priority parameters would effectively deprecate older parameters.
  3. Advertise server capabilities out-of-band and only send one parameter.
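
To make the fallback concrete, here is a minimal sketch in Python of options 1/2 from the client's perspective. It assumes a hypothetical order parameter alongside the draft's u (urgency); the parsing is deliberately simplified and is not a full Structured Fields parser.

```python
# Sketch of option 1/2 fallback logic. "order" is the hypothetical new
# parameter discussed above; only "u" and "i" are defined by the draft.
# Parsing here is simplified, not a full Structured Fields parser.

DEFAULT_URGENCY = 3

def parse_priority(header: str) -> dict:
    """Parse a Priority field value like 'u=5, i, order=2' (simplified)."""
    params = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            key, _, value = item.partition("=")
            params[key.strip()] = value.strip()
        else:
            params[item] = True  # bare key, e.g. 'i' means incremental
    return params

def effective_priority(header: str):
    p = parse_priority(header)
    if "order" in p:
        # Option 2: the new parameter wins and urgency is ignored,
        # effectively deprecating it for peers that understand 'order'.
        return ("order", int(p["order"]))
    # Fallback for servers that only know the predefined parameters.
    return ("urgency", int(p.get("u", DEFAULT_URGENCY)))

print(effective_priority("u=5, order=2"))  # ('order', 2) if supported
print(effective_priority("u=5"))           # ('urgency', 5) otherwise
```
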
LPardue commented 3 years ago

Noted, this is a little thorny.

https://httpwg.org/http-extensions/draft-ietf-httpbis-priority.html#name-defining-new-parameters documents considerations around defining new parameters. It suggests that

new parameters should not change the interpretation of or modify the predefined parameters in a way that is not backwards compatible or fallback safe

There's no requirement stopping that. But the further away the elastic band stretches, the harsher the pain if it snaps back. So option 2 might be ok or it might not be - we'd probably need to see a spec to understand some more.

Advertisement of HTTP capabilities is a tricky area. Maybe an HTTP/2 or HTTP/3 SETTING would suit you but it depends on the use case because those can introduce delays (notwithstanding the ALPS TLS extension proposal).

Option 1 might be doable with sufficient care or compromise; it's hard to tell. It depends how bad the application becomes when the compatibility or fallback behaviour kicks in.

Is there anything that you're raising specifically on this draft? Is there anything that we can action?

kixelated commented 3 years ago

Yeah I don't really have any good recommendations. I like the simplicity of a Priority header, but I think it's going to become a mess as new strategies are added.

My application does need to know if prioritization is supported before it can issue requests in parallel. I'm thinking some form of hop-by-hop capabilities exchange (option 3) accomplishes that while also addressing this issue, but it's just so much extra complexity.

I think the right action is to make these initial priority parameters powerful enough to cover most use-cases, removing the need to establish new parameters. I know that goes against the spirit of the draft.

kazuho commented 3 years ago

Yeah, ordering is about prioritizing requests relative to others, while Extensible Priorities focuses on absolute values. This has been a deliberate choice; please refer to Section 2.

While it might be possible to use Priority Parameters to signal ordering, as @kixelated points out, there would have to be negotiation, or the signal should only be sent using frames. We can discuss that as an extension, but I would say that it is out of scope of this draft.

LPardue commented 3 years ago

I think the right action is to make these initial priority parameters powerful enough to cover most use-cases, removing the need to establish new parameters. I know that goes against the spirit of the draft.

The feedback we have had on list and on back channels is that the scheme does well in a number of HTTP use cases that have been deployed across different vendors.

The one use case that has been reported (by Apple and yourself) to have problems is low-latency media streaming. My understanding over lengthy discussions is that the use case requires strong assurances of behaviour.

I understand why this is as it is; the use case is real. But the specifics of behaviour on the micro scale and the challenges of deployment on the macro scale are such that I think HTTP-based live streaming is a very special class of its own problem. IMHO step 1 of a solution to that unique problem starts with an entirely different philosophy of prioritization: that servers (and maybe clients?) have to perform exactly as instructed within some margin of tolerance. That's an interesting piece of work that might go somewhere, but I'm dubious it would widely succeed.

kixelated commented 3 years ago

My apologies for not being on the list sooner. I've had the email open to subscribe multiple times now but never actually hit send.

I've thought about prioritization a lot, although probably not as much as you guys. The CloudFlare blog has such a great write-up even if HTTP/2 prioritization did not catch on.

The success of a prioritization scheme depends on the user experience during congestion. For a web browser, there's a bunch of little milestones, such as being able to display text, apply stylesheets, run scripts, load images, etc. The sooner you can get to these milestones, the better the user experience. This means spending all available bandwidth on the first milestone, spending the remaining bandwidth on the second milestone, spending the remaining bandwidth on the third milestone, etc.
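
Concretely, that milestone model amounts to strict-priority scheduling: all available bandwidth goes to the earliest unfinished milestone before any later one sees a byte. A minimal sketch (milestone names and byte counts are illustrative, not from any spec):

```python
# Sketch of the milestone model as strict-priority scheduling: all available
# bandwidth goes to the earliest unfinished milestone before any later one.
# Milestone names and sizes are illustrative only.

def allocate(bandwidth: int, milestones: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Assign this round's bandwidth budget to milestones in order."""
    plan = []
    for name, bytes_left in milestones:
        if bandwidth <= 0:
            break
        send = min(bandwidth, bytes_left)
        plan.append((name, send))
        bandwidth -= send
    return plan

milestones = [("text", 10_000), ("stylesheets", 40_000), ("scripts", 80_000)]
print(allocate(30_000, milestones))
# [('text', 10000), ('stylesheets', 20000)] -- later milestones wait
```
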

I haven't worked on a browser, but I don't quite see how urgency accomplishes this without stricter guarantees. If early-milestone resources are not being prioritized on each hop, then requesting later-milestone resources in parallel may hurt the user experience.

For example, consider simultaneously requesting images with u=3 and scripts with u=5. If there's congestion on any hop and that sender does not send u=3 before u=5 (for whatever reason), then it may be a better user experience to request them sequentially instead.

If I already know that I'm requesting a resource from CloudFlare or Fastly and they will prioritize in a particular manner, then there's no problem. My concern is that a generic video player or browser doesn't know this information ahead of time.

I don't think that HTTP live streaming is a fundamentally different problem. You can think of each HLS segment as a "milestone", and indeed it's currently better to request these segments sequentially than have them fight for bandwidth in parallel. In fact I think HTTP live streaming is the ideal prioritization use-case because resources are massive, data transfer is constant, congestion is inevitable, and the user experience is terrible if a resource does not arrive in time (buffering).

LPardue commented 3 years ago

No need to apologise, we all suffer from too-many-lists syndrome :-)

The success of a prioritization scheme depends on the user experience during congestion.

If the use of the term congestion here is implying transport-only aspects, I disagree. It's entirely possible to have a bad user experience with a perfect signalling scheme that a server is known to implement exactly as defined because, for example, the seek time to a file on a hard drive is slow, or there is a long RTT to the server, or the DNS resolution for the origin that the resource is on is failing, etc. All of those things can happen when there is no transport-layer congestion. Perhaps you mean availability of data to serve?

For a web browser, there's a bunch of little milestones, such as being able to display text, apply stylesheets, run scripts, load images, etc. The sooner you can get to these milestones, the better the user experience. This means spending all available bandwidth on the first milestone, spending the remaining bandwidth on the second milestone, spending the remaining bandwidth on the third milestone, etc.

That depends on the precise makeup of a webpage. Some resources need to be served in full before they are useful; some can be processed incrementally. There's a lot of nuance here. The challenge, as you correctly point out, is to find the most appropriate use of available bandwidth to multiplex responses and to deliver the critical parts of resources with minimal delay from the ideal edge of their availability. Where resources are not available to be served at that ideal edge, a server has to make a judgement on whether to commit bandwidth to other resources. That bandwidth cannot easily be reclaimed if there is a high BDP or large buffers between the server and client.

We should also not assume that all sections of a resource are equally important. For instance, with progressive images the early bytes are more important than the later bytes. This is explored more in https://blog.cloudflare.com/parallel-streaming-of-progressive-images/, where we use a response header to inform the intermediary of precise byte boundaries that only the image server knows. One might argue a client could reprioritize after the byte boundaries are crossed, but the reality is that responsiveness across RTTs is probably too slow when bandwidths these days are so large. Fine margins here. The cf-priority-change header was designed before Priority, but the design could be ported over as a new priority parameter. This demonstrates much more flexibility and control than is possible with RFC 7540 stream priorities.
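
As a rough illustration of the byte-boundary idea (the function and defaults below are invented for the sketch, not the actual cf-priority-change format): the origin tells the intermediary a byte offset past which a response's urgency can drop.

```python
# Sketch of the byte-boundary idea: an origin response carries a hint that
# the first N bytes are high urgency and the remainder can drop to a lower
# one. Names and syntax here are illustrative, not the actual
# cf-priority-change format.

def urgency_for_offset(offset: int, boundary: int,
                       head_urgency: int = 1, tail_urgency: int = 5) -> int:
    """Urgency to apply when forwarding bytes starting at `offset`."""
    return head_urgency if offset < boundary else tail_urgency

# e.g. a progressive JPEG whose scan structure says the first 16 KiB
# render a usable preview:
for offset in (0, 8_192, 16_384, 65_536):
    print(offset, urgency_for_offset(offset, boundary=16_384))
```
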

I haven't worked on a browser, but I don't quite see how urgency accomplishes this without stricter guarantees. If early-milestone resources are not being prioritized on each hop, then requesting later-milestone resources in parallel may hurt the user experience.

For example, consider simultaneously requesting images with u=3 and scripts with u=5. If there's congestion on any hop and that sender does not send u=3 before u=5 (for whatever reason), then it may be a better user experience to request them sequentially instead.

A server is always making prioritization decisions, whether there is a priority signal (i.e., the Priority header) or not.

In your example, the client is preferring non-incremental responses for images over non-incremental responses for scripts. It is recommended that non-incremental responses are served in the order they were requested. So suppose a client issues, on HTTP/3:

stream 0: /image1.png
stream 4: /script1.js
stream 8: /image2.png
stream 12: /script2.js

If all those resources are available to a server that follows the spec guidance, the order of response data would be: image1.png, image2.png, script1.js, script2.js.

If image2.png was not readily available, the order of response data would be: image1.png, script1.js, script2.js ... sometime later image2.png.

If image2.png is slightly delayed, a server could preempt things, such as: image1.png, half of script1.js, image2.png, the rest of script1.js, script2.js.

If some or all of those requests were made with an incremental flag, the server would make different choices.
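
A minimal sketch of the non-incremental ordering described above, assuming a server that follows the draft's guidance (lower urgency value first; equal-urgency non-incremental responses in request order, each run to completion; data that isn't available yet is skipped). The chunking and availability flag are illustrative:

```python
# Sketch of the response ordering described above, per the draft's guidance.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Response:
    name: str
    urgency: int
    order: int           # order in which the request arrived
    chunks: deque = field(default_factory=deque)
    available: bool = True

def next_chunk(responses: list[Response]):
    """Pick the next chunk of response data to send."""
    candidates = [r for r in responses if r.chunks and r.available]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (r.urgency, r.order))
    return best.name, best.chunks.popleft()

# streams 0..12 from the example above; images u=3, scripts u=5:
rs = [Response("image1.png", 3, 0, deque([b"a", b"b"])),
      Response("script1.js", 5, 1, deque([b"c"])),
      Response("image2.png", 3, 2, deque([b"d"]), available=False),
      Response("script2.js", 5, 3, deque([b"e"]))]

sent = []
while (c := next_chunk(rs)) is not None:
    sent.append(c[0])
print(sent)  # image1.png, image1.png, script1.js, script2.js
# (image2.png would jump ahead of the scripts once it became available)
```
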

If I already know that I'm requesting a resource from CloudFlare or Fastly and they will prioritize in a particular manner, then there's no problem. My concern is that a generic video player or browser doesn't know this information ahead of time.

I don't follow. The web doesn't operate in a vacuum. Players or browsers operate on behalf of a user, who is accessing an application or service. That service is likely to have been designed by somebody with an interest in the application and its behavior, and that somebody should know where their resources are being served from. Speaking from experience of a video player service that operated a multi-CDN solution, the use of DNS names is a simple and powerful method to know where responses are coming from and the types of implementations that are powering them.

The worst thing that can happen if a server doesn't prioritize effectively is that application measures of success will score badly. Users may notice, or they may not care. Operators may notice, or they may not care. People who do care may notice and make a page that names and shames badly performing implementations - https://github.com/andydavies/http2-prioritization-issues. You can see in that test case that a depressingly large number of cloud hosting services did not do very well and, to my knowledge, few ever remediated the issues despite customers filing complaints or feature requests. I hate to pick on anyone, but Amazon CloudFront is a major example of this problem - what would it take to make them change? I don't think a nigh-impossible-to-achieve requirement in a specification is going to move the needle.

I don't think that HTTP live streaming is a fundamentally different problem. You can think of each HLS segment as a "milestone", and indeed it's currently better to request these segments sequentially than have them fight for bandwidth in parallel. In fact I think HTTP live streaming is the ideal prioritization use-case because resources are massive, data transfer is constant, congestion is inevitable, and the user experience is terrible if a resource does not arrive in time (buffering).

I think this depends on the latency requirements of the stream. Speaking from experience, traditional HLS or DASH players build up a multi-segment (multi-second) buffer in order to avoid buffer underflow and playback disruption. What that means is that clients get into a pattern of periodic traffic bursts when fetching the next segment to buffer, followed by periods of cessation. The bandwidth available to clients is typically massive in comparison to the size of the segment being downloaded.

Stream start and seek times are important and can be different to the above. There you might need to focus on quick delivery of the thing that needs to be played over something else. That's where an in-flight request for something you no longer want can steal bandwidth from the thing you do want. And since this is an unpredictable user action, the only recourse is stream cancellation or reprioritization. And then you get back into the realm of delayed responses, RTTs and BDP.

Traditionally, segment sizes have been so large (in the time domain) that it's easy for a client to manage the ordering of things. Yes, in HTTP/2 you could, for instance with a VOD stream, just request all of the segments at once and punt the problem of ordering to the server. But then that's not going to be very ABR-like, so clients should only have a few requests in flight so as not to overcommit their ABR algorithm. In something like DASH, where audio and video segments are not multiplexed at the encoder, the client is tasked with muxing them at synchronization points. Which audio or video segment comes first, or how they compete for bandwidth, is meaningless because they both need to arrive in full in order to make a meaningful presentation of them. If a client is only issuing a few requests in flight at a time, the problem of order is trivial. You don't need any additional signal; just look at the order.

Low-latency ABR video reduces the length of segments (or may even expose sub-segment primitives). This massively reduces the size of buffers and the margin for error, because RTT starts to approach segment length. The needs are a bit different here. Two years ago, large parts of the industry were told that the only way to satisfy the needs of low-latency video streaming was to run HTTP/2 server push through the entirety of a video distribution chain; see https://mux.com/blog/the-community-gave-us-low-latency-live-streaming-then-apple-took-it-away/. And folks soon realized that placing such requirements on implementations isn't very practical, leading to it being replaced by something more amenable: https://mux.com/blog/low-latency-hls-part-2/. "Blocking requests" can be made before a segment is available and servers won't reject them; instead they'll hold them and serve them when the resource is available. This helps eliminate RTTs that might cause interruptions. Blocking requests can be made in batches, and here the ordering of serving is important: requests from the start of the batch through to the end of the batch. I'd argue that this is a no-brainer for a server to handle without any additional priority signal. But if we want one, it is easily supported by sending all requests with an extensible priority of u=3 - a server that follows the guidance in the spec will serve these sequentially in the order they were issued.
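
A toy sketch of the blocking-request idea, assuming a server that parks requests for not-yet-published segments rather than rejecting them (asyncio events stand in for real I/O; all names are illustrative):

```python
# Sketch of an LL-HLS-style "blocking request": the server holds a request
# for a not-yet-available segment and serves it once the encoder publishes
# it. asyncio events stand in for real I/O; names are illustrative.
import asyncio

segments: dict[str, bytes] = {}
published: dict[str, asyncio.Event] = {}

async def handle_request(name: str) -> bytes:
    if name not in segments:                       # not available yet:
        event = published.setdefault(name, asyncio.Event())
        await event.wait()                         # hold, don't reject
    return segments[name]

async def publish(name: str, data: bytes):
    segments[name] = data
    published.setdefault(name, asyncio.Event()).set()

async def main():
    # The client requests the next segment ahead of time (a batch would
    # issue several of these; serving completes in publish order).
    req = asyncio.create_task(handle_request("seg42.m4s"))
    await asyncio.sleep(0.01)                      # encoder finishes later
    await publish("seg42.m4s", b"...media...")
    print(len(await req), "bytes served")

asyncio.run(main())
```
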

From the other issue it sounded like you wanted something different, where servers employ a reverse order. You could achieve that with an implicit-oriented boolean parameter like reverse-order=?1, rather than trying to explicitly order each and every request. I could be misunderstanding, though.

kixelated commented 3 years ago

A server is always making prioritization decisions, whether there is a priority signal (i.e., the Priority header) or not.

Ah okay, I think this is the source of most of our disagreement. I've been under the assumption that this client header is the only prioritization parameter. But I understand from your vantage point, CDN customers also use their own business logic to prioritize on the server.

I would like to define a video protocol, and as you pointed out with LL-HLS, it needs to be simple enough to adopt. The protocol depends on some way of defining a priority per segment, either a query parameter or a header or perhaps some server-side logic.

In my case, I don't think it's feasible to ask providers to use and configure a CDN such that it prioritizes the video content according to specific rules. Imagine if LL-HLS was only supported by a handful of CDNs; the community would lash out even harder than the HTTP/2 requirement.

The prioritization logic I need is very simple: deliver higher-priority responses first when possible. This draft is sooooo close to being able to do that with a standardized header, but it just barely misses the mark.

LPardue commented 3 years ago

Ah okay, I think this is the source of most of our disagreement. I've been under the assumption that this client header is the only prioritization parameter. But I understand from your vantage point, CDN customers also use their own business logic to prioritize on the server.

That's not what I meant.

When any actor or system is asked to multitask, it has to make a decision about how to assign resources to those tasks. That is prioritization.

A system has many strategies for prioritizing, and some of those require no additional input. It could roll a die and pick tasks at random, it could time-slice them evenly, it could run each one to completion, etc.

Additional input can possibly be used to tune that strategy. For example, a scheduler might look at how long a job takes, the type of work a job involves, whether a job has dependencies that block other jobs, etc.

An HTTP server (ignoring application signals of resource priorities for the moment) has to multitask different client connections, different requests on those connections, different I/O read accesses and different compute tasks. It has to make a decision about how to serve things; the only other option is to disconnect itself and power off. Since HTTP has many features, servers will have an appreciation of the types of content being served, the length of content, whether content needs additional processing (such as on-the-fly compression) and so on. Then along comes a new piece of information - a client priority signal. This is new information in a sea of existing information. A server might factor it into its scheduling strategy, but its weighting against other signals could be high or low. This is ultimately an implementation decision - there's no special sauce or unfair advantage - it's just engineering tradeoffs. With RFC 7540 stream priorities, what we observed is that there were hardly any APIs on the client or server side that let anybody do anything other than the default strategies implemented inside the software. To use a car analogy: if I own a Nissan Leaf and want it to perform as well as a Tesla, I can sit and hope for Nissan to do something or take my business to Tesla. Rebuilding the Leaf into a Tesla myself is impractical.

I would like to define a video protocol, and as you pointed out with LL-HLS, it needs to be simple enough to adopt. The protocol depends on some way of defining a priority per segment, either a query parameter or a header or perhaps some server-side logic.

In my case, I don't think it's feasible to ask providers to use and configure a CDN such that it prioritizes the video content according to specific rules. Imagine if LL-HLS was only supported by a handful of CDNs; the community would lash out even harder than the HTTP/2 requirement.

Then you might be surprised to hear that CDNs quite often already give special treatment to media traffic compared to other HTTP traffic. That can range from different caching policies, to different congestion control parameters, to different billing properties. It's a free market, and if there are economic incentives to implement things that are feasible for engineering to implement and for providers to operate, then things get done. Where those things are developed as standards, customers and vendors both stand to benefit. The LL-HLS backlash was in part caused by a lack of engagement with the industry while designing a solution. That solution was then presented as pretty much complete and did not align with the practicalities of HTTP distribution architectures; the engineering work to realize that design was difficult, costly and/or time consuming. In reality, an alternative design that achieved most of the same goals with much less implementation difficulty was found. I dare say that had the vendors been involved at an earlier stage of the design, the whole server push episode would have been avoided.

The moral of the story is: if you want to design a system that is intended to run across the Internet, public clouds, CDNs and user agents, it pays to consult those folks in the early stages of design. They all have different needs, capabilities and interests that might challenge the assumptions of a design developed in a vacuum. Bringing work to standards bodies is one element of that, but not the only one.

The prioritization logic I need is very simple: deliver higher-priority responses first when possible. This draft is sooooo close to being able to do that with a standardized header, but it just barely misses the mark.

This is what urgency provides: urgency=0 requests are recommended to be served before urgency=7 requests. When requests have the same urgency, the recommendation is to serve them in the order that they were requested.

If you have different needs to that, then the case is not simple. Please explain it more.

LPardue commented 3 years ago

A different idea for the described use case (as I understand it) is that you'd like some form of urgency decay.

More generally, that's some priority change over some schedule dimension such as time or bytes. In such a model, the urgency parameter communicates the initial hint of the sender, which is only valid for some period.

For media streaming, you're operating to a timeline, so it makes sense to say something like "this segment is important until point X, and then it's not". Imagine now a client that requests all video segments with u=3, and a server which responds with u-change-schedule=after-seconds;1;6 to inform the intermediary that it should, after 1 second, drop the absolute urgency of the segment to 6. When there is only one item in the intermediary's output queue, the urgency drop makes no difference. But if there are multiple items, then new requests will naturally be more urgent.
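
A minimal sketch of how an intermediary might compute the effective urgency under such a hypothetical u-change-schedule parameter (no such parameter is defined by the draft; the names and signature below are invented for illustration):

```python
# Sketch of the hypothetical u-change-schedule=after-seconds;1;6 idea:
# the effective urgency starts at the request's initial value and drops
# to a lower urgency once the schedule point passes. All names here are
# illustrative; no such parameter is defined by the draft.
import time

def effective_urgency(initial: int, after_seconds: float, decayed: int,
                      started_at: float, now: float | None = None) -> int:
    """Urgency to use when scheduling this response right now."""
    now = time.monotonic() if now is None else now
    return initial if (now - started_at) < after_seconds else decayed

t0 = 100.0
print(effective_urgency(3, after_seconds=1.0, decayed=6, started_at=t0, now=100.5))  # 3
print(effective_urgency(3, after_seconds=1.0, decayed=6, started_at=t0, now=101.5))  # 6
```
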

This design avoids strict dependencies between requests, which are hard for distributed systems to reason about. It achieves adaptive response prioritization without the client having to pay RTTs, which means it could perform better. A client could provide a change schedule itself, or the server, or some local application code. This demonstrates the benefit of not making any one side's signals completely authoritative on the prioritized bytes-on-wire.

As @kazuho noted, our choice to avoid dependencies was purposeful, and I believe it makes reasoning about extensions such as the one I proposed easier. I'm always happy to discuss extension possibilities in the HTTP WG, but I am not seeing an action to take on this draft right now.

kixelated commented 3 years ago

I filed another issue because I think we've drifted a bit off-topic from the original issue surrounding implementing new parameters.

Urgency decay could work, although it relies on being able to estimate when the next request will be made. That's not the case for video (variable GOP sizes), and it's rather brittle to rely on timings anyway. It's much simpler to say "the new request is more important" than "the old request will become less important after x seconds".

I do keep getting a better picture of the problem you guys are trying to solve versus the one I'm trying to solve. For example, I didn't think about how each CDN already has its own bespoke prioritization scheme, and that's why you don't want to trust the client parameter blindly, but rather use it as a hint.

I've been in the mindset that "my application knows best" and I don't want the server to make any decisions for me. Both the client and server have access to additional information, although I argue that the client knows the real importance of a request in the scope of the overall algorithm, and that's far more important than anything the server knows.

LPardue commented 3 years ago

Please stop using the term CDN. What we're talking about is servers or intermediaries as defined in HTTP semantics.

I've been in the mindset that "my application knows best" and I don't want the server to make any decisions for me. Both the client and server have access to additional information, although I argue that the client knows the real importance of a request in the scope of the overall algorithm, and that's far more important than anything the server knows.

The fundamental problem with this viewpoint is, the client is completely uninformed about what is happening on the sending side. It will not know that the server just experienced packet loss and is retransmitting packets. It will not know the server is actually reading stuff from an offsite backing store that just decided to go offline. It will not know that the server is experiencing a DoS attack and is trickling out responses in order to reduce load.

We know that implementations struggle to realize the possibilities given to them by a very precise dependency signalling model, and that that model was problematic to implement over QUIC. So we have designed something that is pragmatic and covers a lot of what people already could do. If the application really has very specific needs, then HTTP as we know it might not be the best fit for it.

LPardue commented 3 years ago

Since I'm not seeing any action to take on the draft, I'm going to close the issue. But if there's something I'm overlooking that can be actioned, we can reopen.