moq-wg / moq-transport

Sender-side ABR #259

Open vasilvv opened 1 year ago

vasilvv commented 1 year ago

The basic problem is: the publisher provides two versions of the same track, let's say, video/360p and video/720p; the subscriber wants to receive only one of those, at the best quality it can receive in a timely manner. It makes more sense to make this switch at the sender, since the sender's congestion controller ultimately decides how much data the sender is going to send short-term.

@kixelated previously proposed allowing the subscriber to subscribe to a list of tracks in order of preference, so something like SUBSCRIBE_REQUEST video/720p video/360p would result in 720p video being sent preferentially when available. I like this approach, though we do need to specify more details first. For instance, what are the conditions under which the sender is allowed to switch? I think those should be group boundaries (and we should also require alignment between tracks if you want to subscribe to them as alternatives).
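
A minimal sketch of how the sender side of that might look, assuming hypothetical names (`RankedSubscribe`, `selectTrackAtGroupBoundary`) and that the sender knows each alternative's bitrate; the switch decision runs only when a new group starts:

```typescript
// Hypothetical sketch: the subscriber lists acceptable tracks in preference
// order; the sender re-evaluates its choice only at group boundaries.
interface RankedSubscribe {
  alternatives: string[]; // e.g. ["video/720p", "video/360p"], most preferred first
}

// Pick the most preferred track whose bitrate fits the congestion
// controller's current estimate (the selection criterion is an assumption).
function selectTrackAtGroupBoundary(
  sub: RankedSubscribe,
  bitrateOf: (track: string) => number, // bits per second, assumed known
  estimatedBandwidth: number,           // from the sender's congestion controller
): string {
  for (const track of sub.alternatives) {
    if (bitrateOf(track) <= estimatedBandwidth) return track;
  }
  // Nothing fits: fall back to the least preferred (lowest bitrate) alternative.
  return sub.alternatives[sub.alternatives.length - 1];
}
```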

wilaw commented 1 year ago

At the June interim I presented this slide on a proposal for server-side ABR. I still like this approach. Requiring the server to have an estimate of the throughput is reasonable. We'd also need to figure out some convention to remove a track from a previously declared ABR group. Perhaps it is as simple as unsubscribing it and then resubscribing without a group number.

[Screenshot: June interim slide showing the server-side ABR proposal, including a group descriptor]

The catalog draft proposes the altGroup identifier, which would map conveniently to the group descriptor in the above image.

vasilvv commented 1 year ago

I think we should be able to atomically subscribe to the entire switching set at once, otherwise one of the subscriptions may have an error, leaving the subscriber in a half-subscribed state.

wilaw commented 1 year ago

That's certainly feasible with some type of subscribe object carrying a name and an optional throughput threshold:

SUBSCRIBE ({"n":"4k", "throughput":16000},{"n":"hd","throughput":8000},{"n":"sd"})

We already have other subscribe hints we want to hang off the subscription, so they could all be properties of a subscribe object.
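
For illustration, the subscribe object might be typed like this (a sketch only; the field names mirror the example above, and treating the throughput threshold as kbps is an assumption):

```typescript
// Hypothetical typing of the subscribe object from the example above.
interface SubscribeEntry {
  n: string;           // track name, e.g. "4k", "hd", "sd"
  throughput?: number; // minimum estimated throughput (kbps, assumed) to serve this track
}

// The whole switching set is subscribed atomically as an ordered list.
const subscription: SubscribeEntry[] = [
  { n: "4k", throughput: 16000 },
  { n: "hd", throughput: 8000 },
  { n: "sd" }, // no threshold: always eligible as the fallback
];
```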

Although I would argue that even if you atomically subscribe, one of the subscriptions can still fail. Do you then fail the whole atomic group? I would think most players would want a notification of the failed track but then still continue with the tracks that are available. I like the flexibility for players to subscribe individually, convey the ABR group as part of that subscription, and get notified of failures individually.

kixelated commented 1 year ago

I implemented ranked delivery and it worked very well. The server would select a track based on the estimated bitrate.

SUBSCRIBE 0:1080p or 1:720p or 2:480p or 3:240p

However, the relay had access to the HLS playlist, so it knew both the rendition bitrates and the switch points. I don't know the best way to support this within a generic MoQ relay and need to think more about it.

One of the problems with this approach is that it doesn't work between relays. If there's congestion between relays, then it should switch down for all downstream subscriptions too. This would be useful for more remote edges (ex. CDN edge on a cruise ship).

wilaw commented 1 year ago

I don't know the best way to support this within a generic MoQ relay

One solution is that relays never make 'ranked delivery' subscriptions upstream. They should always decouple them into individual subscriptions that are not gated on throughput. This also addresses the second concern:

If there's congestion between relays, then it should switch down for all downstream subscriptions too.

acbegen commented 1 year ago

It makes more sense to make this switch at the sender, since the sender's congestion controller ultimately decides how much data the sender is going to send short-term.

This does not make more sense to me. It is a better move to let the client know how fast the server thinks it can send, but still let the client decide to pick one of the options. The ABR selection is not just a function of the sender's sending rate; it is a function of the client's receiving rate plus so many other things that only the client knows about.

hardie commented 1 year ago

It makes more sense to make this switch at the sender, since the sender's congestion controller ultimately decides how much data the sender is going to send short-term.

This does not make more sense to me. It is a better move to let the client know how fast the server thinks it can send, but still let the client decide to pick one of the options. The ABR selection is not just a function of the sender's sending rate; it is a function of the client's receiving rate plus so many other things that only the client knows about.

I agree that the sender's information is incomplete; both sides have incomplete information. I tend to prefer letting the client decide, both because the congestion/long bandwidth-delay problem tends to be worse on the client side and because the client knows what its playout buffer size is. If it is caching a significant amount before starting playout (playing stored ads to get the buffer size up, for example), it may be willing to take a higher bitrate than a short-term measurement would indicate.

(Just a personal opinion, of course).

Ted

suhasHere commented 1 year ago

+1 to Ted's and Ali's points. Clients assess the receive rate and, considering factors like buffer depth and quality expectations, can make a suitable choice about requesting the right quality stream from the sender. This is done in a loop and constantly updated to meet the application's needs.

Such a design will also keep relays agnostic to rate-control and media-adaptation logic, which is application- and media-type-specific.

kixelated commented 1 year ago

Yeah, the fundamental problem is that the sender and receiver have an incomplete view of the world. Whatever the approach, we can't rely on QUIC's congestion control alone for real-time media.

If we look at WebRTC for inspiration, sender-side congestion control (GCC) is superior to receiver-side congestion control (REMB). I think you could implement something similar on the receiver with the right feedback, but that's a whole academic pursuit in itself.

Meanwhile, HLS/DASH uses a combination of TCP (sender-side) and ABR (receiver-side) for congestion control. From my experience, this is inadequate for real-time latency for a multitude of reasons and I can elaborate. This is roughly equivalent to what has been proposed by a few folks, having the receiver detect congestion and resubscribe, but it won't be good enough.

But I want to clarify that a goal is server-side ABR, which is not necessarily the same as sender-side ABR. The problem is dumb 3rd party clients out of your control. At Twitch this was especially a problem with the iOS HLS player, but it's also a problem with clients that are difficult to update, such as smart TVs and consoles. The ability to offload ABR onto the server gives a service more control and the ability to experiment.

Note that this also applies to broadcasting; the server doesn't want to rely on the stock client behavior. An analogy is RTMP, where the default OBS behavior was to increase the bitrate by 5% every minute after each congestion event, which is exceptionally slow. Fortunately OBS is a responsive open-source project, but we didn't have that luxury for other clients like the PlayStation broadcaster.

I think we need cooperative ABR. Both the sender and receiver need to share their view of the world somehow.

kixelated commented 1 year ago

Here's a situation I want to address:

A broadcast client is transmitting 3 renditions via simulcast: 240p, 480p, and 1080p. A relay server is subscribed to all of them. Viewers use ABR to subscribe to one of the renditions from the relay.

If the broadcaster encounters congestion, then the relay will unsubscribe from 1080p or somehow deprioritize it. The problem is that downstream viewers will continue to fetch 1080p, despite it being starved, because the viewer's ABR algorithm (based on the estimated network bitrate) detects no last-mile congestion.

Okay so let's say the viewer looks at media timestamps instead of estimated bitrate. It could correctly detect that 1080p is starving due to first-mile congestion, so it switches down.

However, this doesn't work if the broadcast client is not using simulcast but is instead transmitting 1 rendition which is then transcoded into 3 renditions on the server. First-mile congestion will equally affect all renditions, so switching down from 1080p to 240p will just make the picture quality worse for no reason. This was very common on Twitch and was one of the reasons why mobile broadcasting was poor.

kixelated commented 1 year ago

I think you need some form of chainable sender-side ABR to solve this. Here's a rough idea of what I'm thinking.


Each viewer only wants to subscribe to a single rendition, so it provides a list of acceptable tracks with OR:

- viewer: provides a list of all acceptable renditions in order of preference. -> SUBSCRIBE: 480p OR 240p
- relay: responds with the initial rendition. <- SUBSCRIBE_OK: 240p
- relay: responds again whenever the sender-selected rendition changes. <- SUBSCRIBE_OK: 480p
- viewer: can change the list at any point, for example based on the playback buffer, the window size, or the estimated bitrate. -> SUBSCRIBE: 1080p OR 480p OR 240p
- relay: can switch up now if it wants, or delay the switch until the cache is populated. <- SUBSCRIBE_OK: 1080p

The relay needs to serve multiple viewers, so it requests all acceptable tracks from the origin with AND:

- relay: subscribes to the origin in order of preference. -> SUBSCRIBE: 240p AND 480p AND 1080p
- origin: replies with the list of tracks. <- SUBSCRIBE_OK: 240p AND 480p AND 1080p
- origin: indicates there's congestion and it's temporarily no longer sending all tracks. <- SUBSCRIBE_OK: 240p AND 480p

The relay can then use this information to modify the viewer's ABR rendition:

- relay: switches the viewer down to 480p. <- SUBSCRIBE_OK: 480p


This is chainable through an arbitrary number of relays. There could be a lossy hop in the middle, for example GCP -> AWS, or Akamai -> CloudFlare, or satellite -> cruise ship. The ability to propagate ABR decisions seems very powerful.
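
For concreteness, here is a minimal sketch of the relay's selection step under this proposal. The function name and types are hypothetical, not from the draft; it assumes the origin's SUBSCRIBE_OK set and each viewer's ordered OR list as inputs:

```typescript
// Hypothetical sketch: the origin's SUBSCRIBE_OK advertises which tracks it
// is still delivering (the AND set); each viewer supplies an ordered OR list.
// The viewer's active rendition is its first preference that the origin is
// still sending.
function selectViewerRendition(
  viewerPreferences: string[],    // e.g. ["1080p", "480p", "240p"]
  upstreamAvailable: Set<string>, // e.g. {"240p", "480p"} during congestion
): string | undefined {
  return viewerPreferences.find((track) => upstreamAvailable.has(track));
}

// When the origin drops 1080p from its SUBSCRIBE_OK set, the relay
// recomputes each viewer's rendition and announces the switch downstream:
const active = selectViewerRendition(
  ["1080p", "480p", "240p"],
  new Set(["240p", "480p"]),
); // "480p", sent to the viewer as SUBSCRIBE_OK: 480p
```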

acbegen commented 1 year ago

Yeah, the fundamental problem is that the sender and receiver have an incomplete view of the world. Whatever the approach, we can't rely on QUIC's congestion control alone for real-time media.

Again, QUIC's congestion control is only one of the signals - not just for real-time media but also for streaming.

If we look at WebRTC for inspiration, sender-side congestion control (GCC) is superior to receiver-side congestion control (REMB). I think you could implement something similar on the receiver with the right feedback, but that's a whole academic pursuit in itself.

There is tons of work in this domain. GCC is so old now, nobody cares about it anymore. There are many client-side algorithms out there, see for example: https://ieeexplore.ieee.org/document/9926128 (pdf is open access). Check Figure 2.

Microsoft will run a grand challenge on this for ACM MMSys'24.

Meanwhile, HLS/DASH uses a combination of TCP (sender-side) and ABR (receiver-side) for congestion control. From my experience, this is inadequate for real-time latency for a multitude of reasons and I can elaborate.

I respectfully disagree (see above).

This is roughly equivalent to what has been proposed by a few folks, having the receiver detect congestion and resubscribe, but it won't be good enough.

Again, HLS/DASH does that because they work over HTTP and, until recently, that meant TCP. Since TCP does not expose anything, the entire responsibility for rate adaptation (which you call congestion control) has been carried out by the client. With H3 in the picture, if QUIC exposes some information about its perceived congestion, DASH/HLS clients can be modified to use that info and they will surely do a better job than the sender side.

But I want to clarify that a goal is server-side ABR, which is not necessarily the same as sender-side ABR. The problem is dumb 3rd party clients out of your control. At Twitch this was especially a problem with the iOS HLS player, but it's also a problem with clients that are difficult to update, such as smart TVs and consoles. The ability to offload ABR onto the server gives a service more control and the ability to experiment.

This, I agree with. But it is only a good option when the client cannot rate-adapt properly. It is expensive, increases server complexity and messes up a lot of things in the caches.

If you are curious, see our INFOCOM paper from earlier this year: https://ieeexplore.ieee.org/document/10228951 (pdf attached: IEEE_INFOCOM23.pdf).

Note that this also applies to broadcasting; the server doesn't want to rely on the stock client behavior. An analogy is RTMP, where the default OBS behavior was to increase the bitrate by 5% every minute after each congestion event, which is exceptionally slow. Fortunately OBS is a responsive open-source project, but we didn't have that luxury for other clients like the PlayStation broadcaster.

I think we need cooperative ABR. Both the sender and receiver need to share their view of the world somehow.

This is a cost-quality-performance trade-off. As usual, unless we are overdoing it, cooperation will always perform better.

acbegen commented 1 year ago

Here's an situation I want to address:

A broadcast client is transmitting 3 renditions via simulcast: 240p, 480p, and 1080p. A relay server is subscribed to all of them. Viewers use ABR to subscribe to one of the renditions from the relay.

If the broadcaster encounters congestion, then the relay will unsubscribe from 1080p or somehow deprioritize it.

If the broadcaster is struggling to transmit (simulcast) 3 renditions, it should adjust itself first w/o any relay unsubscribing. Frankly, until I read this post, I did not realize you were referring to the simulcast broadcaster. In this case, the source itself needs to rate-adapt the 3 renditions, or drop one of them.

The problem is that downstream viewers will continue to fetch 1080p, despite it being starved, because the viewer's ABR algorithm (based on the estimated network bitrate) detects no last-mile congestion.

They should not if the broadcaster decides to drop 1080p. If it decides to drop the other two, viewers will do just fine.

Okay so let's say the viewer looks at media timestamps instead of estimated bitrate. It could correctly detect that 1080p is starving due to first-mile congestion, so it switches down.

However, this doesn't work if the broadcast client is not using simulcast but is instead transmitting 1 rendition which is then transcoded into 3 renditions on the server. First-mile congestion will equally affect all renditions, so switching down from 1080p to 240p will just make the picture quality worse for no reason. This was very common on Twitch and was one of the reasons why mobile broadcasting was poor.

If the input to the transcoder deteriorates, it should be smart enough to adjust the output streams accordingly. E.g., if the input is now barely 720p, it should output 720p and lower resolutions, not anything higher.

kixelated commented 1 year ago

Yeah, the fundamental problem is that the sender and receiver have an incomplete view of the world. Whatever the approach, we can't rely on QUIC's congestion control alone for real-time media.

Again, QUIC's congestion control is only one of the signals - not just for real-time media but also for streaming.

I think this is mostly a philosophical disagreement.

The shared goal of QUIC/TCP congestion control, VBR, and ABR is to prevent queuing. They all adjust the send rate in response to signals to keep buffer growth/bloat to a minimum. The difference is the granularity: QUIC/TCP congestion control works at packet boundaries, VBR works at encoder frame boundaries, and ABR works at rendition boundaries.

The problem with treating these congestion control mechanisms as separate, independent entities is split brain. If QUIC thinks there's congestion but the ABR algorithm doesn't, the buffer grows. If the ABR algorithm thinks there's congestion but the QUIC algorithm doesn't, the network is underutilized. If neither algorithm thinks there's congestion, the buffer grows (bufferbloat).

My goal with sender-side ABR is tighter integration with the sender-side congestion control algorithms used by QUIC. The downside of receiver-side ABR is that there's a chasm; cooperation between the sender and receiver is more difficult over a congested (i.e. delayed/lossy) network link. This chasm historically hasn't mattered much for ABR given the lack of granularity, but it becomes more and more important as latency is lowered.

If we look at WebRTC for inspiration, sender-side congestion control (GCC) is superior to receiver-side congestion control (REMB). I think you could implement something similar on the receiver with the right feedback, but that's a whole academic pursuit in itself.

There is tons of work in this domain. GCC is so old now, nobody cares about it anymore. There are many client-side algorithms out there, see for example: https://ieeexplore.ieee.org/document/9926128 (pdf is open access). Check Figure 2.

Microsoft will run a grand challenge on this for ACM MMSys'24.

Yeah, Twitch struggled mightily with LHLS ABR and ran an ACM grand challenge too. The signals available to the receiver were just insufficient, especially in a browser environment. The sender-side bandwidth estimate is extremely important and it needs to be in MoQ at a minimum.

I'm quite out of date on the latest algorithms in WebRTC land, but I will add that it's impossible to evaluate congestion control algorithms outside of production. It would be a huge mistake to rule out sender-side ABR in the design phase without proper experimentation.

Again, HLS/DASH does that because they work over HTTP and, until recently, that meant TCP. Since TCP does not expose anything, the entire responsibility for rate adaptation (which you call congestion control) has been carried out by the client. With H3 in the picture, if QUIC exposes some information about its perceived congestion, DASH/HLS clients can be modified to use that info and they will surely do a better job than the sender side.

TCP exposes the same congestion control stats as QUIC. It has slightly worse RTT estimation but you can get the stats via a syscall (which is how CMSD works). The problem is that those stats are not exposed via HTTP or the browser, and that doesn't change with QUIC or HTTP/3.

We absolutely need something like CMSD in MoQ for receiver-side ABR, although it needs to be more frequent because per-segment granularity won't be good enough for real-time latency.

kixelated commented 1 year ago

If the broadcaster is struggling to transmit (simulcast) 3 renditions, it should adjust itself first w/o any relay unsubscribing. Frankly, until I read this post, I did not realize you were referring to the simulcast broadcaster. In this case, the source itself needs to rate-adapt the 3 renditions, or drop one of them.

Yeah, that's kind of my point; simulcast is very similar to sender-side ABR. The sender is in charge of which tracks get sent, the only difference being that it can choose up to N tracks instead of up to 1 track.

If the input to the transcoder deteriorates, it should be smart enough to adjust the output streams accordingly. E.g., if the input is now barely 720p, it should output 720p and lower resolutions, not anything higher.

All renditions will deteriorate equally if there's first-mile congestion when transcoding, but not when using simulcast. The viewer needs some sort of signal to know when to temporarily unsubscribe from 1080p/720p in the simulcast scenario, and when to resubscribe after recovery.

If this signal is in the catalog, it will only work first-mile. If this signal is in MoqTransport, it can work for any hop. My proposal is basically to have the receiver and sender each choose a subset of tracks; the intersection is the active subscription.
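
To make the intersection rule concrete, here is a rough sketch (all names hypothetical): the sender keeps as many tracks as fit its bandwidth estimate (simulcast: up to N; sender-side ABR: N = 1), and the active subscription is the receiver's chosen subset filtered by what the sender is currently sending:

```typescript
// Hypothetical sketch of sender-side selection: greedily keep the highest
// qualities that fit the bandwidth estimate, up to maxTracks renditions
// (simulcast), or just one (sender-side ABR).
function senderSelection(
  tracks: { name: string; bitrate: number }[], // highest quality first
  budget: number,    // sender's estimated bandwidth (bits per second)
  maxTracks: number, // N for simulcast, 1 for sender-side ABR
): Set<string> {
  const chosen = new Set<string>();
  let spent = 0;
  for (const t of tracks) {
    if (chosen.size >= maxTracks) break;
    if (spent + t.bitrate <= budget) {
      chosen.add(t.name);
      spent += t.bitrate;
    }
  }
  return chosen;
}

// The active subscription is the intersection, in the receiver's preference
// order, of what the receiver accepts and what the sender is sending.
function activeSubscription(
  receiverAcceptable: string[],
  senderSending: Set<string>,
): string[] {
  return receiverAcceptable.filter((t) => senderSending.has(t));
}
```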