moq-wg / moq-transport

draft-ietf-moq-transport

Create an ability to pre-warm a track #372

Open wilaw opened 8 months ago

wilaw commented 8 months ago

At the interim there was interest in creating the ability for a receiver to ask a sender to subscribe to a track but not to forward it. The purpose of this is to minimize the delay when the receiver SUBSCRIBES to the track in the future.

Design considerations

  1. This would need to be converted to a regular SUBSCRIBE for any upstream request, so a relay receiving a PREWARM request MUST NOT forward it.
  2. Is there a matching UNPREWARM, or would a regular UNSUBSCRIBE suffice?
  3. Do we need a new method PREWARM, or can it be a flag on the existing SUBSCRIBE? (See the sketch after this list.)
  4. This is a powerful DDOS attack vector. How do we mitigate that threat in the design?
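
A minimal sketch of the "flag on SUBSCRIBE" variant from point 3, assuming PREWARM is modeled as a hypothetical `forward` boolean rather than a new method; the field and type names below are illustrative, not part of the draft:

```python
from dataclasses import dataclass, replace

@dataclass
class Subscribe:
    track: str
    start_group: int
    # Hypothetical flag: False means "pre-warm only", i.e. fetch and cache
    # the track but do not forward any objects to the requester.
    forward: bool = True

def upstream_request(incoming: Subscribe) -> Subscribe:
    # Design point 1: the pre-warm itself is never propagated. If the relay
    # decides to honor it, the request it sends upstream is an ordinary
    # SUBSCRIBE with forwarding enabled.
    return replace(incoming, forward=True)

# A receiver pre-warming the 480p track it might need later:
prewarm = Subscribe(track="480p", start_group=0, forward=False)
assert upstream_request(prewarm).forward  # relayed as a regular SUBSCRIBE
```
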
ianswett commented 8 months ago

I find this concerning from a DDoS/memory exhaustion perspective.

Is the motivation for this to minimize the latency of switching resolutions for video playback? If so, couldn't you just issue a SUBSCRIBE at a lower priority and if it starts coming quickly, then start switching relatively soon, and if it takes a while, wait to switch. Or is there a different use case?

wilaw commented 8 months ago

Is the motivation for this to minimize the latency of switching resolutions for video playback?

Yes.

If so, couldn't you just issue a SUBSCRIBE at a lower priority and if it starts coming quickly, then start switching relatively soon, and if it takes a while, wait to switch

That would work for switching up, where you have the luxury of time (your switch can always occur later). However, the primary use case is switching down, where you don't want to incur the extra delay of the relay having to go back to origin to retrieve the new track. It is also an optimization when you have only one or a few clients subscribed to that relay for that presentation. Given a sufficiently large cohort of clients, the relay would be naturally warmed.

I find this concerning from a DDoS/memory exhaustion perspective.

I agree on the DDOS concern, as the OP states. However, from a memory-exhaustion perspective, this is really no different from having a multiplicity of clients each pulling a different track from the relay. One client pulling 3 tracks imposes the same relay memory requirement as 100 clients asking for those same 3 tracks.

gwendalsimon commented 8 months ago

I find this concerning from a DDoS/memory exhaustion perspective.

+1 on the concern. The risk of overloading the CDN backbone unnecessarily is significant. Furthermore, pre-warming would render any just-in-time processing (transcoding and packaging) useless, even though those techniques deliver not only cost reductions but also energy reductions.

ianswett commented 8 months ago

Based on discussion, it seems the hope is to prewarm a resolution lower than the current one, so if a player needed to switch to a lower quality, it would succeed quickly. One question is whether this needs to live in MOQT.

I can imagine a CDN doing the work to parse catalogs (or use ML, etc) and understand what the next lowest resolution track is, and then decide whether to proactively SUBSCRIBE to it.

In my mind, MOQT is about the intent of the subscriber, and indicating "I might want this track, but not now" is a fairly weak signal. Admittedly, if we had a more complex subscribe that included multiple qualities, then the full intent would be indicated.

I would like to park this until we have a better handle on SUBSCRIBE, unless there's a need to resolve it earlier.

hardie commented 8 months ago


Based on discussion, it seems the hope is to prewarm a resolution lower than the current one, so if a player needed to switch to a lower quality, it would succeed quickly. One question is whether this needs to live in MOQT.

I can imagine a CDN doing the work to parse catalogs (or use ML, etc) and understand what the next lowest resolution track is, and then decide whether to proactively SUBSCRIBE to it.

We've assumed so far that the relays don't have access to the catalog (for privacy reasons as well as practical reasons). Revisiting that decision for this would seem like a big step backwards.

I think one of the issues here is that there are a flock of cases where there is no pre-warming needed, because the media will be popular enough that a relay will have multiple alternates on hand for all but a few of the earliest subscribers. There are also cases where the publisher is CPU or bandwidth limited (like a cell phone in an RTC videoconference) where it is impractical to expect that they are producing a lot of different resolutions. Those two being very common cases makes it difficult to see the utility here.

The question is whether there is a substantial body of cases where the publisher is producing multiple streams but the number of consumers is low enough that a relay won't have most of them on hand. Luke's experience at Twitch suggests that this does happen, and it happens fairly often.

For that case, I think we could either presume that the client subscribes to multiple resolutions and unsubscribes from higher resolutions as bandwidth or other bounds interfere, or we could introduce something like a SUBSCRIBE with a flag that indicates that no objects are wanted at this time. That might be as simple as the same beginning and end object being named. Whatever the syntax, that could be taken by a relay to mean that pre-warming the empty subscribed tracks would be useful, since they may be wanted later.

The decision about whether to issue the upstream subscribe would be a combination of business logic and local capacity, so the receiver will always have to deal with a potential delay if it changes the track to which it is subscribed.
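
A relay-side sketch of that "empty range" variant, under the assumption that a SUBSCRIBE naming the same beginning and end object is read as a pre-warm hint; the function and its inputs are hypothetical:

```python
def handle_subscribe(track: str, start_object: int, end_object: int,
                     cached: set[str], capacity_ok: bool) -> str:
    # A zero-length range ("the same beginning and end object being named")
    # is treated as: pre-warm this track, but send nothing for now.
    prewarm_only = (start_object == end_object)
    if track in cached:
        return "already warm" if prewarm_only else "serve from cache"
    if prewarm_only and not capacity_ok:
        # Issuing the upstream subscribe is a business/capacity decision,
        # so the relay is free to decline and the receiver must still
        # tolerate a delay when it subscribes for real later.
        return "decline pre-warm"
    return "subscribe upstream"

assert handle_subscribe("480p", 5, 5, cached=set(), capacity_ok=True) == "subscribe upstream"
assert handle_subscribe("480p", 5, 5, cached=set(), capacity_ok=False) == "decline pre-warm"
```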

In my mind, MOQT is about the intent of the subscriber, and indicating "I might want this track, but not now" is a fairly weak signal. Admittedly, if we had a more complex subscribe that included multiple qualities, then the full intent would be indicated.

I would like to park this until we have a better handle on SUBSCRIBE, unless there's a need to resolve it earlier.

I'm okay parking it as long as we indicate in the draft that SUBSCRIBE's syntax is still under development and that new signals may be added.


wilaw commented 8 months ago

One of the largest problems in media delivery today is sparse clients consuming uncached edge content. This problem is exacerbated at lower latencies, especially where target latency is a few multiples of RTT-to-origin. This is exactly the playground MoQ wants to play in. I believe it will be very difficult to build working ABR solutions at sub-350ms latency if a switch entails a 100ms round trip to origin.

In my mind, MOQT is about the intent of the subscriber, and indicating "I might want this track, but not now" is a fairly weak signal.

I would argue that "I will desperately need this track in a hurry if I run into delivery problems and I really, really need you to keep it warm at the edge" is a pretty strong signal.

I can imagine a CDN doing the work to parse catalogs (or use ML, etc) and understand what the next lowest resolution track is, and then decide whether to proactively SUBSCRIBE to it.

The relays cannot read the catalogs, so this is not an option. The intelligence about track offerings resides in the client.

The risk of overloading the CDN backbone unnecessarily is significant.

I don't think it is significant. If 5 clients arrive and happen to subscribe to 5 different tracks in a given presentation, we would call that normal playback and would expect the backbone to easily handle it.

The question is whether there is a substantial body of cases where the publisher is producing multiple streams but the number of consumers is low enough that a relay won't have most of them on hand. Luke's experience at Twitch suggests that this does happen, and it happens fairly often.

And my experience at Akamai says exactly the same thing. It happens with live sports all the time. With highly distributed edges, the problem isn't that there aren't a lot of aggregate viewers, but that there are sparse viewers on certain edges (such as those inside smaller ISPs, or in rural areas).

I think parking at this point is unnecessarily conservative. I don't see a future where every track of every Moq presentation is prewarmed. Instead I see cases where

  1. You only prewarm the lower bitrates of any given stack, or even just the lowest.
  2. You only do it for clients where the ratio of player buffer to RTT-to-origin is small. The player knows its buffer, and the server could report the distance to origin in the SUBSCRIBE_OK response, so the client could intelligently decide whether prewarming would be useful (see the sketch after this list).
  3. The delivery network will charge more for the increase in midgress traffic. Economics tend to temper abuse. If you are delivering the Superbowl (Go SF) you will happily pay for the better QoE.
  4. The delivery network can always choose not to honor prewarming and we can design a response to the client indicating this.
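
A sketch of the decision in point 2, assuming SUBSCRIBE_OK carried a hypothetical RTT-to-origin value and that the threshold is an application-tuned constant:

```python
def should_prewarm(player_buffer_ms: float, rtt_to_origin_ms: float,
                   min_ratio: float = 3.0) -> bool:
    # The buffer/RTT ratio says how many origin round trips the player's
    # buffer can absorb; when it is small, a cold switch-down risks a
    # stall, so asking the relay to pre-warm the lowest track pays off.
    return (player_buffer_ms / rtt_to_origin_ms) < min_ratio

assert should_prewarm(player_buffer_ms=250, rtt_to_origin_ms=100)      # 2.5 RTTs of buffer
assert not should_prewarm(player_buffer_ms=5000, rtt_to_origin_ms=20)  # ample buffer
```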

I would suggest we build a simple version of it into the protocol and then build POCs where we test and measure the real-world utility and cost.

gwendalsimon commented 7 months ago

The risk of overloading the CDN backbone unnecessarily is significant.

I don't think it is significant. If 5 clients arrive and happen to subscribe to 5 different tracks in a given presentation, we would call that normal playback and would expect the backbone to easily handle it.

One use case does not make for large probabilistic numbers; of course we will handle these five clients, but this case has a low probability. Network infrastructures are not dimensioned for the worst possible traffic matrix, but rather for some probabilistic maximum traffic.

As discussed during the interim, dimensioning a backbone (which often includes peering links and IXPs) that can handle actual traffic plus pre-warmed traffic is just a business question. But I can also argue that, if pre-warmed tracks are not consumed, it is a net waste of network resources, which also translates into a waste of energy. Designing a protocol that deliberately wastes resources to address a few 250ms freezes (from overly aggressive video players) is questionable in 2024.

The question is whether there is a substantial body of cases where the publisher is producing multiple streams but the number of consumers is low enough that a relay won't have most of them on hand. Luke's experience at Twitch suggests that this does happen, and it happens fairly often.

And my experience at Akamai says exactly the same thing. It happens with live sports all the time. With highly distributed edges, the problem isn't that there aren't a lot of aggregate viewers, but that there are sparse viewers on certain edges (such as those inside smaller ISPs, or in rural areas).

My experience at Synamedia (where we run multiple Edge-CDNs, a.k.a. multi-Tbps Telco-CDNs, for broadcasters and Pay-TV services) says exactly the same. I would add two cases: 1/ Long-tail, unpopular TV channels usually have the same bitrate ladder as any other TV channel (e.g. 8 video profiles), although barely anybody watches them. In our measurements, more than half of the segments can go entirely unconsumed over a week, and of course even more at the edges. 2/ In countries with very good network connections, some edges serve a population whose bandwidth is five to ten times greater than the highest video profile bitrate. In that case, all requests go to the highest track, regardless of the number of consumers.

I think parking at this point is unnecessarily conservative. I don't see a future where every track of every Moq presentation is prewarmed. Instead I see cases where

  1. You only prewarm the lower bitrates of any given stack, or even just the lowest.

+1. It would be a "backup survival track", which serves only to implement a smooth switch down (to preserve the connection of a consumer in trouble) and to give the relay time to fetch the actual track requested by the consumer from the Origin. The lowest track has a marginal bitrate, so the waste is minimized.

However, the relay does not know which track is the survival track in the catalog since it cannot read the catalog. We may add a "survival flag" in the track saying "I'm the lowest one, please feel free to subscribe to me as a pre-warm".
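
As a sketch, such a flag would have to ride on track-level metadata that the relay can see (the catalog itself being opaque to it); the field name `survival` below is purely hypothetical:

```python
# Hypothetical per-track metadata exposed to relays, outside the catalog.
tracks = {
    "video-1080p": {"survival": False},
    "video-480p":  {"survival": False},
    "video-240p":  {"survival": True},  # "I'm the lowest one, feel free to pre-warm me"
}

prewarm_candidates = [name for name, meta in tracks.items() if meta["survival"]]
assert prewarm_candidates == ["video-240p"]
```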

  2. You only do it for clients where the ratio of player buffer to RTT-to-origin is small. The player knows its buffer, and the server could report the distance to origin in the SUBSCRIBE_OK response, so the client could intelligently decide whether prewarming would be useful.

It seems a bit over-engineered to me. The distance to Origin can vary, especially if Origin is mobile.

  4. The delivery network can always choose not to honor prewarming and we can design a response to the client indicating this.

+1

I would suggest we build a simple version of it into the protocol and then build POCs where we test and measure the real-world utility and cost.

I'd love to help. Measuring real-world utility and cost will probably be challenging, since it requires building real-world use cases. For example, it depends on the ABR logic considered on the client side. I still doubt that a player can be so wrong in its estimation that it cannot survive 500ms more on its previous track before switching down. But it is an exciting study to do.

kixelated commented 7 months ago

I think this is a good idea, and yet also a premature optimization.

Here are my current thoughts on how the receiver should perform ABR. These examples use inclusive ranges and the track priority is annotated when it matters.

Switching Up

I want the ability to preflight the switch while in the middle of group 5 @ 480p.

SUBSCRIBE 1080p start=6
UNSUBSCRIBE 480p end=5

Or I want the ability to probe for the higher bitrate, temporarily subscribing to both to test the network speed. I'll unsubscribe from one of them depending on the arrival speed.

SUBSCRIBE 480p start=0 priority=high
SUBSCRIBE 1080p start=6 priority=low

In both cases the receiver doesn't need to pre-warm unless they switch ~100ms from the end of a group, in which case they should wait.
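
A sketch of the probing variant, assuming the receiver simply compares how much 1080p playtime arrived on the low-priority subscription against wall-clock time; the threshold and names are illustrative:

```python
def resolve_probe(ms_of_1080p_received: float, elapsed_ms: float) -> str:
    # 480p is subscribed at high priority, 1080p at low priority. If the
    # low-priority track still keeps up with real time, the link can carry
    # the higher bitrate; otherwise stay on 480p.
    if ms_of_1080p_received >= elapsed_ms:
        return "UNSUBSCRIBE 480p"    # commit to 1080p
    return "UNSUBSCRIBE 1080p"       # abandon the probe

assert resolve_probe(ms_of_1080p_received=1100, elapsed_ms=1000) == "UNSUBSCRIBE 480p"
assert resolve_probe(ms_of_1080p_received=400, elapsed_ms=1000) == "UNSUBSCRIBE 1080p"
```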

Switching Down

I want to immediately switch down while halfway through group 5 @ 1080p:

UNSUBSCRIBE 1080p end=5 priority=low
SUBSCRIBE 480p start=5 priority=high

In this example, I'll keep receiving 1080p while the request to 480p goes to origin.

Pre-warming could reduce the time this takes by 100ms at most, assuming it's an unpopular broadcast. I can still download some 1080p with my limited bandwidth, so let's say we drop like 50ms of video by not pre-warming.

But does that really matter? Is it worth the additional cost? Maybe for switching down but definitely not for switching up.

kixelated commented 7 months ago

My take is that pre-warming should be part of sender-side ABR. If it's critical to avoid the RTT to origin, then it's also critical to avoid the RTT incurred by putting the receiver in charge.

It also just makes for a better API, as the server can decide if it wants to transparently pre-warm instead of relying on a benevolent client:

SUBSCRIBE 1080p or 480p or 240p
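
A sketch of what the sender-side choice could look like under that kind of alternative-set subscribe, assuming the server keeps a per-subscriber throughput estimate; the bitrates and names are illustrative:

```python
def pick_rendition(alternatives_kbps: dict[str, int], estimated_kbps: float) -> str:
    # "SUBSCRIBE 1080p or 480p or 240p": the server picks the best rendition
    # that fits its throughput estimate, falling back to the lowest one, and
    # can re-run this (and transparently pre-warm) without a client round trip.
    fitting = [name for name, rate in alternatives_kbps.items() if rate <= estimated_kbps]
    if not fitting:
        return min(alternatives_kbps, key=alternatives_kbps.get)
    return max(fitting, key=alternatives_kbps.get)

alternatives = {"1080p": 6000, "480p": 1500, "240p": 300}
assert pick_rendition(alternatives, estimated_kbps=2500) == "480p"
assert pick_rendition(alternatives, estimated_kbps=100) == "240p"
```
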
ianswett commented 7 months ago

I think this is a good idea, and yet also a premature optimization.

But does that really matter? Is it worth the additional cost? Maybe for switching down but definitely not for switching up.

Thanks @kixelated, this post summarizes my thinking well. I'd like to get the features of SUBSCRIBE/UNSUBSCRIBE/UPDATE_SUBSCRIBE working well enough that we can do all of these and then see what the outcome is. We're missing priority on subscriptions right now, and that means we can't do what you describe above for client-side ABR.

On Server Side ABR, I'm realizing that it might look very different for live and VoD. A moqt relay won't know if the client is receiving bytes fast enough to avoid buffer underruns for a VoD playback. For live at head, it knows approximately how far the delivered content is behind head. That might also argue for an explicit communication of what the client's time sensitivity is. A server needs to be much faster to downswitch at a 300ms jitter buffer vs 5 seconds.

kixelated commented 7 months ago

On Server Side ABR, I'm realizing that it might look very different for live and VoD. A moqt relay won't know if the client is receiving bytes fast enough to avoid buffer underruns for a VoD playback. For live at head, it knows approximately how far the delivered content is behind head. That might also argue for an explicit communication of what the client's time sensitivity is. A server needs to be much faster to downswitch at a 300ms jitter buffer vs 5 seconds.

Yeah, that's what I thought too. I implemented an ABR algorithm that took into account the buffer size... but it didn't end up helping. I'm not saying it's not a good signal, just that I tried it and switched to something else.

My final logic was much simpler: round down the estimated bitrate to the nearest rendition at each group boundary. And it makes sense in hindsight: the jitter buffer size is the receiver's tolerance for over-estimations, not something that should be intentionally drained.

zafergurel commented 7 months ago

What about a video conferencing application where the latency requirement is below 150ms? For example, there may be two different layouts: speaker and presentation. In the speaker layout, a high-resolution track is subscribed. When the layout changes to presentation, the client switches to the low-resolution track. There is no ABR here; the client decides which track to subscribe to depending on an application-specific use case. So, for such a case, I would want to pre-warm the lowest resolution track.

kixelated commented 7 months ago

One complication is that the player SHOULD NOT assume a track has been pre-warmed, since the CDN can ignore the request. The player still has to perform the same seamless switching down/up logic, but it might be slightly faster if the pre-warm went through.

fluffy commented 7 months ago

Just want to note that for things like Webex I would 1) expect to pay the CDN based on how much data is cached and how long it is stored, as well as ingress and egress bandwidth, and 2) expect all of this to be authenticated so the CDN knows who to bill.