DASH host failover behavior

joeyparrish commented 3 years ago

Hi everyone,

We'd like your feedback on the way Shaka Player fails over between hosts when there are multiple BaseURLs in a DASH manifest. (We don't properly support the equivalent in HLS yet, so for now, we focus on the DASH version in this conversation.)

Currently, the BaseURLs are used to resolve one or more absolute URIs for each segment. If a segment request fails, we switch to the next URL for that segment. This failover happens only at the segment level.
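To make the current behavior concrete, here is a minimal TypeScript sketch of per-segment failover that always restarts at index 0. `fetchSegmentFromZero` and `TryFetch` are hypothetical names, and the synchronous `tryFetch` callback stands in for a real network request; Shaka Player's actual networking code is asynchronous and structured differently.

```typescript
// Hypothetical stand-in for a network request: returns true on success.
type TryFetch = (uri: string) => boolean;

// Current behavior as described above: every segment starts at index 0 and
// only falls through to later URIs when earlier ones fail.
function fetchSegmentFromZero(uris: string[], tryFetch: TryFetch): number {
  for (let i = 0; i < uris.length; i++) {
    if (tryFetch(uris[i])) {
      return i; // index of the URI that succeeded
    }
  }
  return -1; // every host failed for this segment
}
```

If host A is down, every new segment still pays for one failed request to A before reaching host B.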

We believe this is not ideal for several reasons. For one, if a segment request fails because a host is down, we should avoid that host when we request the next segment. But today, we start at URL 0 for every new segment and only try URL 1 when URL 0 fails. Similarly, we only try URL 2 when both URL 0 and URL 1 fail, and so on.

One idea to improve this is to make the URL index "sticky" between segment requests. For example, if we request URL 0 and it fails, and then we request URL 1 and it succeeds, we would try URL 1 first when we move to the next segment. In this way, we would cycle through hosts on failure. This could also improve CDN performance: for long-tail content, an edge node could prefetch the next segment from the origin server, so sticking to the same host from one segment to the next would reduce fetch latency.
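The sticky-index idea can be sketched as follows. `StickyUriSelector` and `TryFetch` are hypothetical names, not Shaka Player APIs, and the synchronous callback again stands in for a real request.

```typescript
// Hypothetical stand-in for a network request: returns true on success.
type TryFetch = (uri: string) => boolean;

// Remembers which index last succeeded and starts the next segment there,
// cycling forward (and wrapping around) on failure.
class StickyUriSelector {
  private current = 0;

  // Tries each URI at most once. Returns the index used, or -1 if all failed.
  fetch(uris: string[], tryFetch: TryFetch): number {
    const n = uris.length;
    for (let attempt = 0; attempt < n; attempt++) {
      const i = (this.current + attempt) % n;
      if (tryFetch(uris[i])) {
        this.current = i; // stick with this host for the next segment
        return i;
      }
    }
    return -1;
  }
}
```

With host A down, only the first segment pays for a failed request; later segments go straight to host B. If B later fails, the index wraps around to A.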

Another idea: if the primary goal of multiple hosts is load balancing, we could choose a random initial index. Each end user would start on a randomly assigned host. When a request fails, we would choose another random host next, spreading the failed host's load evenly across the remaining hosts.
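A rough sketch of the random load-balancing variant, under two assumptions that the proposal does not spell out: the randomly assigned host is kept until it fails, and the random source is injectable so the behavior can be tested deterministically. All names are hypothetical.

```typescript
// Hypothetical stand-in for a network request: returns true on success.
type TryFetch = (uri: string) => boolean;

class RandomUriSelector {
  private current: number | null = null;

  // `random` must return values in [0, 1), like Math.random.
  constructor(private readonly random: () => number = Math.random) {}

  // Picks a random element of a non-empty `candidates` array.
  private pick(candidates: number[]): number {
    return candidates[Math.floor(this.random() * candidates.length)];
  }

  fetch(uris: string[], tryFetch: TryFetch): number {
    const untried = uris.map((_, i) => i);
    if (untried.length === 0) return -1;
    // Start on this viewer's assigned host, or assign one at random.
    let i =
      this.current !== null && this.current < uris.length
        ? this.current
        : this.pick(untried);
    for (;;) {
      untried.splice(untried.indexOf(i), 1);
      if (tryFetch(uris[i])) {
        this.current = i; // keep using this host until it fails
        return i;
      }
      if (untried.length === 0) {
        this.current = null; // every host failed; reassign next time
        return -1;
      }
      i = this.pick(untried); // fail over to another random, untried host
    }
  }
}
```

Because the replacement host is drawn uniformly from the hosts not yet tried, viewers leaving a failed host are spread across the others instead of all landing on the next index.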

We are also wondering if multiple hosts are generally thought of as primary/secondary/tertiary, in which case randomly spreading users over those hosts would actually be inappropriate.

So, what do you think?

Should we make a choice of host sticky across segments?
Should we start at URL 0 (assume primary/secondary hosts), or at a random index (assume load-balancing hosts)?
Should we make random vs. index 0 configurable?
Anything else we should consider?

Thanks for your feedback!

kocoten1992 commented 3 years ago

Hello, always choosing a random host is a no-go!

Video is a network business (i.e., datacenter location matters). Imagine we have two datacenters, one on the east coast and one on the west coast. If we are closer to the west, we would prioritize the west URL until we are sure the west DC is down, and only then use east from that point on (maybe even retrying west occasionally).
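The priority-plus-occasional-retry behavior described here could look roughly like this. It assumes URIs are listed in priority order (west before east) and uses a hypothetical cooldown, `retryAfterMs`, after which a failed higher-priority host is tried first again; the clock is injectable for testing.

```typescript
// Hypothetical stand-in for a network request: returns true on success.
type TryFetch = (uri: string) => boolean;

class PriorityUriSelector {
  // Host index -> time (ms) until which that host is treated as down.
  private downUntil = new Map<number, number>();

  constructor(
    private readonly retryAfterMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  fetch(uris: string[], tryFetch: TryFetch): number {
    const t = this.now();
    const order = uris.map((_, i) => i);
    // Hosts not in cooldown first (priority order), then the rest as a last resort.
    const healthy = order.filter((i) => (this.downUntil.get(i) ?? 0) <= t);
    const suspect = order.filter((i) => (this.downUntil.get(i) ?? 0) > t);
    for (const i of [...healthy, ...suspect]) {
      if (tryFetch(uris[i])) {
        this.downUntil.delete(i); // host works; clear any penalty
        return i;
      }
      this.downUntil.set(i, t + this.retryAfterMs);
    }
    return -1;
  }
}
```

During the cooldown, requests go straight to east; once it expires, west is tried first again, which matches the "sometimes retry west" suggestion.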

riksagar commented 3 years ago

It seems like there’s some merit to the “sticky index”. I know we list the CDNs in priority order, so we would expect the first one to be used unless it’s unavailable.

Some thoughts on other use cases. Maybe provide an API to preset the “sticky index”. That would facilitate behaviors like:

A service wanting to select one from the list at random.
A service using proprietary logic to determine a preferred CDN (a custom MPD attribute or a list-ordering convention). This could be done as an interceptor on a per-segment basis, which would offer more flexibility to adapt to dynamic network conditions.
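As a sketch of the app-side logic such a preset API might accept, the function below computes a starting index from the BaseURLs in manifest order. The preset API itself, `StartPolicy`, `chooseStartIndex`, and `hostOf` are all hypothetical; nothing here is an existing Shaka Player API.

```typescript
// Hypothetical policies an app could use to choose the starting index.
type StartPolicy =
  | { kind: "first" }
  | { kind: "random"; random?: () => number }
  | { kind: "host"; host: string };

// Extracts the host (and port, if any) from an absolute URI.
function hostOf(uri: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(uri);
  return m ? m[1] : "";
}

// Returns the index an app would hand to a hypothetical "preset sticky index" API.
function chooseStartIndex(baseUrls: string[], policy: StartPolicy): number {
  if (policy.kind === "random") {
    const random = policy.random ?? Math.random;
    return Math.floor(random() * baseUrls.length);
  }
  if (policy.kind === "host") {
    const host = policy.host; // e.g. chosen by the service's own CDN logic
    const i = baseUrls.findIndex((u) => hostOf(u) === host);
    return i >= 0 ? i : 0; // unknown host: fall back to the first URL
  }
  return 0; // "first": manifest order is treated as priority order
}
```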

Rik.

joeyparrish commented 3 years ago

Okay, so the consensus so far is:

"Sticky" index is good,
Don't randomly select an index,
Maybe allow apps to configure a starting index, which defaults to 0

(Apps can already manipulate host selection per-segment using a request filter, by modifying the request to have only one URL.)
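A request filter of that kind might look like the sketch below. `player.getNetworkingEngine().registerRequestFilter(...)` and `shaka.net.NetworkingEngine.RequestType.SEGMENT` are existing Shaka Player APIs; `pinToHost`, `hostOf`, `RequestLike`, and the `preferredHost` value are hypothetical, and the local interface stands in for the part of Shaka's request object that is used here.

```typescript
// Minimal local stand-in for the part of Shaka's request object used here.
interface RequestLike {
  uris: string[];
}

// Extracts the host (and port, if any) from an absolute URI.
function hostOf(uri: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(uri);
  return m ? m[1] : "";
}

// Narrows a request to the single URI on `preferredHost`. If no URI matches,
// the request is left unchanged so the player's own failover still applies.
function pinToHost(request: RequestLike, preferredHost: string): void {
  const match = request.uris.find((u) => hostOf(u) === preferredHost);
  if (match !== undefined) {
    request.uris = [match];
  }
}

// Wiring it up in an app that uses Shaka Player:
// player.getNetworkingEngine().registerRequestFilter((type, request) => {
//   if (type === shaka.net.NetworkingEngine.RequestType.SEGMENT) {
//     pinToHost(request, preferredHost);
//   }
// });
```

Note the trade-off: once a request has only one URL, there is nothing left to fail over to for that segment.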

avelad commented 11 months ago

I think Content Steering solves this problem. What do you think?
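For context, DASH content steering moves the host-priority decision to a steering server: each BaseURL carries a `serviceLocation` attribute, and the server returns a priority list of service locations. The sketch below reorders BaseURLs by such a list. The `SERVICE-LOCATION-PRIORITY` field name reflects our understanding of the DASH-IF content steering specification and should be checked against it; the types and `orderBySteering` are hypothetical and do not describe Shaka Player's implementation.

```typescript
interface BaseUrl {
  url: string;
  serviceLocation: string; // value of the BaseURL@serviceLocation attribute
}

// Subset of a steering-server response (field name assumed from the spec).
interface SteeringResponse {
  "SERVICE-LOCATION-PRIORITY": string[];
}

// Returns a new array ordered by the steering priority; locations missing
// from the list go last, and ties keep manifest order (sort is stable).
function orderBySteering(baseUrls: BaseUrl[], steering: SteeringResponse): BaseUrl[] {
  const priority = steering["SERVICE-LOCATION-PRIORITY"];
  const rank = (b: BaseUrl): number => {
    const r = priority.indexOf(b.serviceLocation);
    return r === -1 ? priority.length : r;
  };
  return [...baseUrls].sort((a, b) => rank(a) - rank(b));
}
```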

shaka-bot commented 11 months ago

Closing due to inactivity. If this is still an issue for you or if you have further questions, the OP can ask shaka-bot to reopen it by including @shaka-bot reopen in a comment.

shaka-project / shaka-player #2975