shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0

Better bandwidth detection and account for multiple Players #1757

Closed: TheModMaker closed this issue 3 years ago

TheModMaker commented 5 years ago

We currently use the time it takes to download a segment to detect the bandwidth. This doesn't work perfectly, since we may be downloading other segments at the same time, which reduces the apparent bandwidth. If we download two segments concurrently, each takes roughly twice as long as it would have taken on its own, so each reports about half the real bandwidth. The app may also be downloading things, or there may be multiple Players causing downloads.
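To make the skew concrete, here is a minimal sketch (illustrative numbers only, not code from the player) of how a naive per-request throughput sample is distorted by concurrency:

```js
const linkBps = 8e6;       // assume an 8 Mbit/s link
const segmentBytes = 1e6;  // assume a 1 MB segment

// Alone: the download takes 1 s, so the sample reads the full link rate.
let durationS = (segmentBytes * 8) / linkBps;    // 1.0 s
let sampleBps = (segmentBytes * 8) / durationS;  // 8 Mbit/s

// Two concurrent downloads share the link: each takes ~2 s, and each
// sample reads only half of the real capacity.
durationS = (segmentBytes * 8) / (linkBps / 2);  // 2.0 s
sampleBps = (segmentBytes * 8) / durationS;      // 4 Mbit/s
```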

We already expose the NetworkingEngine to the app so they can download things using it. We should also do the following:

vaage commented 5 years ago

In previous discussions there has been a lot of pushback against sharing a networking engine between players (#1500), but @theodab and I were working on a design to make it easier to purposely share resources between players.

If we mostly care about the bandwidth calculations, sharing the bandwidth estimator instance would achieve that without requiring everyone to use the same networking engine. It would also make it easier for app developers to swap out the estimation logic without having to change much else (something that I know you, @TheModMaker, have always been in favor of).

For modeling bandwidth when we have concurrent downloads, we could update the bandwidth estimator interface to mark the start and end of each download. That way it would know about overlapping regions, and the implementation would have the option to model them differently.
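For illustration, here is a minimal sketch of such an interface; none of these names exist in shaka.abr today, and the real design would need to be worked out:

```js
// Hypothetical estimator that is told about request boundaries, so it can
// see how many downloads overlapped each sample and model that explicitly.
class OverlapAwareEstimator {
  constructor() {
    this.active = new Map();  // requestId -> {startMs, peak}
    this.samples = [];
    this.nextId = 0;
  }

  onDownloadStart() {
    const id = this.nextId++;
    this.active.set(id, {startMs: Date.now(), peak: 0});
    // A new download raises the concurrency seen by everything in flight.
    for (const entry of this.active.values()) {
      entry.peak = Math.max(entry.peak, this.active.size);
    }
    return id;
  }

  onDownloadEnd(id, numBytes) {
    const entry = this.active.get(id);
    this.active.delete(id);
    // peak tells the model how contended the link was during this sample,
    // so it can discount or sum concurrent samples instead of treating
    // each request as if it had the link to itself.
    this.samples.push({
      durationMs: Date.now() - entry.startMs,
      numBytes,
      peakConcurrency: entry.peak,
    });
  }
}
```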

joeyparrish commented 5 years ago

I'm not certain that concurrent downloads should be modeled explicitly, or that bandwidth estimates should necessarily be shared across Players. I'll elaborate with an example, starting with some assumptions, and then exploring what happens when some of those assumptions are broken.

For example, let's imagine 4 independent players playing at once. Let's make some simplifying assumptions:

  1. they don't coordinate
  2. they don't share a bandwidth estimator or networking engine
  3. they all are playing content with high enough bitrates to saturate the network uplink

If everyone is playing at the limit of what is possible, then they can't buffer far ahead. If the buffering goal is never fully met, then each player requests something new as soon as a request & append cycle completes. There are therefore almost always 4 video segment requests in flight at any one time. (Ignore audio, since it is very small compared to video, especially when the video quality level is high.)

In this scenario, each player would build an estimate of about 25% of the actual usable bandwidth; on a 40 Mbit/s link, for example, each would settle around 10 Mbit/s. This seems to be the obvious fair limit to what each can achieve concurrently. Then what is there to gain in this scenario by sharing an estimator or sharing a networking engine?


Now, let's assume that the content is not actually capable of saturating the network uplink. In this scenario, we may achieve our buffering goal, and therefore a given player may go idle. Here, the available bandwidth will be less consistent. When concurrent requests are happening, available bandwidth goes down, and so does the estimate. Future requests will be at a lower quality level to compensate and maintain the buffer.

What do we stand to gain here by sharing an estimator? If the estimate accounts for overlap and sums the overlapping parts, it would report something closer to the actual network bandwidth. But if we report the full bandwidth of the network while multiple players are streaming, today's AbrManager would end up choosing streams that it couldn't sustain without rebuffering. So this would require AbrManager to be aware of multiple players as well.
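For concreteness, the "sum the overlapping parts" idea could look something like this sketch: compute aggregate throughput as total bytes over the union of the busy intervals. The function and record shapes here are hypothetical, not part of any shared-estimator design:

```js
// records: [{startMs, endMs, numBytes}, ...] from all players sharing
// one estimator. Returns bits per second over the union of busy time.
function aggregateBandwidth(records) {
  const sorted = [...records].sort((a, b) => a.startMs - b.startMs);
  let busyMs = 0;
  let totalBytes = 0;
  let curStart = null;
  let curEnd = null;
  for (const r of sorted) {
    totalBytes += r.numBytes;
    if (curEnd === null || r.startMs > curEnd) {
      // A gap: close out the previous merged interval and start a new one.
      if (curEnd !== null) busyMs += curEnd - curStart;
      curStart = r.startMs;
      curEnd = r.endMs;
    } else {
      // Overlap: extend the current merged interval.
      curEnd = Math.max(curEnd, r.endMs);
    }
  }
  if (curEnd !== null) busyMs += curEnd - curStart;
  return busyMs > 0 ? (totalBytes * 8 * 1000) / busyMs : 0;
}
```

Note that this is exactly the number that, as argued above, a single player's AbrManager could not act on safely while other players are streaming.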

That might be useful, but what if the players are in separate tabs? Separate domains? Sharing becomes impossible in those situations.

What if there's only one video player, but somebody in the next room is doing some other bandwidth-intensive task on the same network? We have to be able to handle this just as well, and I would argue that the current throughput-based model already does.


What we wind up with in the general case is that these are all equivalent in terms of how we should behave:

  1. multiple concurrent players in the same tab
  2. multiple concurrent players in different tabs
  3. multiple concurrent players, some of which are not Shaka Player
  4. one player competing for bandwidth with a game, file sharing, etc

In the general case, these concurrent demands on bandwidth will result in more variability in the bandwidth estimate, since concurrency may not be consistent.

If we coordinated between player instances, we could build a system that staggers requests to avoid overlap and achieves more throughput overall. However, in three of the four scenarios above, we can't do that.

Instead, I would suggest that we could get more benefit from a system which models the variability in our estimate over time. If we have an inconsistent estimate because of variable concurrency in requests (from any cause), we may want to adapt to become more conservative and decay our lower-bound estimate more slowly. This would also improve another scenario where actual bandwidth varies without concurrency: a user in transit on a mobile network.
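As a sketch of what "modeling the variability" could mean (the constants and names here are hypothetical, not a proposal for shaka.abr's actual interface): track an exponentially weighted mean of the throughput samples along with an exponentially weighted variance, and report a lower bound that shrinks as the samples get noisier:

```js
// Hypothetical variability-aware estimator: noisy samples (from request
// concurrency, or from a mobile network in transit) widen the variance
// term and automatically make the reported estimate more conservative.
class ConservativeEstimator {
  constructor(alpha = 0.2, k = 1.0) {
    this.alpha = alpha;  // EWMA smoothing factor
    this.k = k;          // how many standard deviations to subtract
    this.mean = 0;
    this.variance = 0;
    this.initialized = false;
  }

  sample(bps) {
    if (!this.initialized) {
      this.mean = bps;
      this.initialized = true;
      return;
    }
    const delta = bps - this.mean;
    this.mean += this.alpha * delta;
    // Standard incremental update for an exponentially weighted variance.
    this.variance =
        (1 - this.alpha) * (this.variance + this.alpha * delta * delta);
  }

  getBandwidthEstimate() {
    return Math.max(0, this.mean - this.k * Math.sqrt(this.variance));
  }
}
```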


Thoughts?

vaage commented 5 years ago

@joeyparrish you raise some very interesting points, and they make me wonder if we have lost sight of something: what is the purpose of bandwidth estimation in our player? Is it about accurately modeling and predicting the efficiency of our network connection, or is it merely an implementation detail of how we pick which variant to play?

joeyparrish commented 5 years ago

It's true that it's a way for us to pick a variant. Picking variants is (by default) an optimization problem. We try to play the highest quality variant that we can stably play without rebuffering. A model of actual throughput is how we chose to do that in the very earliest days of Shaka Player.

The biggest advantages of our throughput-based model are that it is simple and that it naturally accounts for the inefficiencies of how Shaka Player actually operates. For example, StreamingEngine doesn't request another segment until it finishes the previous one. You could achieve higher bandwidth utilization by overlapping requests, so that the TCP slow start phase of one segment overlaps the end of the previous segment. This could result in higher quality variants, at the cost of more complexity in the code and less wiggle room in our estimate. Since we don't actually do that, we may only be using some (hypothetically large) percentage of the real network bandwidth.
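To illustrate the kind of overlap being described (and which StreamingEngine deliberately does not do), here is a minimal sketch; fetchSegment and appendToBuffer are hypothetical helpers, not Shaka APIs:

```js
// Start fetching segment N+1 while segment N is still being appended, so
// one request's TCP slow-start ramp overlaps the tail of the previous
// request and the network is never idle between segments.
async function streamWithOverlap(segmentUris) {
  let pending = fetchSegment(segmentUris[0]);
  for (let i = 0; i < segmentUris.length; i++) {
    const data = await pending;
    // Kick off the next request before the (potentially slow) append.
    if (i + 1 < segmentUris.length) {
      pending = fetchSegment(segmentUris[i + 1]);
    }
    await appendToBuffer(data);
  }
}
```

As noted above, the cost of doing this for real would be more code complexity and less margin for error in the estimate.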

So if we found that our estimate were only 90% of actual max network bandwidth, it would mean we are only capable of utilizing the network at 90% of its capacity. The important thing, though, is that our estimate is an accurate reflection of what the code can stream without rebuffering. If we selected variants based on a magical oracle that gave us actual max bandwidth, we could wind up rebuffering. That number would not account for our own inefficiency, and therefore would probably not be achievable by us.