shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0
7.19k stars 1.34k forks source link

Rapid decline in bandwidth can cause retry loop when using range request timeout #97

Closed jonoward closed 9 years ago

jonoward commented 9 years ago

Found in branch: master

Repro steps:

  1. Configure player with a relatively short range request timeout, e.g. player.setRangeRequestTimeout(5000), to set a 5 second timeout
  2. Start playback and wait until you are on a high, HD bitrate
  3. Use a bandwidth throttling tool to simulate a low bandwidth, that is too slow to download the HD segments within the 5s timeout. We found dropping to a 1Mb connection was enough to cause this
  4. This will cause every fragment request to timeout, which is retried by both shaka.util.AjaxRequest (3 times), and then continually in shaka.media.Stream, without the bitrate changing (as it is simply retrying the same URL). This will eventually exhaust the buffer, and you will enter a permanent buffer stall

So the core issue is that a timeout is treated in the same way as an error, but thinking about it, perhaps it should be treated slightly differently, as it's a signal that the selected bitrate could be too high for the users available bandwidth (it also could be an actual error, e.g. the users internet connection is dead). Whilst we could increase the timeout, having a reasonably short timeout is an advantage in avoiding a single, transient slow request from causing a buffer stall.

Do you think it would be feasible to drop the current bitrate in either AjaxRequest (probably not the right place for this) or Stream, when there is a timeout, so that each retry gets to download a smaller fragment, which will eventually be downloaded quickly enough to complete before the timeout?

joeyparrish commented 9 years ago

That is interesting. Architecturally, our intent is for all bitrate decisions to be made by the implementation of IAbrManager. I don't see an obvious path for IAbrManager to get information about timeouts, though, and I don't think I want to add complexity to the basic built-in implementation (SimpleAbrManager).

But just to make sure we're operating from a good starting place, are you certain that the settings which trigger this scenario are sound? Does your range-request timeout does not account for how long a segment is? For example, if each segment is 10s of content, and you are maxing out the user's bandwidth with that representation, you can easily expect it to take close to 10s to download that segment.

What's worse, depending on how your content is encoded, in certain scenarios the player would download a range of multiple segments in one HTTP request. So even with 5s segments, a 30s minBufferTime would require 6 to be downloaded at once on switch. (FWIW, we're changing this in ongoing work on #51 right now. Soon, there will be no more multi-segment requests.)

If you're sure your parameters make sense, I would recommend two things. First, wait for the next patch to come through on #51, then retest and see how the system behaves. Second, if the behavior is still poor, let's find a reasonable way to get timeout information to IAbrManager. That way, your custom IAbrManager implementation could use timeouts as a signal to down-shift.

jonoward commented 9 years ago

We selected a timeout value that is basically a fail-safe, more to catch out slow requests caused by CDN issues (so they can be retried quickly), rather than trying to match it too closely with the expected download times of our clients. Our segments are 2s long, our timeout 8 seconds, our top bitrate around 4.5mbps, so this means a user who is on the top bitrate will experience timeouts if their bandwidth drops to around 1.2mbps. So thinking about it, maybe we should make this a little longer, perhaps 10 seconds.

One alternative idea, if you want to avoid exposing IAbrManager to network issues - perhaps allow ABR manager to determine the buffer size, so that the buffer stops growing, and it's starting to get low (perhaps X% of the buffering goal), the ABR manager could drop down a bitrate(s) as a fail-safe to try avoid a rebuffer. I haven't thought this through completely, so there could very well be a reason why this is a bad idea.

One last thing, how hard do you think it would be to cancel XHR retries when the bitrate changes? Currently, even if ABR manager does switch bitrates, utill.AjaxRequest is still going to keep retrying, it will only be when those are exhausted and the retries in Stream.onUpdate fire, will the new bitrate be picked up. It's not necessarily a major problem, but might be nice to exist those retries earlier (should further reduce the chance of a rebuffer).

Thanks for your help on this, I'll wait until your #51 changes come up to see if all of this is still an issue, and we'll look at raising our timeout a little bit.

joeyparrish commented 9 years ago

I definitely want IAbrManager to be able to select bitrates based on buffer fullness. This could easily be done with a manager which holds a reference to the video element.

It could (conceivably) even ignore bandwidth estimates completely and only poll how far ahead of video.currentTime has been buffered according to video.buffered. The fact that we don't clear the buffer (by default) or overwrite anything (at all after the next patch) means that AbrManager could change bitrates as often as it wants without impacting streaming much.

Frequent visual quality changes are arguably a bad user experience, but the point is that if your preferred ABR algorithm wants to make bitrate decisions all the time, we should enable you to try that and not interact badly with such an algorithm at the Stream level.

As for canceling XHR, that's no problem. See shaka.util.AjaxRequest.prototype.abort() and shaka.media.SourceBufferManager.prototype.abort(). I believe calling this.sbm_.abort() in Stream.switch() would have the effect you want.

jonoward commented 9 years ago

That sounds great, if we were able to implement a custom ABR manager that had access to the videoElement/SourceBuffers, we could implement a fix for the original issue by dropping to the lowest bitrate when the buffer is about to run dry. For us, dropping quality is better than rebuffering (well, less bad), but def agree that frequent bitrate changes are a bad user experience. And thanks for the tip about sbm.abort(), we'll take a look at that

joeyparrish commented 9 years ago

In v1.4, the application would inject an IAbrManager as the fourth argument to new DashVideoSource(). At that point, you should have a video element available, so you could give the video element to the ABR manager in the manager's constructor. If you want, you could also contribute the buffer-based AbrManager implementation to live along-side SimpleAbrManager (which may not be the best name once there's something to contrast it with).

joeyparrish commented 9 years ago

v1.4.0 is out now. Please let us know whether or not the new APIs resolve this issue for you.

joeyparrish commented 9 years ago

@jonoward This issue has been stale for some time, so I'm going to assume that everything is copacetic with the new APIs we introduced in v1.4.x. Please feel free to reopen this issue or start a new one if you need more than what what we discussed in this bug.