google / ExoPlayer

An extensible media player for Android
Apache License 2.0

DASH Live stream - AdaptiveTrackSelection, DefaultBandwidthMeter, and isTransferAtFullNetworkSpeed #10082

Open IcemarkUK opened 2 years ago

IcemarkUK commented 2 years ago

I am tracking an issue with a DASH live stream where playback starts while the bandwidth is low (~1 Mbps). The bandwidth recovers over a few minutes, but the playback quality doesn't reflect the new bandwidth.

The network recovers by ~1 Mbps every 30 seconds, climbing up to 32 Mbps.

However, the stream never climbs to its full quality; instead it sticks at the lower end, ~2 Mbps.

While debugging this I have observed the following.

AdaptiveTrackSelection is using an implementation of a BandwidthMeter that relies on DefaultBandwidthMeter to supply the bitrate estimate. However, I have noticed that within a short time isTransferAtFullNetworkSpeed starts returning false, so onTransferEnd stops monitoring the transfers and the bitrateEstimate sticks at the last value.

isTransferAtFullNetworkSpeed is returning false because the DataSpec has DataSpec.FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED set.

The knock-on effect is that, because this bitrate estimate is low, the recovery of the network makes no difference and playback continues at a low quality. The only manual way to recover is to stop playback and start it again, or to use some combination of pause/seek that seems to cause the DataSpec to be reevaluated. For a brief moment the transfers are monitored again and a new bitrateEstimate is calculated, before the condition becomes false again and monitoring stops.
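
To make the described behaviour concrete, here is a minimal sketch in plain Java. It is not ExoPlayer's DefaultBandwidthMeter: the class name, the flag's bit value, and the "keep only the latest sample" estimate are assumptions made for illustration, and only the flag check described above is modeled.

```java
/** Simplified stand-in for a bandwidth meter; this is NOT ExoPlayer's DefaultBandwidthMeter. */
class StuckEstimateSketch {
  // Stand-in for DataSpec.FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED; the bit value is an assumption.
  static final int FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED = 1 << 3;

  private long bitrateEstimate; // bits per second

  StuckEstimateSketch(long initialBitrateEstimate) {
    this.bitrateEstimate = initialBitrateEstimate;
  }

  // Transfers carrying the flag are treated as not running at full network speed.
  static boolean isTransferAtFullNetworkSpeed(int dataSpecFlags) {
    return (dataSpecFlags & FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED) == 0;
  }

  void onTransferEnd(int dataSpecFlags, long bytesTransferred, long elapsedMs) {
    if (!isTransferAtFullNetworkSpeed(dataSpecFlags) || elapsedMs <= 0) {
      return; // No sample is taken, so the estimate keeps its last value.
    }
    // Simplification: keep only the latest sample instead of smoothing over many samples.
    bitrateEstimate = bytesTransferred * 8 * 1000 / elapsedMs;
  }

  long getBitrateEstimate() {
    return bitrateEstimate;
  }

  public static void main(String[] args) {
    StuckEstimateSketch meter = new StuckEstimateSketch(0);
    // Slow network, unflagged transfer: 125,000 bytes in 1 s is sampled as 1 Mbps.
    meter.onTransferEnd(0, 125_000, 1_000);
    System.out.println(meter.getBitrateEstimate()); // 1000000
    // Network has recovered to 32 Mbps, but the transfer is flagged and therefore skipped.
    meter.onTransferEnd(FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED, 4_000_000, 1_000);
    System.out.println(meter.getBitrateEstimate()); // 1000000
  }
}
```

Once every transfer carries the flag, no new samples arrive, so the estimate cannot follow the network upward.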

If I remove this check playback performs and ladders up and down as expected.

So I'm trying to understand the conditions under which this flag gets set and reset, and whether relying on the estimate from DefaultBandwidthMeter is advisable. I feel like I'm missing something important...

Comments in the commit hint at this being for detecting network throttling in order to discard these samples. However, I have seen this behaviour on non-throttled networks: playback hits full bandwidth, but sampling then stops. So I'm trying to understand whether this is a real issue in a live environment or only occurs because of how our throttling tests are being performed.

marcbaechinger commented 2 years ago

Sorry for the long answer; it's complicated. First, since you are probably touching an unsolved problem we are aware of, it would help us to get a test stream or see the manifest, so we can learn which manifests people are playing when they run into this issue. If you're unable to share test content publicly, please send it to dev.exoplayer@gmail.com using a subject in the format "Issue #10082".

Generally, a segment may not be available at full network speed if its end is beyond the live edge.

The method that determines whether a segment should be considered available with full speed is in the DefaultDashChunkSource:

public boolean isSegmentAvailableAtFullNetworkSpeed(long segmentNum, long nowPeriodTimeUs) {
  if (segmentIndex.isExplicit()) {
    return true;
  }
  return nowPeriodTimeUs == C.TIME_UNSET || getSegmentEndTimeUs(segmentNum) <= nowPeriodTimeUs;
}

So it looks like when this happens, the player is loading the last available segment of the live stream and the end time of that segment is later than the live edge (nowPeriodTimeUs). The server doesn't have the full segment yet, so the download speed is not an accurate measurement: it reflects waiting for the segment to be completely encoded on the server, not the bandwidth.
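
As a worked example of the check above, the following standalone sketch reproduces the same comparison with concrete numbers. The class name and the explicit start/duration parameters are assumptions for illustration (the real method looks the segment up by number); TIME_UNSET mirrors C.TIME_UNSET.

```java
/** Standalone model of the availability check quoted above; not ExoPlayer code. */
class SegmentAvailabilitySketch {
  // Mirrors C.TIME_UNSET.
  static final long TIME_UNSET = Long.MIN_VALUE + 1;

  static boolean isSegmentAvailableAtFullNetworkSpeed(
      boolean explicitIndex, long segmentStartTimeUs, long segmentDurationUs, long nowPeriodTimeUs) {
    if (explicitIndex) {
      return true; // Explicitly listed segments are treated as fully available.
    }
    long segmentEndTimeUs = segmentStartTimeUs + segmentDurationUs;
    return nowPeriodTimeUs == TIME_UNSET || segmentEndTimeUs <= nowPeriodTimeUs;
  }

  public static void main(String[] args) {
    long startUs = 100_000_000L; // Segment starts at 100 s into the period.
    long durationUs = 4_000_000L; // 4 s segment, so it ends at 104 s.

    // Live edge at 102 s: the segment end lies beyond it, so the transfer would be flagged.
    System.out.println(
        isSegmentAvailableAtFullNetworkSpeed(false, startUs, durationUs, 102_000_000L)); // false
    // Live edge at 104 s: the whole segment exists on the server.
    System.out.println(
        isSegmentAvailableAtFullNetworkSpeed(false, startUs, durationUs, 104_000_000L)); // true
    // An explicit segment index short-circuits the time comparison.
    System.out.println(
        isSegmentAvailableAtFullNetworkSpeed(true, startUs, durationUs, 102_000_000L)); // true
  }
}
```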

So far it's easy and very clear what is happening, but I'm afraid it may not be that easy to fix. Specifically for low-latency live streams, this is an open issue in ExoPlayer and a topic that is not yet commonly solved in the industry: how to make reliable bandwidth measurements when the player is so close to the live edge that it starts reading a segment that is not yet fully available on the server.

There are several options you can look into that may or may not be suitable for your use case or app:

Option 1) You can try fixing this behaviour by choosing a different targetLiveOffset when building the LiveConfiguration of your media item. If you are doing low-latency live streams or your segments are very large, this may not be a good solution, as you really want the player to start reading such a segment. If you are fine with increasing the live latency, you can set the latency to a value that is greater than a segment duration. This should avoid the situation.

Option 2) From the pasted method isSegmentAvailableAtFullNetworkSpeed, I think that you are using a DASH manifest with a segment template as opposed to a segment list (explicit). If you own the manifest, you may be able to tweak it. I'm not sure whether this is feasible or can help, though. I also don't know whether you are using these streams on other platforms as well, in which case tweaking the manifest for one platform may have undesired effects on another platform or player. You may be able to change the value of availabilityTimeOffset if it is greater than 0. I think in this case setting it to 0 could make it less likely that you run into this situation (see SegmentBase.getAvailableSegmentCount).

Option 3) This is probably not the right one for you, as you describe a player that starts on a slow network and does not recover from that. I mention it mainly for completeness, and for Wi-Fi environments such as Android TV: instantiating a DefaultBandwidthMeter with a custom initial bandwidth estimate may help to get above a low initial estimate that is never corrected because playback is close to the live edge. This customized instance of DefaultBandwidthMeter can then be used to build the player with ExoPlayer.Builder.

Sorry for not being able to give you an easy answer to solve this. Interested to hear what you find.

IcemarkUK commented 2 years ago

Thanks, it might take a while for me to digest this and talk with some others in order to understand it. As far as I am aware we do own and control the streams (mostly).

IcemarkUK commented 2 years ago

@marcbaechinger So, out of interest: we have done some tests that involved removing DataSpec.FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED when it gets set, and our tests now show the ABR laddering as expected.

The bigger question is: what would be the consequences of us effectively removing this check? It would be similar to having an explicit segment index, which we can't change in the manifest, but the change would be made only in a custom BandwidthMeter that ignores the flag.

This check is there for a reason... I'd like to try and understand the consequences of taking it out.
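
One way to picture "a custom BandwidthMeter that ignores the flag" is a wrapper that clears the flag bit before forwarding each transfer event to the underlying meter. The sketch below is plain Java with hypothetical names (TransferObserver, ignoringFlag, clearFlag) and an assumed bit value for the flag; it shows the bit manipulation only, not the ExoPlayer listener wiring.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that hides the flag from a delegate; not ExoPlayer code. */
class FlagStrippingSketch {
  // Stand-in for DataSpec.FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED; the bit value is an assumption.
  static final int FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED = 1 << 3;

  /** Hypothetical, reduced version of a transfer callback. */
  interface TransferObserver {
    void onTransferEnd(int dataSpecFlags, long bytesTransferred, long elapsedMs);
  }

  // Clears only the one flag bit and leaves all other flags untouched.
  static int clearFlag(int dataSpecFlags) {
    return dataSpecFlags & ~FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED;
  }

  // Wraps a delegate so that it never sees the flag and therefore samples every transfer.
  static TransferObserver ignoringFlag(TransferObserver delegate) {
    return (flags, bytes, elapsedMs) -> delegate.onTransferEnd(clearFlag(flags), bytes, elapsedMs);
  }

  public static void main(String[] args) {
    List<Integer> flagsSeenByDelegate = new ArrayList<>();
    TransferObserver wrapped =
        ignoringFlag((flags, bytes, elapsedMs) -> flagsSeenByDelegate.add(flags));
    // The incoming event carries the flag plus one unrelated bit (value 1).
    wrapped.onTransferEnd(FLAG_MIGHT_NOT_USE_FULL_NETWORK_SPEED | 1, 4_000_000, 1_000);
    System.out.println(flagsSeenByDelegate); // [1]
  }
}
```

The delegate then samples transfers near the live edge as well, which restores the laddering at the cost of accepting samples whose duration may not reflect bandwidth.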

marcbaechinger commented 2 years ago

If the flag is set, the DefaultBandwidthMeter does not use these segments for estimating the bandwidth.

The consequence of not setting the flag would be that the DefaultBandwidthMeter may underestimate the actual network speed. If the server cannot yet deliver the requested bytes, it has to wait until encoding of the segment has advanced far enough to deliver them. So the DefaultBandwidthMeter would measure a download duration for those bytes that is not suitable for estimating the bandwidth, because the duration is longer for reasons other than bandwidth constraints.
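
A small worked calculation illustrates the underestimate. The numbers are invented for illustration (a 1,000,000-byte segment, a 32 Mbps network, and a 4 s wait for encoding to finish), and the model assumes a transfer cannot complete before the segment is fully encoded.

```java
/** Arithmetic sketch of an encoder-bound transfer; the numbers and model are assumptions. */
class EncoderBoundTransferSketch {
  // A transfer cannot finish before the network has moved the bytes
  // and before the server has finished encoding the segment.
  static long transferDurationMs(long bytes, long networkBitsPerSecond, long msUntilFullyEncoded) {
    long networkMs = bytes * 8 * 1000 / networkBitsPerSecond;
    return Math.max(networkMs, msUntilFullyEncoded);
  }

  // Bitrate a naive meter would derive from the bytes and the elapsed time.
  static long measuredBitrate(long bytes, long elapsedMs) {
    return bytes * 8 * 1000 / elapsedMs;
  }

  public static void main(String[] args) {
    long bytes = 1_000_000L;
    long network = 32_000_000L; // 32 Mbps

    // Segment fully available: the transfer is limited by the network only.
    long fastMs = transferDurationMs(bytes, network, 0);
    System.out.println(fastMs + " ms, " + measuredBitrate(bytes, fastMs) + " bps");
    // prints "250 ms, 32000000 bps"

    // Segment still being encoded for another 4 s: the transfer is limited by the encoder.
    long slowMs = transferDurationMs(bytes, network, 4_000);
    System.out.println(slowMs + " ms, " + measuredBitrate(bytes, slowMs) + " bps");
    // prints "4000 ms, 2000000 bps"
  }
}
```

In the second case the sample reports 2 Mbps on a 32 Mbps network, which is the kind of measurement the flag is meant to keep out of the estimate.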

IcemarkUK commented 2 years ago

@marcbaechinger So, if the player is in this state, would buffering occur? Or, more likely, would the BandwidthMeter think the network is slower than it actually is, so that AdaptiveTrackSelection drops down to a lower quality when that might not actually be required?