google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0
21.7k stars 6.01k forks source link

Improve stuck buffering detection to explicitly detect that no buffer is available for rendering #9553

Open vovan888 opened 2 years ago

vovan888 commented 2 years ago

When playing a specific HLS VOD stream the playback always stalls (video freezes, audio stops, CC stops) at the same time and at the same frame (around the 8 minutes from the beginning). Pausing/unpausing does not help. Seeking helps to resume playback.

I tried to debug the issue after playback stops: LoadControl reports that the media buffer is full (at 50seconds). But both Video and Audio renderers report that they are not ready in the following line:

ExoPlayerImplInternal.java:

        boolean allowsPlayback =
            isReadingAhead || isWaitingForNextStream || renderer.isReady() || renderer.isEnded();

VLC and ffplay play this stream without issues.

Unfortunately I cannot provide a link to the stream. But it looks like the following:

#EXTM3U
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-VERSION:3
#EXT-X-START:TIME-OFFSET=0
#EXT-X-TARGETDURATION:17
#EXT-X-MEDIA-SEQUENCE:11048
#EXT-X-KEY:METHOD=AES-128,URI="[edit]"
#EXTINF:8.341667,
11048.ts
.......
#EXTINF:8.341667,
11090.ts
#EXTINF:16.683333,
11091.ts
#EXTINF:16.683333,
11092.ts
#EXTINF:8.341667,
11093.ts

Logcat output:


2021-10-10 22:21:26.503 28898-28898/com.player D/EventLogger: downstreamFormat [eventTime=6.90, mediaPos=0.00, window=0, period=0, id=0, mimeType=null, bitrate=3500000, codecs=mp4a.40.2, avc1.4d401e, res=1280x720]
2021-10-10 22:21:26.913 28898-28898/com.player D/EventLogger: videoDecoderInitialized [eventTime=7.29, mediaPos=0.00, window=0, period=0, OMX.rk.video_decoder.hevc]
2021-10-10 22:21:26.934 28898-28898/com.player D/EventLogger: videoInputFormat [eventTime=7.33, mediaPos=0.00, window=0, period=0, id=0, mimeType=video/hevc, bitrate=3500000, codecs=hvc1.1.2.L93, res=1280x720]
2021-10-10 22:21:27.006 28898-28898/com.player D/EventLogger: audioDecoderInitialized [eventTime=7.41, mediaPos=0.00, window=0, period=0, OMX.google.aac.decoder]
2021-10-10 22:21:27.032 28898-28898/com.player D/EventLogger: audioInputFormat [eventTime=7.42, mediaPos=0.00, window=0, period=0, id=1/257, mimeType=audio/mp4a-latm, codecs=mp4a.40.2, channels=2, sample_rate=48000, language=en]
2021-10-10 22:21:27.190 28898-28898/com.player D/EventLogger: tracks [eventTime=7.58, mediaPos=0.00, window=0, period=0
2021-10-10 22:21:27.191 28898-28898/com.player D/EventLogger:   MediaCodecVideoRenderer [
2021-10-10 22:21:27.191 28898-28898/com.player D/EventLogger:     Group:0, adaptive_supported=N/A [
2021-10-10 22:21:27.193 28898-28898/com.player D/EventLogger:       [X] Track:0, id=0, mimeType=video/hevc, bitrate=3500000, codecs=hvc1.1.2.L93, res=1280x720, supported=YES
2021-10-10 22:21:27.197 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.207 28898-28898/com.player D/EventLogger:     Metadata [
2021-10-10 22:21:27.207 28898-28898/com.player D/EventLogger:       HlsTrackMetadataEntry
2021-10-10 22:21:27.207 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.208 28898-28898/com.player D/EventLogger:   ]
2021-10-10 22:21:27.208 28898-28898/com.player D/EventLogger:   MediaCodecAudioRenderer [
2021-10-10 22:21:27.209 28898-28898/com.player D/EventLogger:     Group:0, adaptive_supported=N/A [
2021-10-10 22:21:27.212 28898-28898/com.player D/EventLogger:       [X] Track:0, id=1/257, mimeType=audio/mp4a-latm, codecs=mp4a.40.2, channels=2, sample_rate=48000, language=en, supported=YES
2021-10-10 22:21:27.212 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.212 28898-28898/com.player D/EventLogger:     Group:1, adaptive_supported=N/A [
2021-10-10 22:21:27.213 28898-28898/com.player D/EventLogger:       [ ] Track:0, id=1/258, mimeType=audio/mp4a-latm, codecs=mp4a.40.2, channels=2, sample_rate=48000, language=enm, supported=YES
2021-10-10 22:21:27.230 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.230 28898-28898/com.player D/EventLogger:   ]
2021-10-10 22:21:27.231 28898-28898/com.player D/EventLogger:   FfmpegAudioRenderer []
2021-10-10 22:21:27.231 28898-28898/com.player D/EventLogger:   TextRenderer [
2021-10-10 22:21:27.232 28898-28898/com.player D/EventLogger:     Group:0, adaptive_supported=N/A [
2021-10-10 22:21:27.233 28898-28898/com.player D/EventLogger:       [X] Track:0, id=cc1:English, mimeType=application/cea-608, language=en, supported=YES
2021-10-10 22:21:27.233 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.240 28898-28898/com.player D/EventLogger:   ]
2021-10-10 22:21:27.241 28898-28898/com.player D/EventLogger:   MetadataRenderer [
2021-10-10 22:21:27.241 28898-28898/com.player D/EventLogger:     Group:0, adaptive_supported=N/A [
2021-10-10 22:21:27.257 28898-28898/com.player D/EventLogger:       [X] Track:0, id=null, mimeType=application/id3, supported=YES
2021-10-10 22:21:27.258 28898-28898/com.player D/EventLogger:     ]
2021-10-10 22:21:27.258 28898-28898/com.player D/EventLogger:   ]
2021-10-10 22:21:27.261 28898-28898/com.player D/EventLogger:   CameraMotionRenderer []
2021-10-10 22:21:27.262 28898-28898/com.player D/EventLogger: ]
2021-10-10 22:21:27.542 28898-28898/com.player D/EventLogger: videoSize [eventTime=7.93, mediaPos=0.00, window=0, period=0, 1280, 720]
2021-10-10 22:21:27.581 28898-28898/com.player D/EventLogger: renderedFirstFrame [eventTime=7.96, mediaPos=0.00, window=0, period=0, Surface(name=null)/@0xcfe893f]
2021-10-10 22:21:28.426 28898-28898/com.player D/EventLogger: state [eventTime=8.81, mediaPos=0.02, window=0, period=0, READY]
2021-10-10 22:21:28.465 28898-28898/com.player D/EventLogger: isPlaying [eventTime=8.84, mediaPos=0.02, window=0, period=0, true]
2021-10-10 22:21:32.503 28898-28898/com.player D/EventLogger: state [eventTime=12.91, mediaPos=3.98, window=0, period=0, BUFFERING]
2021-10-10 22:21:32.507 28898-28898/com.player D/EventLogger: isPlaying [eventTime=12.91, mediaPos=3.99, window=0, period=0, false]
2021-10-10 22:33:23.596 28898-28898/com.player D/EventLogger: state [eventTime=724.00, mediaPos=437.29, window=0, period=0, BUFFERING]
2021-10-10 22:33:23.599 28898-28898/com.player D/EventLogger: isPlaying [eventTime=724.01, mediaPos=437.29, window=0, period=0, false]
2021-10-10 22:33:23.623 28898-28898/com.player E/EventLogger: audioTrackUnderrun [eventTime=724.03, mediaPos=437.29, window=0, period=0, 53664, 279, 333]
2021-10-10 22:33:30.539 28898-28898/com.player D/EventLogger: state [eventTime=730.95, mediaPos=437.29, window=0, period=0, READY]
2021-10-10 22:33:30.539 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.540 28898-28898/com.player D/EventLogger: isPlaying [eventTime=730.95, mediaPos=437.29, window=0, period=0, true]
2021-10-10 22:33:30.550 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.620 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.631 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.642 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.653 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.673 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353884148423295 < limitNs: 353890873556812 < mStartNs: 353890953556812
2021-10-10 22:33:30.684 28898-29480/com.player W/AudioTrack: getTimestamp() location moved from kernel to server
2021-10-10 22:33:32.877 28898-28898/com.player D/EventLogger: state [eventTime=733.28, mediaPos=439.53, window=0, period=0, BUFFERING]
2021-10-10 22:33:32.878 28898-28898/com.player D/EventLogger: isPlaying [eventTime=733.28, mediaPos=439.53, window=0, period=0, false]
2021-10-10 22:33:33.302 28898-28898/com.player E/EventLogger: audioTrackUnderrun [eventTime=733.71, mediaPos=439.53, window=0, period=0, 53664, 279, 723]
2021-10-10 22:33:40.374 28898-28898/com.player D/EventLogger: state [eventTime=740.78, mediaPos=439.53, window=0, period=0, READY]
2021-10-10 22:33:40.376 28898-28898/com.player D/EventLogger: isPlaying [eventTime=740.78, mediaPos=439.53, window=0, period=0, true]
2021-10-10 22:33:40.377 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353893827568101 < limitNs: 353900708589300 < mStartNs: 353900788589300
2021-10-10 22:33:40.471 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353893827568101 < limitNs: 353900708589300 < mStartNs: 353900788589300
2021-10-10 22:33:40.490 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353893827568101 < limitNs: 353900708589300 < mStartNs: 353900788589300
2021-10-10 22:33:40.501 28898-29480/com.player D/AudioTrack: correcting timestamp time for pause, currentTimeNanos: 353893827568101 < limitNs: 353900708589300 < mStartNs: 353900788589300
2021-10-10 22:33:40.512 28898-29480/com.player W/AudioTrack: getTimestamp() location moved from kernel to server
2021-10-10 22:33:41.859 28898-28898/com.player D/EventLogger: loading [eventTime=742.27, mediaPos=440.90, window=0, period=0, false]
2021-10-10 22:33:42.953 28898-28898/com.player D/EventLogger: state [eventTime=743.36, mediaPos=441.98, window=0, period=0, BUFFERING]
2021-10-10 22:33:42.955 28898-28898/com.player D/EventLogger: isPlaying [eventTime=743.36, mediaPos=441.98, window=0, period=0, false]
<hangs here>
claincly commented 2 years ago

There can be multiple reasons that a renderer is not ready, for example, the ts file is malformed and the renderer receives no data. It is very hard to pinpoint the issue without a test link.

If you're unable to share bug reports or test content publicly, please send them to dev.exoplayer@gmail.com using a subject in the format "Issue #1234" ("#1234" is replaced with your issue number). Please also update this issue to indicate you've done this.

google-oss-bot commented 2 years ago

Hey @vovan888. We need more information to resolve this issue but there hasn't been an update in 14 weekdays. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot commented 2 years ago

Since there haven't been any recent updates here, I am going to close this issue.

@vovan888 if you're still experiencing this problem and want to continue the discussion just leave a comment here and we are happy to re-open this.

vovan888 commented 2 years ago

Still occurs on ExoPlayer 2.16.1.

Please check the dev.exoplayer@gmail.com

icbaker commented 2 years ago

I'm afraid we're unable to share the information requested over email - the content needs to be publicly available to ensure it's accessible by all relevant members of the team, to enable us to investigate the issue effectively.

vovan888 commented 2 years ago

Please check the dev.exoplayer@gmail.com, I have sent the link to the test stream.

icbaker commented 2 years ago

Thanks for the test stream. I played it using the 2.16.1 demo app on a Pixel 3 XL running Android 9. I played it from the beginning and it's still going after 20 minutes (past the expected 18 minute stall point mentioned in the email).

I also tried on a few other devices (Nokia 6 on Android 8 and Pixel 5 on Android 11) and both played past the 18 minute stall point.

Please provide any further info which might help us to reproduce the problem.

vovan888 commented 2 years ago

Thank you for your help. I have found the issue does not occur if I comment the setBufferDurationsMs() - so DefaultLoadControl uses the default buffer sizes:

            DefaultLoadControl loadControl = new DefaultLoadControl.Builder()
                    .setAllocator(new DefaultAllocator(true, C.DEFAULT_BUFFER_SEGMENT_SIZE))
//                    .setBufferDurationsMs(24000, 24000, 1500, 2000)
                    .setPrioritizeTimeOverSizeThresholds(true)
                    .build();

Also I have tested that downgrading ExoPlayer does not solve the issue (at least down to 2.11.7)

Increasing buffer size to 32000 resolves the issue: .setBufferDurationsMs(32000, 32000, 1500, 2000)

tonihei commented 2 years ago

The problem is even easier to reproduce when the buffer durations are set to a really low value (e.g. setBufferDurationsMs(2000, 2000, 1500, 2000)). I can reproduce after ~1 minute of playback with these values.

The underlying problem seems to be that the HLS source reports a buffered duration that doesn't match the actually loaded samples. So the player thinks the source has sufficient buffers loaded for playback (and doesn't continue loading), but actually there are no samples available for playback and the player gets stuck. Our stuck buffering detection doesn't detect this either because it only treats the player as stuck if the reported buffered duration is almost zero (which isn't the case here).

tonihei commented 2 years ago

The reason this is happening as described above is that the declared segment durations of some segments in the HLS playlist don't really match the actual duration of the media. Over time we accumulate an error between the calculated position of a chunk and the actual media timestamps. Once the accumulated error is above the maximum buffer configured in the LoadControl, the player gets stuck.

I'll close this as bad media because the HLS spec clearly states that "Durations SHOULD be decimal-floating-point, with enough accuracy to avoid perceptible error when segment durations are accumulated. "

vovan888 commented 2 years ago

I agree that the bad media should be fixed but IMHO exoplayer should at least report an error in bad media case.

tonihei commented 2 years ago

Yes, that's true, we usually try to fail with some error so that the app can at least show a UI indication that playback isn't possible. We also have a stuck-buffering detection for this reason, but it only works if the calculated buffered duration is empty to avoid triggering for other reasons (like initializing DRM playback). See https://github.com/google/ExoPlayer/blob/029a2b27cbdc27cf9d51d4a73ebeb503968849f6/library/core/src/main/java/com/google/android/exoplayer2/ExoPlayerImplInternal.java#L1045

To fix this, we need to replace our buffered duration check by something that explicitly checks for "no buffers available for rendering". However, this isn't easy to implement because we have no methods available to detect this condition at the moment and we'd need some kind of reason flag for why a renderer is not ready. I'll mark it as a low-priority enhancement to improve the stuck-buffering detection in this way.

kshailes94 commented 2 years ago

can we do anything other than increasing buffer size as increasing buffersize harms video start duration? @tonihei

tonihei commented 2 years ago

increasing buffer size as increasing buffersize harms video start duration?

The maximum buffer size is completely independent from the delay until video starts as playback can start as soon as DefaultLoadControl.minBufferForPlayback is filled.

kshailes commented 2 years ago

The reason this is happening as described above is that the declared segment durations of some segments in the HLS playlist don't really match the actual duration of the media. Over time we accumulate an error between the calculated position of a chunk and the actual media timestamps. Once the accumulated error is above the maximum buffer configured in the LoadControl, the player gets stuck.

I'll close this as bad media because the HLS spec clearly states that "Durations SHOULD be decimal-floating-point, with enough accuracy to avoid perceptible error when segment durations are accumulated. "

@tonihei how can I check the error becoming more than max buffer size ? basically, how to calculate (or get) the error between calculated position of chunk and actual media timestamps ?

tonihei commented 2 years ago

There is no easy way to access this information from the outside of the player I believe.

kshailes commented 1 year ago

@tonihei this same issue can happen for DASH also right ? and keeping bigger maxbuffer is also failing for me. I kept maxbuffer as 2 minutes, still facing same issue. Any suggestion ?

stevemayhew commented 1 year ago

The reason this is happening as described above is that the declared segment durations of some segments in the HLS playlist don't really match the actual duration of the media

This is exactly correct, and very common with origin servers for audio segments.

I'll close this as bad media because the HLS spec clearly states that "Durations SHOULD be decimal-floating-point, with enough accuracy to avoid perceptible error when segment durations are accumulated. "

Long running live makes this difficult for the origin servers to achieve, but not impossible. In tunneled playback (where platform code performs A/V sync) the issues caused this are far worse, even a few seconds duration difference will cause stalls.

This is possibly a related issue: https://github.com/google/ExoPlayer/issues/8959. Also have a look at this issue https://github.com/google/ExoPlayer/issues/7030 (on my plate for way to long, no looks like on yours @tonihei ;-) ) I was proposing the issue be simply reported as a playback error (now PlaybackException) so the UI could recover with a simple re-prepare(). As I remember @ojw28 was pushing to keep recover in the HLS layers. The code is merged in commented out, but I suspect if you turn it on with the sample streams you will see one or the other demuxed sample stream log the error.

Very happy to work with you on this, it occurs for us but rarely now days.

tonihei commented 1 year ago

Long running live makes this difficult for the origin servers to achieve, but not impossible. In tunneled playback (where platform code performs A/V sync) the issues caused this are far worse, even a few seconds duration difference will cause stalls.

It's definitely desirable to have matching media times as required by all the specs. And I believe it's fine for a player to throw an exception if it's stuck because of that and the only thing that really needs to be taken care of is to avoid stuck states in the player that get never reported.

EXT-X-GAP tags are slightly different because they are a spec-compliant way of signalling a gap and we should aim to handle this case well, no matter how long the gap is. This is tracked by https://github.com/google/ExoPlayer/issues/8959.

This issue here aims to detect these stuck playback cases, but it's not straight-forward to add this logic because we don't have the right information available at the moment and it probably needs a detailed design discussion about all the cases that can happen and how we want to handle them.

stevemayhew commented 1 year ago

Long running live makes this difficult for the origin servers to achieve, but not impossible. In tunneled playback (where platform code performs A/V sync) the issues caused this are far worse, even a few seconds duration difference will cause stalls.

It's definitely desirable to have matching media times as required by all the specs.

It would seem a must, to within whatever threshold will produce a stall. The Apple stream validator reports drift over some threshold (which they don't spec unfortunately) as an error. We most often see this with how different transcoders produced the demuxed audio stream, the segments vary in duration and the PTS offset between the audio segments and the video segments drifts.

And I believe it's fine for a player to throw an exception if it's stuck because of that and the only thing that really needs to be taken care of is to avoid stuck states in the player that get never reported.

We agree, now next step is to add this to the merged "fix" for #7030. The pull request was merged but only with commented out code because we did not agree on this point.

EXT-X-GAP tags are slightly different because they are a spec-compliant way of signalling a gap and we should aim to handle this case well, no matter how long the gap is. This is tracked by #8959.

Yes, let's leave GAP out of this discussion... It has enough issues of its own.

This issue here aims to detect these stuck playback cases, but it's not straight-forward to add this logic because we don't have the right information available at the moment and it probably needs a detailed design discussion about all the cases that can happen and how we want to handle them.

At the container agnostic player level (EPII) I agree. Here the world should be considered pure where SampleStreams report ended, have contiguous timestamps and Renderer's where the isReady() and isEnded() contract is properly implemented.

IMO, the implementers of this contract (the HLS, DASH, etc. libraries and the MediaCodec renderers) should either honor it or throw exceptions (report PlaybackException) when they cannot. The specific issues we have seen that lead to stalls are:

  1. Liberties taken with the vagaries of the HLS spec
  2. Issues with the BRCM tunneling implementation of MediaCodec

Number 2 is a whole separate issue, we have bugs and pull requests for this already. For number 1, we can verify and report with an exception. The advantage of reporting the error up to EPII with recoverable flag is two fold:

  1. The issue can be logged for analytics reporting so it can be properly reported as an Edge/Origin server issue
  2. Recovery may be nearly seamless (in this case generating an internal discontinuity should reset the timestamp adjustment across all SampleStream). At the worst a segment or partial segment may be skipped, but playback will not freeze.

I have not seen the stream from @vovan888 but we did send a sample stream for #7030. With the new (since bug #7030 was submitted at least) PlaybackException logic and codes it would be simple enough to add the recovery logic, we can then test it on both streams.

Let me know and I can start a new pull request with logic from the original (https://github.com/google/ExoPlayer/pull/7034) uncommented perhaps with an experimental flag to enable disable it then we can iterate on an acceptable solution.

For now you could uncomment the logging and test if it triggers against the stream @vovan888 provided, This commit last touched the code: https://github.com/google/ExoPlayer/commit/d487170eb01dd6edfeb65933d9a4f0627039fb3c

Leaving the solution for #7030 half done with commented out code eats at my soul, I'm happy to take the issue and have someone on my team spend time on it, especially if it could solve this problem too.

tonihei commented 1 year ago

which they don't spec unfortunately

They do spec this actually: Durations SHOULD be decimal-floating-point, with enough accuracy to avoid perceptible error when segment durations are accumulated. (from https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis#section-4.4.4.1)

We agree, now next step is to add this to the merged "fix" for https://github.com/google/ExoPlayer/issues/7030. The pull request was merged but only with commented out code because we did not agree on this point.

I slightly lost track of this discussion to be honest since it happened a while back. But it generally makes sense to handle problems automatically if these problems are expected to some extent and can be reliably solved automatically. For example, we have a timestamp discontinuity detection in our audio renderer where we check if the media timestamps and the calculated timestamp (assuming continuous audio) have a mismatch, and if so we just automatically adjust our calculations to match the new media time. IIRC, something similar is attempted in #7030/#7034. If it comes down to something like this, then it makes sense to solve it inside the HLS classes. If it's a more fundamental stream problem, then throwing is probably the better approach to let the app decide how to handle it.

Let me know and I can start a new pull request with logic from the original (https://github.com/google/ExoPlayer/pull/7034) uncommented perhaps with an experimental flag to enable disable it then we can iterate on an acceptable solution.

Generally, that's a great idea and I also think it's worth removing the commented out code (either by making it proper code or removing it altogether). Based on the discussion above, I'm not completely certain yet whether we want to throw or attempt to correct the stream. I think we should settle on something there to ensure the PR is going in the right direction.