google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0

Gapless audio playback on multi-period DASH source #4899

Closed: ghexoplayerquestion closed this issue 5 years ago

ghexoplayerquestion commented 6 years ago

Issue description

I am playing a custom-created MPEG-DASH manifest that includes multiple periods. Each period contains a single FMP4 segment that was created with ffmpeg using the following command line:

ffmpeg -i - -f segment -segment_atclocktime 1 -strftime 1 -c:a libfdk_aac -b:a 32k -segment_format mp4 -segment_format_options movflags=empty_moov+default_base_moof+frag_keyframe ~/test/%FT%H-%M-%S%z.mp4

Each period uses the period duration along with the presentationTimeOffset to trim the first and last sample off of the period. The segment durations, and thus the period durations, vary, which is why each segment is in its own period.

When playing back, there is an audible gap between each period. Because the period start-times are configured without a gap, I would expect the audio playback to be gapless.

Reproduction steps

A reproduction app is available at: https://github.com/ghexoplayerquestion/Repro

with the relevant code in the Android activity at: https://github.com/ghexoplayerquestion/Repro/blob/master/app/src/main/java/com/example/ghexoplayerquestion/repro/MainActivity.java

During playback, the following is output on the debug console:

I/ExoPlayerImpl: Init 42b7916 [ExoPlayerLib/2.9.0] [generic_x86, Android SDK built for x86, Google, 28]
I/Choreographer: Skipped 48 frames!  The application may be doing too much work on its main thread.
I/OpenGLRenderer: Davey! duration=827ms; Flags=0, IntendedVsync=69241345581930, Vsync=69242145581898, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=69242153074100, AnimationStart=69242153958800, PerformTraversalsStart=69242156035700, DrawStart=69242157118400, SyncQueued=69242158809400, SyncStart=69242159321100, IssueDrawCommandsStart=69242159614500, SwapBuffers=69242164402600, FrameCompleted=69242173244900, DequeueBufferDuration=1560000, QueueBufferDuration=3441000, 
W/VideoCapabilities: Unrecognized profile 4 for video/hevc
I/VideoCapabilities: Unsupported profile 4 for video/mp4v-es
D/NetworkSecurityConfig: No Network Security Config specified, using platform default
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
I/OMXClient: IOmx service obtained
I/ACodec: codec does not support config priority (err -2147483648)
W/AudioTrack: getTimestamp() location moved from kernel to server
D/AudioTrack: stop() called with 587776 frames delivered

Link to test content

The DASH manifest that reproduces this issue is:

<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" profiles="urn:mpeg:dash:profile:isoff-main:2011" mediaPresentationDuration="PT0M11.508982S" minBufferTime="PT6S" xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period start="PT0S" duration="PT2.017435S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/1b4966c2-26f2-403a-8808-67b4d5488c8c.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
  <Period start="PT2.017435S" duration="PT1.809909S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/f0d7b23b-878c-4be4-9832-6cf8539c6331.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
  <Period start="PT3.827344S" duration="PT1.842343S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/a17539ae-8064-42a0-bd10-7d04cd2b2886.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
  <Period start="PT5.669687S" duration="PT2.102876S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/dbb6292a-b0fd-4036-86fc-eb8f2f2bff6c.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
  <Period start="PT7.772563S" duration="PT1.952418S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/21190726-7468-4c5d-893c-fe30f642788d.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
  <Period start="PT9.724981S" duration="PT1.784001S">
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" contentType="audio">
      <Representation audioSamplingRate="44100" id="1" bandwidth="32000">
        <BaseURL>https://rhhhloggermediastore.blob.core.windows.net/rh-logger-share/c662bb42-97ac-4eb4-b699-52a97932fd38.mp4</BaseURL>
        <SegmentBase timescale="1000000" presentationTimeOffset="21333" />
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
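
The claim that the period start times are configured without a gap can be checked mechanically. The following is a minimal sketch, not part of the repro app: the class and method names are hypothetical, and the start and duration strings are copied from the manifest above. It only verifies the manifest arithmetic and says nothing about the timestamps inside the media files.

```java
import java.time.Duration;

public class PeriodContinuity {
    // Returns true when each period starts exactly where the previous one ends
    // and the last period ends at the declared total presentation duration.
    public static boolean isContiguous(String[] starts, String[] durations, String total) {
        Duration expectedStart = Duration.ZERO;
        for (int i = 0; i < starts.length; i++) {
            if (!Duration.parse(starts[i]).equals(expectedStart)) {
                return false;
            }
            expectedStart = expectedStart.plus(Duration.parse(durations[i]));
        }
        return expectedStart.equals(Duration.parse(total));
    }

    public static void main(String[] args) {
        // Period@start and Period@duration values from the manifest above.
        String[] starts = {
            "PT0S", "PT2.017435S", "PT3.827344S",
            "PT5.669687S", "PT7.772563S", "PT9.724981S"
        };
        String[] durations = {
            "PT2.017435S", "PT1.809909S", "PT1.842343S",
            "PT2.102876S", "PT1.952418S", "PT1.784001S"
        };
        // MPD@mediaPresentationDuration from the manifest above.
        System.out.println(isContiguous(starts, durations, "PT0M11.508982S"));
    }
}
```

For this manifest the check passes, so any audible gap has to come from how the samples inside each period are handled rather than from the period boundaries themselves.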

Version of ExoPlayer being used

ExoPlayer version 2.9.0

Device(s) and version(s) of Android being used

Reproduces on Android emulator: Nexus 5X, 5.2 1080x1920 xxhdpi Android API 28 x86

A full bug report captured from the device

The bug report is attached. bugreport.zip

ojw28 commented 5 years ago

Thanks for the interesting manifest. Things we found:

  1. There's a bug in the buffered position reported by the player (you may observe the buffered position show that all 5 periods are buffered, then incorrectly snap back to the period boundaries during playback). This is unrelated to what you're actually asking about, but we'll fix it :).
  2. We're not clipping the samples that end up with negative timestamps in this case, or the ones that end up extending beyond the period duration. We'll fix this too and it makes playback a bit better, but I can still hear some slight discontinuities.

A few observations about the manifest itself:

  1. It doesn't really matter (and we'll still handle this case), but the DASH specification discourages what you're doing: "Media Segments should not contain any presentation time that is smaller than the value of the @presentationTimeOffset". I believe the outcome of recent DASH-IF discussions was that this should be allowed, but specifically in the case where each representation is a segment consisting of multiple sub-segments (and a segment index that's accessible from the manifest).
  2. Trying to clip with sample accuracy via the manifest seems very error prone. As an example of why, your sample uses different timescales in the manifest (1000000) and media (1000). In the media the second sample has timestamp 21000, but the clipping specified by the manifest clips at 21333. We end up clipping two samples as a result, rather than one.
  3. The durations of the periods in the manifest don't appear to correspond to clipping one sample from the end of each period.
  4. Even when I manually edit the manifest to clip exactly one sample from the start and end of each period, there is still a slight audible discontinuity. It's unclear whether the content has been prepared in a way that allows for true gapless playback.
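
The timescale mismatch in observation 2 can be illustrated with a small sketch. This is not ExoPlayer code: the class and method names are hypothetical, and it assumes a sample is clipped when its timestamp is negative after subtracting the presentationTimeOffset. It also reads the "21000" above as 21 ticks at the media timescale of 1000, that is 21000 microseconds. The third timestamp in each array is made up for illustration.

```java
public class ClipCount {
    // Converts a timestamp in the given timescale (ticks per second) to microseconds.
    public static long toMicros(long value, long timescale) {
        return value * 1_000_000L / timescale;
    }

    // Counts samples whose period-relative start time is negative once the
    // presentation time offset has been subtracted.
    public static int countClipped(long[] sampleTimes, long mediaTimescale, long offsetUs) {
        int clipped = 0;
        for (long t : sampleTimes) {
            if (toMicros(t, mediaTimescale) - offsetUs < 0) {
                clipped++;
            }
        }
        return clipped;
    }

    public static void main(String[] args) {
        // presentationTimeOffset="21333" at timescale="1000000" in the manifest.
        long offsetUs = toMicros(21333, 1_000_000);

        // Media timescale 1000: the second sample is stored as 21 (21000 us),
        // which is 333 us before the offset, so it is clipped as well.
        long[] coarseTimes = {0, 21, 42}; // third value is hypothetical
        System.out.println(countClipped(coarseTimes, 1000, offsetUs)); // prints 2

        // If the media carried the same precision as the manifest, only the
        // first sample would fall before the offset.
        long[] exactTimes = {0, 21333, 42666}; // hypothetical values
        System.out.println(countClipped(exactTimes, 1_000_000, offsetUs)); // prints 1
    }
}
```

The point of the sketch is only that a sub-millisecond offset in the manifest cannot line up with timestamps stored at millisecond precision, so the number of clipped samples depends on rounding.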

At a higher level, it's unclear what you're trying to achieve. There's no need to start a new period to accommodate different segment durations. Periods are typically for use when the content actually changes (e.g. transition from one song to another, or from content to an ad). I also don't understand why you're having to clip from the start and end of each segment. Why are they there in the first place? It's pretty common to have variable segment length AAC in DASH and I've never seen this being necessary before, so there's likely a shortcoming with how you're preparing the media.

So TLDR - We'll fix the issues identified at the top. It'll make things a bit better, but there will still be an audible discontinuity. I think the real fix is to prepare the content in a better way.

ghexoplayerquestion commented 5 years ago

Thank you ojw28 for looking at this and for the detailed response. I will continue to look at the encoding to handle this better.

I was trimming the beginning and end of the segments to account for the encoder start-up delay (https://www2.iis.fraunhofer.de/AAC/gapless.html) but it looks like I was not calculating the trim values correctly. I'll also look at using the MPEG Edit List to trim the segments and then including the segments in a single Period with a SegmentList.

In addition (not shown in this repro), I am also using the presentationTimeOffset of the first segment and Period duration of the last segment / MPD mediaPresentationDuration to allow the manifest to trim the beginning and end of the entire presentation to bounds that fall inside of segments. Is that affected by the issue (2) you brought up, and if so is there a better way to handle this?

ghexoplayerquestion commented 5 years ago

Re: my last paragraph - I see that if I get my media presentation down to a single period, I can use ClippingMediaSource. I wasn't using ClippingMediaSource because I was using the multiple periods to trim the segments.

ojw28 commented 5 years ago

I still don't really understand what you're doing. Pretty much any DASH stream will segment AAC audio into multiple segments, and they won't ever put each segment in its own period or do anything special to deal with start-up delay and padding.