google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0
21.7k stars 6.01k forks source link

Bug on entering replaced ads HLS fmp4 #11012

Open CarlLindqvist opened 1 year ago

CarlLindqvist commented 1 year ago

ExoPlayer Version

2.18.3

Devices that reproduce the issue

The problem doesn't seem to be device specific. Happens on devices and on emulator.

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Yes

Reproduction steps

Play the included live stream and watch it across a couple of ad breaks.

This stream will have a 2 minute ad break every 5 minutes. (:01, :06, :11, :16 ... :56)

Expected result

The player should play normally across the ad breaks.

Actual result

Sometimes the ads will play OK. But it will inevitably end up in a broken state upon entering an ad break sooner or later (usually the second ad break). When this happens, the timeline seems broken. The player will attempt to seek to time that is not part of the current buffer, a time in the future.

Example image: image When the problem happens, the timeline looks like on the left. The player is at position 3:17 when it only has 01:38 worth of content. When in a regular state, it will look like the picture on the right.

Media

REMOVED

Bug Report

marcbaechinger commented 1 year ago

Is this an official test stream from yospace?

I'm asking because I think there is a problem with timestamps. ExoPlayer shows

2023-02-24 11:41:26.203  8934-8988  MediaCodecAudioRenderer androidx.media3.demo.main            E  Audio sink error
androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity: expected 1000380645333, got 1000284517334
  at androidx.media3.exoplayer.audio.DefaultAudioSink.handleBuffer(DefaultAudioSink.java:922)
  at androidx.media3.exoplayer.audio.MediaCodecAudioRenderer.processOutputBuffer(MediaCodecAudioRenderer.java:711)
  at androidx.media3.exoplayer.mediacodec.MediaCodecRenderer.drainOutputBuffer(MediaCodecRenderer.java:1936)
  at androidx.media3.exoplayer.mediacodec.MediaCodecRenderer.render(MediaCodecRenderer.java:798)
  at androidx.media3.exoplayer.ExoPlayerImplInternal.doSomeWork(ExoPlayerImplInternal.java:1025)
  at androidx.media3.exoplayer.ExoPlayerImplInternal.handleMessage(ExoPlayerImplInternal.java:509)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:223)
  at android.os.HandlerThread.run(HandlerThread.java:67)

and playing the same stream with VLC crashed VLC as well indicating a problem with the media:

main error: Timestamp conversion failed for 1677239759060001: no reference clock
main error: Could not convert timestamp 0 for FFmpeg

What player have you been able to play this stream successfully?

CarlLindqvist commented 1 year ago

Sorry for the delay. We have been working closely with Yospace to create this stream and have been debugging it a lot together with their engineers to figure out if there is anything potentially wrong with it. In what part of the stream do you see the timestamp issue? Is it in the ad breaks or in the "regular content" big buck?

This stream has been tested and running on hls.js

CarlLindqvist commented 1 year ago

Given the timestamp 1677239759060001, it looks like it is close to entering a break (which is where we also see the problem).

The source is created via AWS MediaLive encoder and converted to HLS via Unified Streaming origin. Unified has also been involved in debugging on the content side.

If there is a content problem, Unified and Yospace are both interested in finding out any potential issues.

YoTonyF commented 1 year ago

Thanks Carl - This is Tony from Yospace, so if any additional information is needed to try to troubleshoot this I'd be happy to try and help, although I don't know what the root cause of this issue is.

On a side-note, I can confirm the stream also runs fine on Shaka 4.3.2 (I had it running for well over an hour without any issues).

If there is a timestamp issue, it would be useful to understand which timestamp is incorrect. I see the message about a potential discontinuity, but it's not clear what that timestamp refers to, or the specifics as to how it was derived - it appears to be a "real time" epoch timestamp, but obviously that didn't come from the MP4 content directly, so understanding how it's calculated with respects to timestamps in the manifest, and content of various MP4 boxes, would be very useful.

marcbaechinger commented 1 year ago

Yeah, dunno yet.

The real time timestamp is from VLC which I won't be able to explain. I will look into why ExoPlayer thinks there is a gap in audio timestamps reported with:

Unexpected audio track timestamp discontinuity: 1000380645333, got 1000284517334

This says there is a 10 seconds gap in timestamp for a stream that has ads (of 10 seconds duration?) stitched in somewhere.

I'm not saying this is bad media as I haven't looked into this in detail to be honest. I just saw that ExoPlayer and VLC complain about a similar thing, so I wanted to ask whether this is expected to work before spending more time on it. Sorry, just normal procedure aiming for spending my time well.

We'll look into this and report back.

YoTonyF commented 1 year ago

Much appreciated - as I said, if there's any information you think I can help with, please let me know and I'll do my best.

Carl already gave you details about the content; I can tell you the ad content is encoded to MP4 using FFMPEG and segmented/packaged using Shaka Packager (in a single packaging operation across all rendition levels). We also remove the MEHD box(es) from the converted output segments because of a specific Chromecast issue and insert EMSG boxes in the video segments for metadata purposes.

Thanks Tony

dyazdan commented 1 year ago

Dennis from Unified Streaming here; @marcbaechinger if there's any way we can assist in piecing the puzzle together, let us know. Thanks for looking into this! 👍🏻

marcbaechinger commented 1 year ago

Sorry for not updating this issue. I started looking into this and found that the audio timestamps are off after a while. The player does correct the timestamp at each discontinuity correctly. The issue does not happen at the discontinuities but within the segments of an ad. I'm not yet sure where it comes from.

The symptom first looked to me like there is a discontinuity tag missing. Like when the media suddenly restarts the timestamp, without a makrer in the playlist. LMK if this rings a bell for you.

Beside: I'm not sure how agile the configuration of the test stream is. If it can be changed, I would appreciate to have the first at after 30secs instead as after 5 minutes, this would make debugging a bit easier. :) I already wrote a script to download the media of a given playlist so I can investigate the media when it happens, but I still have to wait 5 minutes until this possibly happens at the first or second ad break. Please disregard if this is not possible.

marcbaechinger commented 1 year ago

I found that the big jump in the playback position happens when playback goes back to the first segment that is live content after the inserted ads.

I see the following log statement when extracting and adjusting the timestamps:

03-14 00:08:19.354 29068 29191 D timestamp: desiredSampleTimestampUs - timeUs = timestampOffsetUs: 464560000 - 1678752360000000 = -1678751895440000 -- androidx.media3.common.util.TimestampAdjuster@af1ce42
03-14 00:08:19.354 29068 29191 D mp4-extract: [v] sampleTimeUs: 584560000, currentSamplePresentationTimeUs: 1678752480000000

Later I see that the DefaultAudioSink complains after this frame has been fed to the decoder:

03-14 00:08:32.627 29068 29183 D hls-pts : queue input: 1000584576000
03-14 00:08:32.677 29068 29183 E MediaCodecAudioRenderer:   androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity: expected 1000464738667, got 1000584576000

The difference of the expected and the actual audio timestamp is 1000464738667 - 1000584576000 = -119837333.

The difference of the timestamp with which the timestamp is adjusted and the first timestamp of the video frames in the fmp4 is 1678752360000000 - 1678752480000000 = −120000000.

The difference of the timestamp of the first video frame after ads and the timestamp of the first audio frame after ad is 1678752480000000 - 1678752480016000 = 16000.

So we have 120000000 - 16000 = 119984000 which is pretty close to the difference the DefaultAudiSink comlains about: 119837333 (delta of 3 millis).

From this data I think there are some faulty timestamps in the media (video?) after the inserted ads. If 119837333 is the duration of the inserted ads, then I think that the timestamps in the live segments following the ads are not offset by the duration of the inserted ads.

This would explain the behaviour we are seeing with the playback position that is derive from the audio timestamp. It also appears to me that besides the current position being wrong, playback works fine.

marcbaechinger commented 1 year ago

Changed to bad media until I'll be educated that I'm wrong. :)

CarlLindqvist commented 1 year ago

The symptom first looked to me like there is a discontinuity tag missing. Like when the media suddenly restarts the timestamp, without a makrer in the playlist. LMK if this rings a bell for you.

There should be discontinuity tags correctly placed at each discontinuity of the media. The live part of the stream is continous and all timestamps are referenced to epoch (should be close to the burned in clock in the video), it will never repeat timestamps. The ads are all 0 based. Each change to a different video source is marked with a discontinuity tag.

I would appreciate to have the first at after 30secs instead as after 5 minutes

The example media is a continous live stream, with ad breaks that are 120s long that happen every 5 minutes. I can't really make them happen after 30 seconds.

If 119837333 is the duration of the inserted ads, then I think that the timestamps in the live segments following the ads are not offset by the duration of the inserted ads.

I'm not sure I follow. The live media (Ie not ads) can't really adjust any timestamps to anything ad related, since it just keeps going in the background. It has no idea if we replaced content with an ad break or not. That part is done through manifest manipulation from Yospace. I'm not sure what it is supposed to offset.

What context is the timestamp in? It doesn't look like any timestamps from the video, so I'm guessing this is some internal timestamp that is calculated in the player. If it is in microsecond ticks, then 119837333 sounds very close to the 120 second ad break duration.

Maybe I should have included the source which this is stitched on. Here are the underlying manifests: Audio: https://varnish-live-director.a2d-stage.tv/group01/dvrtest2/live.isml/live-audio_0=128000.m3u8 Video: https://varnish-live-director.a2d-stage.tv/group01/dvrtest2/live.isml/live-video=4499968.m3u8

dyazdan commented 1 year ago

@marcbaechinger what do you mean by the audio time stamps being off? I can't remember seeing any hard resets or the like in our analysis; Do you have an example for "where to where" in time the jumps/resets occur? 🤔

For the interval of ad opportunities in the demo stream, @CarlLindqvist is in control of this.

CarlLindqvist commented 1 year ago

Ah allright, looking again at the timestamps, some are epoch microseconds. I am a bit confused by the where the expected timestamps are from, like 1000464738667

marcbaechinger commented 1 year ago

Thanks for your input.

Dennis: what do you mean by the audio time stamps being off?

That's the resulting audio timestamp after all translations. That's not the audio timestamps as in the media chunks. That's the reason for player.getCurrentPosition() being totally off after the issue happens (see screenshots posted by Carl in the initial post). Sorry for not being clear here.

Carl: Ah allright, looking again at the timestamps, some are epoch microseconds. I am a bit confused by the where the expected timestamps are from, like 1000464738667

This is padded internally with 1000000000000. So the actual timestamp is 464738667 which is a virtual timestamp also that's not in the media.

Carl: There should be discontinuity tags correctly placed at each discontinuity of the media.

Yeah, timestamps are adjusted for discontinuities (here). Each time when the discontinuity sequence increases we create a new TimestampAdjuster.

Marc: "faulty timestamps in the media (video?)"

I think I should correct myself to (metadata?) here.

Looking again I find that the timestamp offset is adjusted with a metadata frame. I suspect there is a metadata frame at the same position as the first video frame (or physically before). It looks like it before the video frame in the container:

76227 03-14 17:33:20.131  1160  1231 D adjust  : adjust timestamp from onEmsgLeafAtomRead
76228 03-14 17:33:20.131  1160  1231 D timestamp: desiredSampleTimestampUs - timeUs = timestampOffsetUs: 257680000 - 1678815060000000 = -1678814802320000 -- androidx.media3.common.util.Tim      estampAdjuster@6ec1977
76230 03-14 17:33:20.132  1160  1231 D mp4-timestamp: getCurrentSamplePresentationTimeUs: currentSampleIndex=0, currentlyInFragment=true
76232 03-14 17:33:20.132  1160  1231 D mp4-extract: [v] sampleTimeUs: 377680000, currentSamplePresentationTimeUs: 1678815180000000

Interpreting the log lines:

androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity      : expected 1000257808000, got 1000377685312

The metadata frame has been extracted 1 millisecond before the first video frame of the segment. The difference of the timestamp of that metadata frame and the timestamp of the video frame is

1678815060000000 - 1678815180000000  = -120000000

So it looks like that there is a metadata frame at the end of the ads (I guess it is in the first live segment after the ads).

Can someone let me know if this makes sense and expand a bit on where metadata frames have to be expected in the media, specifically at the boundaries of ad/content or content/ad and what timestamps you would expect for that metadata frame that comes immediately before the first video frame in a new segment after the ads?

Without any former knowledge I wouldn't expect adjacent metadata and video frames to have a timestamp that is 120_000_000 us apart from each other.

YoTonyF commented 1 year ago

Thanks @marcbaechinger

The (video) ad segments do indeed contain EMSG metadata, but these should all fall within the PTS range of the specific segments which contain them. Specifically for a typical ad video segment, there would be 2 or more EMSG metadata boxes:

One occurs shortly after the first video frame (perhaps as little as 50mS afterwards). Another occurs shortly before the end of the video segment (perhaps between 50 and 250mS before the final frame). There may be others in between these, but they all fall between the first and last PTS of the segment.

However if I'm understanding correctly, it looks like the confusion arises with the first content segment following the ads which contain a single EMSG v1 box containing a binary SCTE35 blob (urn:scte:scte35:2013:bin)?

For the example I looked at, the declared PTS value in the EMSG box is: PTS: 1007322876000 (1678871460.0 secs) using 600 timescale, however the first video frame has a declared PTS value later than this EMSG (per tfdt):

tfdt : 20 bytes (Version 1) : Track Fragment Decode Time
  BaseMediaDecodeTime: 1007322948000 (0x000000ea892061a0)

1007322948000 - 1007322876000 = 72000, which using 600 timescale is 120.0 seconds (presumably the length of the inserted ad break).

The SCTE appears to be (as you might expect) a "Splice Insert"

[Heavily cropped decode]:
{
  splice_command_type: "SpliceInsert",
  splice_command: {
    splice_event_id: "0x744b0eab",
    pts_time: "0x1c1f40620" // Probably not same as content due to PTS rollover?
  },
  break_duration: {
    auto_return: true,
    duration: "0xa4cb80" // 10800000 decimal / 90000 = 120.0
  }
}

And because this occurs prior to the start of the video, the player sees this as the initial video PTS following the discontinuity and then sees a discrepancy with the initial PTS of the audio track?

In the example I looked at, the first audio PTS was 80585835840255 using 48000 timescale.

This would produce the following timing in the content segments following the ad:

Audio: 80585835840255 / 48000 = 1678871580.0053125 secs, duration = 5.354666667s (without accounting for priming)
Video: 1007322948000 / 600 = 1678871580.0000000 secs, duration = 5.36s
EMSG (in video): 1007322876000 / 600 = 1678871460.0000000 secs with duration 120.0s

I guess the issue here then is the EMSG having a PTS prior to the content which is tripping up the calculation in the player?

marcbaechinger commented 1 year ago

Thank you very much for the explanation. Very educational!

I guess the issue here then is the EMSG having a PTS prior to the content which is tripping up the calculation in the player?

Yes indeed, it looks like this metadata frame delivers the very first timestamp in the first content segment following the ads. The player is at this point advised by EXT-X-DISCONTINUITY to align the following timestamps. The player hence sets the offset according to that first timestamp that is extracted from that segment and all subsequent timestamps are adjusted with that offset.

I wonder whether the metadata frame belongs to the end of the last content segment before the ad.

I'm not really familiar with packaging for live stream to be honest. Is this specific to this stream, a configuration in the tool chain, the packager used or something else? If it would be common, we'd probably know about it.

CarlLindqvist commented 1 year ago

Could the confusion in the player be because each segment that is part of the scte marked ad break in the stream has EMSG metadata, and we replace all of those segments with ads. But the segments surrounding the break can have remnants of it.

For example, since the very last audio segment before entering an ad break will technically extend a couple of milliseconds into the ad due to audio and video frames not lining up, that segment will have EMSG metadada for the ad in it. And the timestamp of that EMSG will correspond pretty much exactly with the discontinuity marker of the video track.

And on returning from the break, there can be similar issues on the first segment. Remnants of EMSG metadata from the ad we just left.

YoTonyF commented 1 year ago

With many apologies in advance for a long message, and for perhaps stating/repeating the obvious, I just tested another playback run of the stream so I could capture what happens before, during and after the break. What I observed is shown below:

Media Seq Content ID First Audio PTS Last Audio PTS First Video PTS Last Video PTS
27 437252124 1679048152.400 1679048156.240 1679048152.400 1679048156.240
28 437252125 1679048156.240 1679048160.016 1679048156.240 1679048160.000
DISC - - - - -
29 Note 1 first ad 0.000 3.000 0.000 3.000
DISC etc... - - - -
63 Note 1 last ad 0.000 3.000 0.000 3.000
DISC - - - - -
64 Note 2 437252157 1679048280.016 1679048282.960 1679048280.000 1679048282.960
65 437252158 1679048282.960 1679048286.800 1679048282.960 1679048286.800

What is apparent here is that the EMSG included in the content segment following the break does indeed refer to the PTS timestamp of the discontinuity immediately prior to the break as @CarlLindqvist had suspected. This is causing problems for the player because it essentially references a timestamp outside of the current discontinuity group. When resolving that timestamp in the context of its own timeline the player cannot understand that it falls into an earlier discontinuity period because the discontinuity forces it to forget what it knew about earlier timing information.

A side-note here following on from Carl's comments: He mentions that the content fragments which are "replaced" by advert segments also contain EMSG boxes. These won't affect the player of course as they're never presented in the manifest and therefore never loaded by the player, but for curiosity's sake I looked at them:

The video in fact contains an EMSG v0 box with SCTE data:

emsg : 94 bytes (Version 0) : Event Message
  Scheme URI: urn:scte:scte35:2013:bin
  Timescale: 600
  PTS: 0 (0.0 secs)
  Duration: 72000 (0x00011940)
  Ident: 584582264
  Value: 
  Payload:

[tfdt not shown here as it's irrelevant]

The audio fragment does contain EMSG this time (the manifested audio fragments do not):

emsg : 98 bytes (Version 1) : Event Message
  Scheme URI: urn:scte:scte35:2013:bin
  Timescale: 48000
  PTS: 80594311680000 (1679048160.0 secs)
  Duration: 5760000 (0x0057e400)
  Ident: 584582264
  Value: 
  Payload:

    tfdt : 20 bytes (Version 1) : Track Fragment Decode Time
      BaseMediaDecodeTime: 80594311680768 (0x0000494cd3519300)

Once again, the audio fragment's EMSG is referencing a point in time which falls outside of its PTS window and would be problematic, were the player to encounter it. The video fragment's EMSG however would not be problematic as a v0 EMSG uses a relative timestamp (relative to the start of the fragment in which it's contained).

Q. I'm not sure what the purpose of including these boxes is in the workflow. Not including them would solve the issue, of course. I assume they're needed? Q. I'm aware that v1 EMSG are generally considered to be the version required by CMAF, however if it's proving problematic to time these inside the correct discontinuity group, is there an option to use v0 EMSG instead (with a 0 PTS value, effectively). This way, they wouldn't cause problems inside the player (although perhaps some players won't present them to the client as they're not v1 - is this a problem?)

I do have some additional questions/concerns directed at Carl though about the timing information I posted above:

  1. Could ExoPlayer be made (perhaps via an option) to ignore any EMSG boxes which fall outside of the current PTS window (and not use their timestamps to "deduce" the initial PTS - instead relying on the BaseMediaDecodeTime value in the TFDT as the hard threshold, or the EarliestPTS value in the SIDX, where this is included?) EMSG falling prior to the initial timestamp could then be considered "implausible" and either re-timed to occur at the base timestamp, or dropped entirely? I appreciate that ExoPlayer needs to cover many more use cases than this, but is that even a viable consideration?
  2. Prior to the break, there is "too much" audio (by 16mSecs) presented in the content. I assume this additional audio data is discarded by the player as it falls outside the timing of the presented video data? Or does the ad audio run consecutively, pushing everything out-of-sync?
  3. Related to the above, does the player observe the subsegment duration contained in the ad fragment's SIDX box when determining duration? I.e. would the ad audio be decoded with duration of 3.0s, 3.008s or 3.0507s? (The manifested duration is 3.00s)
  4. For each of the ads, these include priming samples in their first segment. These are indicated via a number of boxes (SGPD "roll", EDTS/ELST in the init fragment and SBGP "roll" in the data fragment). Is the player aware of how to handle these?
  5. Related to the above, what does the player do when this information is missing (such as in the content)? How does it "prime" the audio decoder for each discontinuity?
  6. Looking at the content following the break, the audio starts 16mSecs later than the video. How does the player handle this gap?

With regards and thanks to all involved here

CarlLindqvist commented 1 year ago

Q. I'm not sure what the purpose of including these boxes is in the workflow. Not including them would solve the issue, of course. I assume they're needed?

They are not needed for this particular workflow. They are included automatically by the Unified Streaming origin, and can't be toggled off. There might exist other workflows out there where this is used, i don't know.

CarlLindqvist commented 1 year ago

Removed the media link due to abuse. Can I add a hidden comment somehow?