Open CarlLindqvist opened 1 year ago
Is this an official test stream from yospace?
I'm asking because I think there is a problem with timestamps. ExoPlayer shows
2023-02-24 11:41:26.203 8934-8988 MediaCodecAudioRenderer androidx.media3.demo.main E Audio sink error
androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity: expected 1000380645333, got 1000284517334
at androidx.media3.exoplayer.audio.DefaultAudioSink.handleBuffer(DefaultAudioSink.java:922)
at androidx.media3.exoplayer.audio.MediaCodecAudioRenderer.processOutputBuffer(MediaCodecAudioRenderer.java:711)
at androidx.media3.exoplayer.mediacodec.MediaCodecRenderer.drainOutputBuffer(MediaCodecRenderer.java:1936)
at androidx.media3.exoplayer.mediacodec.MediaCodecRenderer.render(MediaCodecRenderer.java:798)
at androidx.media3.exoplayer.ExoPlayerImplInternal.doSomeWork(ExoPlayerImplInternal.java:1025)
at androidx.media3.exoplayer.ExoPlayerImplInternal.handleMessage(ExoPlayerImplInternal.java:509)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:223)
at android.os.HandlerThread.run(HandlerThread.java:67)
and playing the same stream with VLC crashed VLC as well indicating a problem with the media:
main error: Timestamp conversion failed for 1677239759060001: no reference clock
main error: Could not convert timestamp 0 for FFmpeg
What player have you been able to play this stream successfully?
Sorry for the delay. We have been working closely with Yospace to create this stream and have been debugging it a lot together with their engineers to figure out if there is anything potentially wrong with it. In what part of the stream do you see the timestamp issue? Is it in the ad breaks or in the "regular content" big buck?
This stream has been tested and running on hls.js
Given the timestamp 1677239759060001, it looks like it is close to entering a break (which is where we also see the problem).
The source is created via AWS MediaLive encoder and converted to HLS via Unified Streaming origin. Unified has also been involved in debugging on the content side.
If there is a content problem, Unified and Yospace are both interested in finding out any potential issues.
Thanks Carl - This is Tony from Yospace, so if any additional information is needed to try to troubleshoot this I'd be happy to try and help, although I don't know what the root cause of this issue is.
On a side-note, I can confirm the stream also runs fine on Shaka 4.3.2 (I had it running for well over an hour without any issues).
If there is a timestamp issue, it would be useful to understand which timestamp is incorrect. I see the message about a potential discontinuity, but it's not clear what that timestamp refers to, or the specifics as to how it was derived - it appears to be a "real time" epoch timestamp, but obviously that didn't come from the MP4 content directly, so understanding how it's calculated with respects to timestamps in the manifest, and content of various MP4 boxes, would be very useful.
Yeah, dunno yet.
The real time timestamp is from VLC which I won't be able to explain. I will look into why ExoPlayer thinks there is a gap in audio timestamps reported with:
Unexpected audio track timestamp discontinuity: 1000380645333, got 1000284517334
This says there is a 10 seconds gap in timestamp for a stream that has ads (of 10 seconds duration?) stitched in somewhere.
I'm not saying this is bad media as I haven't looked into this in detail to be honest. I just saw that ExoPlayer and VLC complain about a similar thing, so I wanted to ask whether this is expected to work before spending more time on it. Sorry, just normal procedure aiming for spending my time well.
We'll look into this and report back.
Much appreciated - as I said, if there's any information you think I can help with, please let me know and I'll do my best.
Carl already gave you details about the content; I can tell you the ad content is encoded to MP4 using FFMPEG and segmented/packaged using Shaka Packager (in a single packaging operation across all rendition levels). We also remove the MEHD box(es) from the converted output segments because of a specific Chromecast issue and insert EMSG boxes in the video segments for metadata purposes.
Thanks Tony
Dennis from Unified Streaming here; @marcbaechinger if there's any way we can assist in piecing the puzzle together, let us know. Thanks for looking into this! 👍🏻
Sorry for not updating this issue. I started looking into this and found that the audio timestamps are off after a while. The player does correct the timestamp at each discontinuity correctly. The issue does not happen at the discontinuities but within the segments of an ad. I'm not yet sure where it comes from.
The symptom first looked to me like there is a discontinuity tag missing. Like when the media suddenly restarts the timestamp, without a makrer in the playlist. LMK if this rings a bell for you.
Beside: I'm not sure how agile the configuration of the test stream is. If it can be changed, I would appreciate to have the first at after 30secs instead as after 5 minutes, this would make debugging a bit easier. :) I already wrote a script to download the media of a given playlist so I can investigate the media when it happens, but I still have to wait 5 minutes until this possibly happens at the first or second ad break. Please disregard if this is not possible.
I found that the big jump in the playback position happens when playback goes back to the first segment that is live content after the inserted ads.
I see the following log statement when extracting and adjusting the timestamps:
03-14 00:08:19.354 29068 29191 D timestamp: desiredSampleTimestampUs - timeUs = timestampOffsetUs: 464560000 - 1678752360000000 = -1678751895440000 -- androidx.media3.common.util.TimestampAdjuster@af1ce42
03-14 00:08:19.354 29068 29191 D mp4-extract: [v] sampleTimeUs: 584560000, currentSamplePresentationTimeUs: 1678752480000000
Later I see that the DefaultAudioSink
complains after this frame has been fed to the decoder:
03-14 00:08:32.627 29068 29183 D hls-pts : queue input: 1000584576000
03-14 00:08:32.677 29068 29183 E MediaCodecAudioRenderer: androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity: expected 1000464738667, got 1000584576000
The difference of the expected and the actual audio timestamp is 1000464738667 - 1000584576000 = -119837333
.
The difference of the timestamp with which the timestamp is adjusted and the first timestamp of the video frames in the fmp4 is 1678752360000000 - 1678752480000000 = −120000000
.
The difference of the timestamp of the first video frame after ads and the timestamp of the first audio frame after ad is 1678752480000000 - 1678752480016000 = 16000
.
So we have 120000000 - 16000 = 119984000
which is pretty close to the difference the DefaultAudiSink
comlains about: 119837333
(delta of 3 millis).
From this data I think there are some faulty timestamps in the media (video?) after the inserted ads. If 119837333
is the duration of the inserted ads, then I think that the timestamps in the live segments following the ads are not offset by the duration of the inserted ads.
This would explain the behaviour we are seeing with the playback position that is derive from the audio timestamp. It also appears to me that besides the current position being wrong, playback works fine.
Changed to bad media
until I'll be educated that I'm wrong. :)
The symptom first looked to me like there is a discontinuity tag missing. Like when the media suddenly restarts the timestamp, without a makrer in the playlist. LMK if this rings a bell for you.
There should be discontinuity tags correctly placed at each discontinuity of the media. The live part of the stream is continous and all timestamps are referenced to epoch (should be close to the burned in clock in the video), it will never repeat timestamps. The ads are all 0 based. Each change to a different video source is marked with a discontinuity tag.
I would appreciate to have the first at after 30secs instead as after 5 minutes
The example media is a continous live stream, with ad breaks that are 120s long that happen every 5 minutes. I can't really make them happen after 30 seconds.
If 119837333 is the duration of the inserted ads, then I think that the timestamps in the live segments following the ads are not offset by the duration of the inserted ads.
I'm not sure I follow. The live media (Ie not ads) can't really adjust any timestamps to anything ad related, since it just keeps going in the background. It has no idea if we replaced content with an ad break or not. That part is done through manifest manipulation from Yospace. I'm not sure what it is supposed to offset.
What context is the timestamp in? It doesn't look like any timestamps from the video, so I'm guessing this is some internal timestamp that is calculated in the player. If it is in microsecond ticks, then 119837333 sounds very close to the 120 second ad break duration.
Maybe I should have included the source which this is stitched on. Here are the underlying manifests: Audio: https://varnish-live-director.a2d-stage.tv/group01/dvrtest2/live.isml/live-audio_0=128000.m3u8 Video: https://varnish-live-director.a2d-stage.tv/group01/dvrtest2/live.isml/live-video=4499968.m3u8
@marcbaechinger what do you mean by the audio time stamps being off? I can't remember seeing any hard resets or the like in our analysis; Do you have an example for "where to where" in time the jumps/resets occur? 🤔
For the interval of ad opportunities in the demo stream, @CarlLindqvist is in control of this.
Ah allright, looking again at the timestamps, some are epoch microseconds. I am a bit confused by the where the expected timestamps are from, like 1000464738667
Thanks for your input.
Dennis: what do you mean by the audio time stamps being off?
That's the resulting audio timestamp after all translations. That's not the audio timestamps as in the media chunks. That's the reason for player.getCurrentPosition()
being totally off after the issue happens (see screenshots posted by Carl in the initial post). Sorry for not being clear here.
Carl: Ah allright, looking again at the timestamps, some are epoch microseconds. I am a bit confused by the where the expected timestamps are from, like 1000464738667
This is padded internally with 1000000000000. So the actual timestamp is 464738667
which is a virtual timestamp also that's not in the media.
Carl: There should be discontinuity tags correctly placed at each discontinuity of the media.
Yeah, timestamps are adjusted for discontinuities (here). Each time when the discontinuity sequence increases we create a new TimestampAdjuster
.
Marc: "faulty timestamps in the media (video?)"
I think I should correct myself to (metadata?)
here.
Looking again I find that the timestamp offset is adjusted with a metadata frame. I suspect there is a metadata frame at the same position as the first video frame (or physically before). It looks like it before the video frame in the container:
76227 03-14 17:33:20.131 1160 1231 D adjust : adjust timestamp from onEmsgLeafAtomRead
76228 03-14 17:33:20.131 1160 1231 D timestamp: desiredSampleTimestampUs - timeUs = timestampOffsetUs: 257680000 - 1678815060000000 = -1678814802320000 -- androidx.media3.common.util.Tim estampAdjuster@6ec1977
76230 03-14 17:33:20.132 1160 1231 D mp4-timestamp: getCurrentSamplePresentationTimeUs: currentSampleIndex=0, currentlyInFragment=true
76232 03-14 17:33:20.132 1160 1231 D mp4-extract: [v] sampleTimeUs: 377680000, currentSamplePresentationTimeUs: 1678815180000000
Interpreting the log lines:
At 17:33:20.131
we extract a emgs metadata frame version 1 (timestamp extracted from atom here)
At 17:33:20.131
the offset of the timestamp adjuster is changed to fit to the new timestamp given by the metadata frame that is that is 1678815060000000
. The timestamp wasn't change by ExoPlayer but used as extracted from the media.
At 17:33:20.132
the current video frame timestamp is read from currentSampleIndex=0
.
At 17:33:20.132
the first video frame of the new segment is extracted (see currentSampleIndex=0). The timestamp of the video frame is 1678815180000000
and is translated to 377680000
which is the value the exception complains about (actually it complains about 377685312
, the timestamp of the corresponding first audio frame in that segment)
androidx.media3.exoplayer.audio.AudioSink$UnexpectedDiscontinuityException: Unexpected audio track timestamp discontinuity : expected 1000257808000, got 1000377685312
The metadata frame has been extracted 1 millisecond before the first video frame of the segment. The difference of the timestamp of that metadata frame and the timestamp of the video frame is
1678815060000000 - 1678815180000000 = -120000000
So it looks like that there is a metadata frame at the end of the ads (I guess it is in the first live segment after the ads).
Can someone let me know if this makes sense and expand a bit on where metadata frames have to be expected in the media, specifically at the boundaries of ad/content or content/ad and what timestamps you would expect for that metadata frame that comes immediately before the first video frame in a new segment after the ads?
Without any former knowledge I wouldn't expect adjacent metadata and video frames to have a timestamp that is 120_000_000 us apart from each other.
Thanks @marcbaechinger
The (video) ad segments do indeed contain EMSG metadata, but these should all fall within the PTS range of the specific segments which contain them. Specifically for a typical ad video segment, there would be 2 or more EMSG metadata boxes:
One occurs shortly after the first video frame (perhaps as little as 50mS afterwards). Another occurs shortly before the end of the video segment (perhaps between 50 and 250mS before the final frame). There may be others in between these, but they all fall between the first and last PTS of the segment.
However if I'm understanding correctly, it looks like the confusion arises with the first content segment following the ads which contain a single EMSG v1 box containing a binary SCTE35 blob (urn:scte:scte35:2013:bin)?
For the example I looked at, the declared PTS value in the EMSG box is: PTS: 1007322876000 (1678871460.0 secs) using 600 timescale, however the first video frame has a declared PTS value later than this EMSG (per tfdt):
tfdt : 20 bytes (Version 1) : Track Fragment Decode Time
BaseMediaDecodeTime: 1007322948000 (0x000000ea892061a0)
1007322948000 - 1007322876000 = 72000, which using 600 timescale is 120.0 seconds (presumably the length of the inserted ad break).
The SCTE appears to be (as you might expect) a "Splice Insert"
[Heavily cropped decode]:
{
splice_command_type: "SpliceInsert",
splice_command: {
splice_event_id: "0x744b0eab",
pts_time: "0x1c1f40620" // Probably not same as content due to PTS rollover?
},
break_duration: {
auto_return: true,
duration: "0xa4cb80" // 10800000 decimal / 90000 = 120.0
}
}
And because this occurs prior to the start of the video, the player sees this as the initial video PTS following the discontinuity and then sees a discrepancy with the initial PTS of the audio track?
In the example I looked at, the first audio PTS was 80585835840255 using 48000 timescale.
This would produce the following timing in the content segments following the ad:
Audio: 80585835840255 / 48000 = 1678871580.0053125 secs, duration = 5.354666667s (without accounting for priming)
Video: 1007322948000 / 600 = 1678871580.0000000 secs, duration = 5.36s
EMSG (in video): 1007322876000 / 600 = 1678871460.0000000 secs with duration 120.0s
I guess the issue here then is the EMSG having a PTS prior to the content which is tripping up the calculation in the player?
Thank you very much for the explanation. Very educational!
I guess the issue here then is the EMSG having a PTS prior to the content which is tripping up the calculation in the player?
Yes indeed, it looks like this metadata frame delivers the very first timestamp in the first content segment following the ads. The player is at this point advised by EXT-X-DISCONTINUITY
to align the following timestamps. The player hence sets the offset according to that first timestamp that is extracted from that segment and all subsequent timestamps are adjusted with that offset.
I wonder whether the metadata frame belongs to the end of the last content segment before the ad.
I'm not really familiar with packaging for live stream to be honest. Is this specific to this stream, a configuration in the tool chain, the packager used or something else? If it would be common, we'd probably know about it.
Could the confusion in the player be because each segment that is part of the scte marked ad break in the stream has EMSG metadata, and we replace all of those segments with ads. But the segments surrounding the break can have remnants of it.
For example, since the very last audio segment before entering an ad break will technically extend a couple of milliseconds into the ad due to audio and video frames not lining up, that segment will have EMSG metadada for the ad in it. And the timestamp of that EMSG will correspond pretty much exactly with the discontinuity marker of the video track.
And on returning from the break, there can be similar issues on the first segment. Remnants of EMSG metadata from the ad we just left.
With many apologies in advance for a long message, and for perhaps stating/repeating the obvious, I just tested another playback run of the stream so I could capture what happens before, during and after the break. What I observed is shown below:
Media Seq | Content ID | First Audio PTS | Last Audio PTS | First Video PTS | Last Video PTS |
---|---|---|---|---|---|
27 |
437252124 |
1679048152.400 |
1679048156.240 |
1679048152.400 |
1679048156.240 |
28 |
437252125 |
1679048156.240 |
1679048160.016 |
1679048156.240 |
1679048160.000 |
DISC | - | - | - | - | - |
29 Note 1 |
first ad |
0.000 |
3.000 |
0.000 |
3.000 |
DISC | etc... | - | - | - | - |
63 Note 1 |
last ad |
0.000 |
3.000 |
0.000 |
3.000 |
DISC | - | - | - | - | - |
64 Note 2 |
437252157 |
1679048280.016 |
1679048282.960 |
1679048280.000 |
1679048282.960 |
65 |
437252158 |
1679048282.960 |
1679048286.800 |
1679048282.960 |
1679048286.800 |
3.0507
secs in duration (2 priming frames) but a SIDX indicating a sub-segment duration of 3.000
secs. Encoded duration excluding priming is 3.008
secs0.00
, 2.00
, 2.75
seconds using v1 EMSG boxes437252157
contains EMSG with PTS 1679048160.000
which is 120.0
secs before the PTS of the first video frameWhat is apparent here is that the EMSG included in the content segment following the break does indeed refer to the PTS timestamp of the discontinuity immediately prior to the break as @CarlLindqvist had suspected. This is causing problems for the player because it essentially references a timestamp outside of the current discontinuity group. When resolving that timestamp in the context of its own timeline the player cannot understand that it falls into an earlier discontinuity period because the discontinuity forces it to forget what it knew about earlier timing information.
A side-note here following on from Carl's comments: He mentions that the content fragments which are "replaced" by advert segments also contain EMSG boxes. These won't affect the player of course as they're never presented in the manifest and therefore never loaded by the player, but for curiosity's sake I looked at them:
The video in fact contains an EMSG v0 box with SCTE data:
emsg : 94 bytes (Version 0) : Event Message
Scheme URI: urn:scte:scte35:2013:bin
Timescale: 600
PTS: 0 (0.0 secs)
Duration: 72000 (0x00011940)
Ident: 584582264
Value:
Payload:
[tfdt not shown here as it's irrelevant]
The audio fragment does contain EMSG this time (the manifested audio fragments do not):
emsg : 98 bytes (Version 1) : Event Message
Scheme URI: urn:scte:scte35:2013:bin
Timescale: 48000
PTS: 80594311680000 (1679048160.0 secs)
Duration: 5760000 (0x0057e400)
Ident: 584582264
Value:
Payload:
tfdt : 20 bytes (Version 1) : Track Fragment Decode Time
BaseMediaDecodeTime: 80594311680768 (0x0000494cd3519300)
Once again, the audio fragment's EMSG is referencing a point in time which falls outside of its PTS window and would be problematic, were the player to encounter it. The video fragment's EMSG however would not be problematic as a v0 EMSG uses a relative timestamp (relative to the start of the fragment in which it's contained).
Q. I'm not sure what the purpose of including these boxes is in the workflow. Not including them would solve the issue, of course. I assume they're needed? Q. I'm aware that v1 EMSG are generally considered to be the version required by CMAF, however if it's proving problematic to time these inside the correct discontinuity group, is there an option to use v0 EMSG instead (with a 0 PTS value, effectively). This way, they wouldn't cause problems inside the player (although perhaps some players won't present them to the client as they're not v1 - is this a problem?)
I do have some additional questions/concerns directed at Carl though about the timing information I posted above:
With regards and thanks to all involved here
Q. I'm not sure what the purpose of including these boxes is in the workflow. Not including them would solve the issue, of course. I assume they're needed?
They are not needed for this particular workflow. They are included automatically by the Unified Streaming origin, and can't be toggled off. There might exist other workflows out there where this is used, i don't know.
Removed the media link due to abuse. Can I add a hidden comment somehow?
ExoPlayer Version
2.18.3
Devices that reproduce the issue
The problem doesn't seem to be device specific. Happens on devices and on emulator.
Devices that do not reproduce the issue
No response
Reproducible in the demo app?
Yes
Reproduction steps
Play the included live stream and watch it across a couple of ad breaks.
This stream will have a 2 minute ad break every 5 minutes. (:01, :06, :11, :16 ... :56)
Expected result
The player should play normally across the ad breaks.
Actual result
Sometimes the ads will play OK. But it will inevitably end up in a broken state upon entering an ad break sooner or later (usually the second ad break). When this happens, the timeline seems broken. The player will attempt to seek to time that is not part of the current buffer, a time in the future.
Example image: When the problem happens, the timeline looks like on the left. The player is at position 3:17 when it only has 01:38 worth of content. When in a regular state, it will look like the picture on the right.
Media
REMOVED
Bug Report
adb bugreport
to dev.exoplayer@gmail.com after filing this issue.