video-dev / hls.js

HLS.js is a JavaScript library that plays HLS in browsers with support for MSE.
https://hlsjs.video-dev.org/demo

EXT-X-DATERANGE metadata synchronisation vs video stream in presence of frequent EXT-X-DISCONTINUITYs #6203

Closed: lnstadrum closed this issue 4 months ago

lnstadrum commented 5 months ago

What version of Hls.js are you using?

1.5.4

What browser (including version) are you using?

Chromium 120.0.6099.71 (Official Build)

What OS (including version) are you using?

Linux Mint 21.2 Victoria (64-bit)

Test stream

https://hlsjs.video-dev.org/demo/?src=https%3A%2F%2Fbonksound.studio%2Fhls%2Fplaylist.m3u8&demoConfig=eyJlbmFibGVTdHJlYW1pbmciOnRydWUsImF1dG9SZWNvdmVyRXJyb3IiOnRydWUsInN0b3BPblN0YWxsIjpmYWxzZSwiZHVtcGZNUDQiOmZhbHNlLCJsZXZlbENhcHBpbmciOi0xLCJsaW1pdE1ldHJpY3MiOi0xfQ==

Configuration

{
  debug: false,
  maxBufferLength: 300
}

Additional player setup steps

This is not a playback issue: no errors are expected on the demo stream page.

The issue is related to the synchronization of the metadata track with the video track. To observe it, we need (1) a playlist constructed in a special way (details follow, and a test sample is provided) and (2) a little tooling: a cuechange listener set up as follows:

// vanilla HLS instance setup
// (assumes hls.js is already loaded, e.g. via a <script> tag exposing the global Hls)
const video = document.getElementById("video")
const hls = new Hls()

hls.attachMedia(video)

hls.on(Hls.Events.MEDIA_ATTACHED, () => {

    // We are going to listen to cue changes and print something to the browser console.
    video.textTracks.addEventListener("addtrack", (event) => {
        // Use the track that was just added rather than assuming it is textTracks[0].
        const track = event.track

        track.addEventListener("cuechange", () => {
            // cuechange also fires when a cue becomes inactive, in which case
            // activeCues is empty and there is nothing to print.
            const cues = track.activeCues
            if (!cues || cues.length === 0) {
                return
            }

            // Grab the cue ID, which is (arbitrarily) built from
            // a useful part and a random suffix with a dot in-between.
            // The random suffix is only needed to ensure the uniqueness
            // of the IDs as required by the HLS specification.
            const id = cues[0].id

            // Display the 'useful part' of the ID in the browser console:
            // the test stream is constructed in a way that the displayed text
            // should match the number shown in the video frame.
            console.log(id.split(".")[0])
        })

    })

    hls.loadSource("playlist.m3u8")

})

Additional details

(Apologies for the verbosity of what follows.)

The test stream consists of several pieces of content, each a few MPEG-TS fragments long. There is a timestamp discontinuity between subsequent pieces, so an EXT-X-DISCONTINUITY tag is inserted in the playlist.

For test purposes,

Despite its synthetic appearance, this test stream actually comes from real data we work on. We replaced its content but we kept timestamps and stream durations unchanged.

In our application, we need to be able to identify which piece is being played when the user interacts with other elements in the page. As discussed here, there are several possibilities to solve that.

For this reason, we try to push this approach a bit further. After every discontinuity and its associated EXT-X-PROGRAM-DATE-TIME, we put an EXT-X-DATERANGE tag so that a metadata record appears in a textTrack. The date-range record has a start time matching the PDT, a duration roughly matching the length of the particular piece, and an ID attribute carrying the piece ID. So our playlist is built of repeated sections as follows:

#EXT-X-DISCONTINUITY
#EXT-X-PROGRAM-DATE-TIME:2024-02-05T00:00:04.004000
#EXT-X-DATERANGE:ID=video_piece_id,START-DATE=2024-02-05T00:00:04.004000,DURATION=6.006,X-CUE=" "
#EXTINF:...
...
#EXTINF:...
...
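A minimal sketch of how such per-piece headers could be generated, assuming (hypothetically) that pieces are contiguous on the wall-clock timeline, so each piece starts where the previous one ends, and that each piece is described by an ID and a duration in seconds. The helper name `pieceHeaders` and its input shape are illustrative, not the reporter's actual tooling. Unlike the snippet above, it quotes the ID and START-DATE values, since RFC 8216 defines both as quoted-string attributes, and it writes dates via `toISOString()` (millisecond precision, explicit `Z` time zone).

```javascript
// Hypothetical generator for the header lines of each piece section.
// Assumes pieces are back to back: piece N+1 starts where piece N ends.
function pieceHeaders(pieces, baseMs) {
  const lines = [];
  let startMs = baseMs;
  for (const piece of pieces) {
    // toISOString() gives UTC with millisecond precision, e.g. 2024-02-05T00:00:04.004Z
    const iso = new Date(startMs).toISOString();
    lines.push(
      "#EXT-X-DISCONTINUITY",
      `#EXT-X-PROGRAM-DATE-TIME:${iso}`,
      // ID and START-DATE are quoted-string attributes in RFC 8216.
      `#EXT-X-DATERANGE:ID="${piece.id}",START-DATE="${iso}",DURATION=${piece.duration},X-CUE=" "`
    );
    // The #EXTINF and segment URI lines of the piece would follow here.
    startMs += Math.round(piece.duration * 1000);
  }
  return lines;
}

// Example: two made-up pieces, the first starting at 2024-02-05T00:00:04.004Z.
const lines = pieceHeaders(
  [{ id: "2.4004", duration: 6.006 }, { id: "3.10010", duration: 6.006 }],
  Date.UTC(2024, 1, 5, 0, 0, 4, 4)
);
console.log(lines.join("\n"));
```

Keeping the DATERANGE START-DATE identical to the PDT string is what lets a player line the metadata record up with the first segment of each piece.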

We can then use the common browser API to listen to cuechange events on that textTrack to identify which video piece is being played. Since this is not specific to HLS.js, in theory we can expect it to work with the native HLS implementation in iOS as well. As long as EXT-X-PROGRAM-DATE-TIME is correctly inferred and the EXT-X-DATERANGE tags have the same start times as the PDT tags, we should be able to keep the textTrack in sync with the video despite all the discontinuities.

The piece of JavaScript above lets us listen to these events and display the video piece ID in the browser console. We can then check whether the obtained ID matches the content actually being played.

The issue is that these cuechange events actually fire out of sync with the video.

A few final notes

Steps to reproduce

  1. Attach a cuechange event listener as described in the additional player setup steps.
  2. Open the JavaScript browser console.
  3. Start playback. Please be patient and do not seek through the video.
  4. Watch the messages printed in the console: an integer from 1 to 7 (the useful part of the cue ID) is displayed every few seconds.
  5. When the number changes, look at what is displayed on the screen at that very moment (also a number from 1 to 7).

An instrumented demo is available here.

Expected behaviour

The number printed in the browser console matches the number in the video frame, i.e., the metadata text track is in sync with the video track.

What actually happened?

Everything is in sync at the beginning, but after about 1 min there is a noticeable delay of about one second between the number in the console and the number in the video frame (the video frame lags behind the console). The delay does not keep increasing indefinitely, though, and seems to be related to maxBufferLength: increasing maxBufferLength makes the delay worse.

Console output

1
2
3
4
5
6
7
1
2
...

robwalch commented 5 months ago

In HLS.js, DATERANGE tags are mapped to cues on the TextTrack timeline using playlist time (EXTINF durations). The drift occurs when the duration of the parsed media differs from the duration in the playlist and the differences do not cancel out when summed over the whole program. This is the case in your sample: the demo page "Timeline" tab shows each parsed segment lasting slightly over 1.02 s, vs the #EXTINF:1.001000 found in the playlist. This is not a result of the DISCONTINUITY tags, but of the playlist segment durations each being shorter than the corresponding parsed segment durations. Generally, HLS.js determines the parsed segment duration as the difference between the starting video timestamp of a segment and the starting timestamp of the next.
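To get a feel for the magnitude, here is a back-of-the-envelope sketch (not HLS.js code) comparing segment start times obtained by summing EXTINF values with those obtained from parsed durations. For illustration only, it assumes every segment has the same mismatch, using the #EXTINF:1.001000 value and a rounded parsed duration of 1.02 s; the helper name `segmentStartDrift` is hypothetical.

```javascript
// Start-time offset (parsed timeline minus playlist timeline) of each segment,
// given EXTINF durations and parsed-media durations in seconds.
function segmentStartDrift(extinfDurations, parsedDurations) {
  const drift = [];
  let playlistStart = 0;
  let parsedStart = 0;
  for (let i = 0; i < extinfDurations.length; i++) {
    drift.push(parsedStart - playlistStart);
    playlistStart += extinfDurations[i];
    parsedStart += parsedDurations[i];
  }
  return drift;
}

// 61 segments: the last one starts after 60 mismatched segments.
const n = 61;
const drift = segmentStartDrift(new Array(n).fill(1.001), new Array(n).fill(1.02));
console.log(drift[n - 1].toFixed(2)); // "1.14", i.e. 60 × (1.02 − 1.001) s
```

About a minute of 1-second segments is enough for a ~0.019 s per-segment mismatch to add up to more than a second. The sketch does not model how HLS.js updates segment times as media is parsed, which is presumably why the delay observed in practice stays bounded.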

You can access DateRange data directly in HLS.js using hls.levels[hls.currentLevel].details?.dateRanges. This is a map (Object) of all parsed DateRanges by ID. It is a much more complete and up-to-date collection of logical DateRanges: valid tags with the same ID are merged, and all attributes are available on the object. In v1.6 (with #6213), DateRanges will have a "tag anchor" that references their adjacent fragment in the playlist, so that their start time is always mapped to the PDT and discontinuity domain at that segment position on the playback timeline. The LEVEL_PTS_UPDATED event signals that segment times were updated based on parsed media timestamps, and can be used to update app logic.

A fix for this metadata-TextTrack-specific issue still requires cue timing to be updated in the id3-track-controller after media is parsed (on LEVEL_PTS_UPDATED). We can look into a fix for this in the next release. Even after the update, I would recommend using the aforementioned LevelDetails dateRanges. Using cues is necessary with Safari HLS playback, but it produces a cue for every attribute, is missing the ID, and does not merge DateRange tags with the same ID.
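A minimal sketch of using that map instead of cues: find the DateRange whose interval contains the current playback date. It assumes each DateRange object exposes `startDate` and `endDate` as `Date` values (with `endDate` null when no end is known), as the HLS.js DateRange class appears to do in v1.5; check the typings of the version you use. `hls.playingDate` (the Program-Date-Time at the playhead) is the value to pass in the browser. The function name `activeDateRangeId` is hypothetical.

```javascript
// Returns the ID of the DateRange whose [startDate, endDate) interval contains
// `date`, or null. `dateRanges` is an ID-keyed object of DateRange-like values.
function activeDateRangeId(dateRanges, date) {
  if (!dateRanges || !date) {
    return null;
  }
  const t = date.getTime();
  for (const [id, range] of Object.entries(dateRanges)) {
    const start = range.startDate.getTime();
    // Ranges without a known end are treated as open-ended.
    const end = range.endDate ? range.endDate.getTime() : Infinity;
    if (t >= start && t < end) {
      return id;
    }
  }
  return null;
}

// In the browser (assumption: a level is selected, i.e. currentLevel >= 0):
//   const details = hls.levels[hls.currentLevel]?.details;
//   const id = activeDateRangeId(details?.dateRanges, hls.playingDate);

// Stand-alone example with made-up ranges on 2024-02-05 (UTC).
const utc = (s, ms) => new Date(Date.UTC(2024, 1, 5, 0, 0, s, ms));
const ranges = {
  "2.4004": { startDate: utc(4, 4), endDate: utc(10, 10) },
  "3.10010": { startDate: utc(10, 10), endDate: utc(16, 16) },
};
console.log(activeDateRangeId(ranges, utc(12, 0))); // 3.10010
```

Polling this from a `timeupdate` handler, or re-evaluating it on LEVEL_PTS_UPDATED, avoids depending on how cues are scheduled on the TextTrack.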

robwalch commented 5 months ago

Marking as enhancement. This is not a regression.

As of v1.5.x, DateRange TextTrack cues are mapped to playlist time, not to the video track or parsed media. When the playlist times differ this much from the parsed media (on every segment, not just at discontinuities), I think it is fair to consider this a stream issue. To support Interstitials in v1.6, precise mapping of DateRanges to the playback timeline is crucial, so we can afford to compensate for these kinds of discrepancies and explore sliding cue start times at a usable interval (if cues cannot be adjusted after creation, we'll need to remove and re-add them).
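The anchoring idea can be illustrated with a small sketch: once the media time and Program-Date-Time of the segment a DateRange is anchored to are known, the DateRange start on the playback timeline is that segment's start plus the difference between START-DATE and the segment's PDT. The field names (`start` in seconds, `programDateTime` in epoch milliseconds), the helper name and the numbers are assumptions for illustration; this is not the HLS.js implementation from #6213.

```javascript
// Media time (seconds) at which a DateRange starts, derived from the segment it
// is anchored to rather than from the sum of all preceding EXTINF durations.
function dateRangeStartTime(anchorFrag, startDate) {
  const offsetSec = (startDate.getTime() - anchorFrag.programDateTime) / 1000;
  return anchorFrag.start + offsetSec;
}

const pdt = Date.UTC(2024, 1, 5, 0, 0, 16, 16);
const startDate = new Date(pdt); // START-DATE equal to the anchor segment's PDT
const frag = { start: 60.06, programDateTime: pdt }; // hypothetical playlist-based estimate
console.log(dateRangeStartTime(frag, startDate)); // 60.06

// After parsing, the segment turns out to start later (the kind of change
// LEVEL_PTS_UPDATED signals); the DateRange start follows its anchor.
frag.start = 61.2;
console.log(dateRangeStartTime(frag, startDate)); // 61.2
```

Because the offset is measured from the nearest anchor rather than from the start of the playlist, per-segment duration errors before that anchor no longer accumulate into the cue time.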

robwalch commented 5 months ago

@lnstadrum,

Using cues is necessary with Safari HLS playback, but it produces a cue for every attribute, is missing the ID, and does not merge DateRange tags with the same ID.

FYI: the DateRanges in your playlist are invalid. While this should result in them being ignored, Apple HLS clients error out and will not play the sample provided. ID and date attributes are expected to be provided as quoted-string values (...bonksound.studio/hls/playlist.m3u8 is missing quotes around the ID and START-DATE values, and errors rather than plays in Safari and Apple HLS clients).

Ex:

EXT-X-DATERANGE:ID=4.16016,START-DATE=2024-02-05T00:00:16.016000,DURATION=4.004,X-CUE=" "

should be:

EXT-X-DATERANGE:ID="4.16016",START-DATE="2024-02-05T00:00:16.016000",DURATION=4.004,X-CUE=" "

HLS.js is not strict about quoted-string attribute values that are missing their quotes. In #6213 I've added some validation logic that logs a warning to the console when missing quotes are encountered.
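A rough sketch of that kind of check (not the HLS.js implementation), assuming the RFC 8216 attribute types for EXT-X-DATERANGE, where ID, CLASS, START-DATE and END-DATE are quoted-strings. The attribute-list regex treats a value either as a double-quoted string or as everything up to the next comma; the names `QUOTED_DATERANGE_ATTRS` and `unquotedDateRangeAttrs` are hypothetical.

```javascript
// EXT-X-DATERANGE attributes whose values RFC 8216 defines as quoted-string.
const QUOTED_DATERANGE_ATTRS = new Set(["ID", "CLASS", "START-DATE", "END-DATE"]);

// Returns the names of quoted-string attributes that appear without quotes.
function unquotedDateRangeAttrs(tagLine) {
  const attrList = tagLine.replace(/^#?EXT-X-DATERANGE:/, "");
  // NAME=value pairs; a value is a "quoted string" or runs up to the next comma.
  const attrRe = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  const missing = [];
  for (const [, name, value] of attrList.matchAll(attrRe)) {
    if (QUOTED_DATERANGE_ATTRS.has(name) && !value.startsWith('"')) {
      missing.push(name);
    }
  }
  return missing;
}

console.log(unquotedDateRangeAttrs(
  'EXT-X-DATERANGE:ID=4.16016,START-DATE=2024-02-05T00:00:16.016000,DURATION=4.004,X-CUE=" "'
)); // [ 'ID', 'START-DATE' ]
```

Running the corrected tag through the same function returns an empty array, which makes it a cheap lint step for a playlist generator.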

lnstadrum commented 5 months ago

Hi @robwalch,

Thank you for looking into this.

robwalch commented 5 months ago

@lnstadrum,

I just added 0c5ca21 to #6213, which updates cue start and end times on PTS update. Let me know if this works for you:

https://feature-date-range-parsing.hls-js-4zn.pages.dev/demo/?src=https%3A%2F%2Fbonksound.studio%2Fhls%2Fplaylist.m3u8

I just confirmed that Safari does not shift cues to align with media this way, so you'll want to make sure your playlist EXTINF durations align with presentation time in Apple HLS clients (or relative to Safari's HTMLMediaElement.getStartDate()). Providing unmuxed, ISO BMFF-segmented HLS playlists may help, as your main playlist then has only video segments, and there is no start offset introduced by segmenting based on audio time or on whichever track starts or ends first or last.

I don't think it hurts for HLS.js to make these adjustments, as it may push the timeline out because of audio priming delays, overlaps not allowed in MSE, and other presentation-time oddities the library has picked up along the way. This change ensures that cues stay aligned with the library's interface for DateRange and Program-Date-Time mapping and with hls.playingDate.

lnstadrum commented 4 months ago

Hi @robwalch,

Thank you, the test branch works as expected on our test stream.

We will indeed have to find a solution for Safari. There is actually a silent audio track in our test stream (as our production streams do have audio), and due to the way our compiled stream is built, the audio and video tracks do not end at the same moment before a discontinuity, nor start at the same moment right after it.

I guess this issue can now be closed. Thanks a lot again for your help!