v4.0.0 shaka player cannot display webvtt caption for HLS(either fmp4 or mpegts), DASH is fine

liuyang5832 commented 2 years ago

Have you read the FAQ and checked for duplicate open issues? yes

What link can we use to reproduce this? https://shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What version of Shaka Player are you using? v4.0.0-uncompiled

What browser and OS are you using? Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36

What did you do? I played back a generated HLS manifest with Shaka Player v4.0.0 and no captions were displayed. This used to work with v3.x.x; I tried v3.3.2 and captions still display correctly there.

link to v3.3.2 version that displays the same content well: https://v3-3-2-dot-shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest_ts.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What did you expect to happen? WebVTT captions should be displayed for HLS when a text track is selected.

What actually happened? No captions are displayed.

joeyparrish commented 2 years ago

This may be related to X-TIMESTAMP-MAP and the use of sequence mode for the audio/video content. In this WebVTT content, I see:

WEBVTT
X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000

02:08:06.923 --> 02:08:07.157
- Not at all?

A VTT timestamp of 1 hour maps to a media timestamp of 324000000 / 90000 (the 90 kHz MPEG-TS clock) = 3600 seconds = 1 hour. So there is no relative offset between the text and media timelines.
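
The arithmetic above can be sketched as follows. This is a minimal illustration, not Shaka Player's actual parser; the helper names `parseVttTime` and `timestampMapOffset` are hypothetical, and the header and cue time are the ones quoted from the content above.

```typescript
// MPEG-TS timestamps tick at 90 kHz.
const MPEGTS_TIMESCALE = 90000;

// Parse a WebVTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") into seconds.
function parseVttTime(text: string): number {
  const parts = text.trim().split(':').map(Number);
  if (parts.length < 2 || parts.length > 3 || parts.some((p) => isNaN(p))) {
    throw new Error(`Bad WebVTT timestamp: ${text}`);
  }
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Hypothetical helper: seconds to add to each cue time so that it lands on
// the media (MPEG-TS) timeline described by X-TIMESTAMP-MAP.
function timestampMapOffset(header: string): number {
  const local = /LOCAL:([\d:.]+)/.exec(header);
  const mpegts = /MPEGTS:(\d+)/.exec(header);
  if (!local || !mpegts) {
    throw new Error(`Bad X-TIMESTAMP-MAP header: ${header}`);
  }
  return Number(mpegts[1]) / MPEGTS_TIMESCALE - parseVttTime(local[1]);
}

// Values taken from the WebVTT content quoted above.
const offset = timestampMapOffset(
    'X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000');
const firstCueStart = parseVttTime('02:08:06.923') + offset;
console.log(offset, firstCueStart);
```

Here the offset comes out as 0, so the first cue keeps its media-timeline start of roughly 7686.923 seconds.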

However, the media timestamps are ignored in sequence mode. The first audio segment, for example, has an internal timestamp of 7686.952, or 2:08:06.952. This would align with the first subtitle, except that due to sequence mode, the first audio segment appears in the presentation timeline at ~0 instead.

Since we are not extracting timestamps from media, and X-TIMESTAMP-MAP relies on media timestamps, this system is broken.

joeyparrish commented 2 years ago

If we could perfectly emulate sequence mode for text, then the first text segment would appear at time 0, without regard for the timestamps in it. However, we don't know when a text segment "starts" from its contents. The segment could cover a 10-second period of time, but only have a cue appear at time 5. Or it could be completely empty. So the distance from the conceptual start of a text segment and the start of the first cue cannot be known from the contents of the text. (Unlike with audio and video segments, where there are no periods of time without samples.) Trying to offset the text timestamps back to 0 to align with audio & video won't work without additional information.
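
To make the problem concrete, here is a small sketch of the naive approach. The `Cue` shape, the `naiveRebase` helper, and the sample segment (covering media times 100 to 110 seconds with a single cue starting 5 seconds in) are all hypothetical, chosen only to illustrate the point.

```typescript
// Hypothetical minimal cue shape for this illustration.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Naive emulation of sequence mode for text: shift the cues so that the
// first cue starts exactly where the segment starts in the presentation.
function naiveRebase(cues: Cue[], segmentPresentationStart: number): Cue[] {
  if (cues.length === 0) {
    return [];  // An empty segment gives nothing to anchor on.
  }
  const shift = segmentPresentationStart - cues[0].start;
  return cues.map((c) => ({...c, start: c.start + shift, end: c.end + shift}));
}

// The segment conceptually covers 100..110 s, but nothing in the text says
// so. Its only cue starts at 105 s, that is, 5 s into the segment.
const rebased = naiveRebase([{start: 105, end: 107, text: 'Hello'}], 0);
console.log(rebased[0].start, rebased[0].end);
```

The rebased cue starts at 0 seconds, although it should appear 5 seconds into playback. The 5-second gap is exactly the information that the text contents do not carry.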

We could go back to extracting timestamps from media for HLS, but avoid the latency hit we took for this in v3. Instead, we could wait until the first segment is fetched anyway. We could still use sequence mode, but extract the timestamp of the very first segment we fetch. The difference between that timestamp and the startTime of that segment's SegmentReference could be used to align text segments.
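
A minimal sketch of that alignment, assuming the first audio segment's internal timestamp of 7686.952 from the earlier comment and a reference start time of exactly 0 (the thread only says "~0"). The `SegmentRef` interface and both helper names are hypothetical and only model the one field needed here.

```typescript
// Hypothetical stand-in for a SegmentReference: only startTime is modeled.
interface SegmentRef {
  startTime: number;
}

// Offset that moves media-timeline times onto the presentation timeline,
// derived from the first fetched segment.
function mediaToPresentationOffset(
    firstMediaTimestamp: number, ref: SegmentRef): number {
  return ref.startTime - firstMediaTimestamp;
}

// Apply the offset to a cue time that X-TIMESTAMP-MAP has already placed on
// the media timeline.
function alignCueTime(cueMediaTime: number, offset: number): number {
  return cueMediaTime + offset;
}

// Assumed values: timestamp 7686.952 extracted from the first audio segment,
// and a reference that starts at 0 in the presentation.
const alignOffset = mediaToPresentationOffset(7686.952, {startTime: 0});
const alignedCueStart = alignCueTime(7686.923, alignOffset);
console.log(alignOffset, alignedCueStart);
```

With these numbers the first cue lands about 29 ms before the start of the first audio segment, which matches the near-alignment noted earlier.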

The biggest problems with this are the complexity of format parsing and timestamp extraction, and support for containerless or packed audio streams, which don't have internal timestamps at all. (Though we could argue that X-TIMESTAMP-MAP only works with video or audio in an MP4 or TS container, and say anyone with a weird WebVTT+audio-only HLS stream just needs to align their subtitles to 0.)

It would be nice if we could get away with forcing the platform to extract timestamps for us. I don't know if this would work, but if we could dynamically set sequence mode on SourceBuffers, then we could always do something like this for the very first segment, without complicated parsers and without high startup latency:

If first segment:
  1. Set segment mode
  2. Append the first segment
  3. Check buffered to see what its timestamp was
  4. Clear the buffer
Set sequence mode
Set timestamp offset
Append the segment
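
The steps above can be sketched roughly as follows. This is not the actual fix: `BufferLike` is a hypothetical promise-based wrapper around an MSE SourceBuffer (a real one would call `appendBuffer()` and `remove()` and wait for `updateend` each time). Resetting the offset to 0 before the probe, and using the segment reference's start time as the timestamp offset, are assumptions not stated in the steps.

```typescript
// Hypothetical promise-based wrapper around an MSE SourceBuffer.
interface BufferLike {
  mode: 'segments' | 'sequence';
  timestampOffset: number;
  append(data: Uint8Array): Promise<void>;
  clear(): Promise<void>;
  // Start of the first buffered range in seconds, or null if empty.
  bufferedStart(): number | null;
}

// Appends one segment in sequence mode. For the very first segment, it first
// appends in segments mode so the platform reveals the internal timestamp.
// Returns that timestamp, or null when no probe was done.
async function appendSegment(
    buf: BufferLike, data: Uint8Array, refStartTime: number,
    isFirstSegment: boolean): Promise<number | null> {
  let mediaTimestamp: number | null = null;

  if (isFirstSegment) {
    buf.mode = 'segments';                 // 1. Set segment mode
    buf.timestampOffset = 0;               // Assumption: probe raw media time
    await buf.append(data);                // 2. Append the first segment
    mediaTimestamp = buf.bufferedStart();  // 3. Check buffered
    await buf.clear();                     // 4. Clear the buffer
  }

  buf.mode = 'sequence';                   // Set sequence mode
  buf.timestampOffset = refStartTime;      // Assumption: offset = ref start
  await buf.append(data);                  // Append the segment
  return mediaTimestamp;
}
```

The returned timestamp, compared with the segment reference's start time, would give the offset needed to align text segments, with no container parsing in JavaScript and no extra fetch.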

joeyparrish commented 2 years ago

Looks like the trick to change modes works on desktop Chrome. Now to test it on all of our other supported platforms in the lab.

joeyparrish commented 2 years ago

Works on all other platforms, except Tizen 2 & 3, which don't support sequence mode at all, and are already excluded from our new HLS parser.

There are some tests which need updating, but the fix seems good.

shaka-project / shaka-player

v4.0.0 shaka player cannot display webvtt caption for HLS(either fmp4 or mpegts), DASH is fine #4191