w3c / webvtt.js

WebVTT parser and validator
https://w3c.github.io/webvtt.js/parser.html
Creative Commons Zero v1.0 Universal

[Enhancement] RFC8216 - support X-TIMESTAMP-MAP #38

Open bbgdzxng1 opened 2 years ago

bbgdzxng1 commented 2 years ago

The HLS specification, RFC 8216 (https://datatracker.ietf.org/doc/html/rfc8216#section-3.5), extends WebVTT with an X-TIMESTAMP-MAP header.

In order to synchronize timestamps between audio/video and subtitles, an X-TIMESTAMP-MAP metadata header SHOULD be added to each WebVTT header. This header maps WebVTT cue timestamps to MPEG-2 (PES) timestamps in other Renditions of the Variant Stream. Its format is:

X-TIMESTAMP-MAP=LOCAL:<cue time>,MPEGTS:<MPEG-2 time>
e.g., X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000
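To illustrate what the mapping means in practice, here is a minimal sketch of the arithmetic, assuming the usual 90 kHz clock for MPEG-2 PES timestamps. The function name `cueTimeToMpegts` and the shape of the `map` object are hypothetical and are not part of webvtt.js or the RFC; 33-bit timestamp wraparound is deliberately ignored.

```javascript
// Sketch only: names and object shape are assumptions, not webvtt.js API.
const MPEGTS_HZ = 90000 // MPEG-2 PES timestamps tick at 90 kHz

// map.localSeconds: the LOCAL cue time from the header, in seconds
// map.mpegts:       the MPEGTS value from the header, in 90 kHz ticks
function cueTimeToMpegts(cueSeconds, map) {
  // Offset of the cue from the LOCAL anchor, converted to ticks,
  // then shifted by the MPEGTS anchor. Wraparound is not handled.
  return map.mpegts + Math.round((cueSeconds - map.localSeconds) * MPEGTS_HZ)
}

// With X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000, a cue at
// 00:11.000 lands at 900000 + 11 * 90000 ticks.
const exampleMap = { localSeconds: 0, mpegts: 900000 }
console.log(cueTimeToMpegts(11, exampleMap)) // 1890000
```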

Although X-TIMESTAMP-MAP is not part of the W3C spec, it is now part of an official RFC.

It would be useful if webvtt.js supported this header without generating an error, so that HLS-segmented WebVTT files can be validated using webvtt.js.

Test case using https://quuz.org/webvtt/

Current behavior: "Line 2: No blank line after the signature."
Expected behavior: "This is boring, your WebVTT is valid! (1ms)"

WEBVTT - This file has cues. ; Kind: captions; Language: en
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:11.000 --> 00:13.000 vertical:rl
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

Thanks!

Frenzie commented 2 years ago

A simple workaround could look something like this, but presumably merely pretending it's valid is not acceptable as a PR:

diff --git a/parser.js b/parser.js
index fa80072..ff97920 100644
--- a/parser.js
+++ b/parser.js
@@ -67,6 +67,10 @@

       /* HEADER */
       while(lines[linePos] != "" && lines[linePos] != undefined) {
+        if(lines[linePos].indexOf("X-TIMESTAMP-MAP") != -1) {
+          linePos++
+          continue
+        }
         err("No blank line after the signature.")
         if(lines[linePos].indexOf("-->") != -1) {
           alreadyCollected = true

There seems to already be a timestamp() function which could probably be adapted.
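As a rough idea of what actually validating the header could look like, rather than skipping any line that contains the substring, here is a standalone sketch. `parseTimestampMap` is a hypothetical helper; it does not reuse the existing timestamp() function, and accepting the two attributes in either order is an assumption on my part.

```javascript
// Sketch only: hypothetical helper, not part of webvtt.js.
// Returns { localMs, mpegts } for a well-formed header line, else null.
function parseTimestampMap(line) {
  const prefix = "X-TIMESTAMP-MAP="
  if (!line.startsWith(prefix)) return null

  const result = {}
  for (const part of line.slice(prefix.length).split(",")) {
    const sep = part.indexOf(":")
    if (sep === -1) return null
    const key = part.slice(0, sep).trim()
    const value = part.slice(sep + 1).trim()

    if (key === "LOCAL") {
      // WebVTT timestamp: optional hours, then mm:ss.ttt
      const m = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(value)
      if (!m) return null
      const hours = Number(m[1] || 0)
      result.localMs =
        ((hours * 60 + Number(m[2])) * 60 + Number(m[3])) * 1000 + Number(m[4])
    } else if (key === "MPEGTS") {
      // Plain non-negative integer, in 90 kHz ticks
      if (!/^\d+$/.test(value)) return null
      result.mpegts = Number(value)
    }
  }

  // Both attributes must be present for the mapping to be usable.
  if (result.localMs === undefined || result.mpegts === undefined) return null
  return result
}
```

In the header loop, a non-null result would mean the line can be skipped; a null result for a line starting with "X-TIMESTAMP-MAP" would still deserve an error.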

silviapfeiffer commented 2 years ago

Hmm, this is not strictly conformant with the spec: https://www.w3.org/TR/webvtt1/#file-structure. It would have been better to make the X-TIMESTAMP-MAP a separate block. Given this, you'll have to take this up with the Timed Text WG at the W3C and get the WebVTT spec changed to accept extra lines under the "WEBVTT" header string.

dontcallmedom commented 2 years ago

closing until/unless the spec is updated to make this valid

bbgdzxng1 commented 1 year ago

Thanks @Frenzie, @silviapfeiffer and @dontcallmedom. The challenge is that there is a bit of a spec fight between the W3C spec and the RFC.

The WebVTT team has deprecated header support. Roger Pantos over at Apple did not get the memo, so the header has ended up in the final RFC for HLS, but not in the W3C spec.

Here's Roger Pantos' (The Grandfather of HTTP Live Streaming) take: https://mailarchive.ietf.org/arch/msg/hls-interest/4vmLpEsV-EnmkEwMQZkzbGQai_4/

From the perspective of WebVTT support in HLS, Pantos takes the view that "At this point there’s probably more VTT content containing X-TIMESTAMP-MAP out there in the world than not, so we have to continue to support that syntax", so it has remained in the RFC.

Would webvtt.js consider a "Warning (or Info): Headers are not standard in the official W3C WebVTT spec, but are acceptable in HTTP Live Streaming in accordance with RFC 8216"?

This compromise may satisfy both camps (W3C and the HLS RFC) and allow end users to validate their files without error-level messages. One of the major use cases of WebVTT out there is HTTP Live Streaming, and RFC 8216 is a genuine RFC.
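To make the proposal concrete, here is a rough sketch of a header scan that reports the HLS header at warning level and everything else at error level. It is standalone and only loosely modelled on the header loop in parser.js; the `checkHeaderLines` name and the `{ level, line, message }` shape are assumptions, not how webvtt.js reports problems today.

```javascript
// Sketch only: hypothetical function and message shape, not webvtt.js API.
// Scans the lines following the "WEBVTT" signature up to the first blank line.
function checkHeaderLines(lines, startPos) {
  const messages = []
  let linePos = startPos
  while (lines[linePos] !== "" && lines[linePos] !== undefined) {
    if (lines[linePos].startsWith("X-TIMESTAMP-MAP=")) {
      // Known HLS extension: report it, but not as an error.
      messages.push({
        level: "warning",
        line: linePos + 1, // 1-based, as in "Line 2: ..."
        message: "Headers are not standard in the W3C WebVTT spec, " +
          "but are acceptable in HTTP Live Streaming per RFC 8216.",
      })
    } else {
      messages.push({
        level: "error",
        line: linePos + 1,
        message: "No blank line after the signature.",
      })
    }
    linePos++
  }
  return { linePos, messages }
}

const lines = [
  "WEBVTT",
  "X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000",
  "",
  "00:11.000 --> 00:13.000",
  "We are in New York City",
]
console.log(checkHeaderLines(lines, 1).messages.map((m) => m.level))
```

A caller could then treat a file as valid when no message has level "error", while still surfacing the warning to the user.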

Thanks for listening.