Open cookiecrook opened 1 year ago
@eric-carlson @chrisn @gkatsev
FWIW, there is a somewhat common metadata format that uses URLs and not JSON for describing preview thumbnails when hovering over the progress bar:
WEBVTT

00:00:00.000 --> 00:00:00.979
https://cdn.example.com/thumbnails.jpg#xywh=0,0,284,160

00:00:00.979 --> 00:00:01.959
https://cdn.example.com/thumbnails.jpg#xywh=284,0,284,160

00:00:01.959 --> 00:00:02.938
https://cdn.example.com/thumbnails.jpg#xywh=568,0,284,160
Any leading URI protocol regex seems easy enough to incorporate in addition to JSON.
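To illustrate, here is a rough sketch of how a consumer might tell the two patterns apart. The function name `classify_payload` and the choice to accept only JSON objects and arrays are assumptions made for this example, not anything the VTT spec defines.

```python
import json
import re

# Assumption: a payload counts as a URL if it starts with an
# RFC 3986-style scheme followed by "://".
URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def classify_payload(text: str) -> str:
    """Return 'url', 'json', or 'unknown' for a metadata cue payload."""
    stripped = text.strip()
    if URL_SCHEME.match(stripped):
        return "url"
    try:
        value = json.loads(stripped)
    except ValueError:
        return "unknown"
    # A bare scalar such as 42 parses as JSON but is still ambiguous,
    # so only objects and arrays are treated as structured metadata here.
    return "json" if isinstance(value, (dict, list)) else "unknown"
```

Anything classified as `unknown` is the ambiguous case: it could be a custom metadata format or ordinary cue text meant for display.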
It seems difficult to introduce stricter constraints on the format without risking breaking someone's application. I note that the HTML spec includes a VTT example with a non-JSON custom metadata format. I have no idea how commonly such formats are used in practice, though. So if we do make a change to the VTT spec, that example would also need to be updated.
Have you considered other options for how schema information could be signaled? One idea: the VTT parsing algorithm requires implementations to ignore everything following WEBVTT on the first line of the file. Conceivably, a backwards-compatible change could be to extend this line to include an optional schema identifier, which would be ignored by today's implementations.
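As a sketch of that idea, the snippet below pulls out whatever follows the signature on the first line. The helper name `split_signature_line` and the `schema=video-thumbnails` token in the usage note are hypothetical; no such identifier syntax exists in the spec today.

```python
def split_signature_line(first_line: str):
    """Return the text after WEBVTT on the first line, or None if the
    line is not a valid signature. An empty string means nothing follows."""
    # Drop an optional byte order mark and the line terminator.
    line = first_line.lstrip("\ufeff").rstrip("\r\n")
    if not line.startswith("WEBVTT"):
        return None
    rest = line[len("WEBVTT"):]
    if rest == "":
        return ""
    # The signature must be followed by a space or tab before any other text.
    if rest[0] not in " \t":
        return None
    return rest.strip()
```

With this, `split_signature_line("WEBVTT schema=video-thumbnails")` returns `"schema=video-thumbnails"`, while a parser that ignores the trailing text sees an ordinary file.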
As @chrisn mentioned, introducing stricter constraints won't work, as it'll break folks. As you mentioned, the kind is available if the VTT file is included in an HTML file that describes its kind. Having this in the WebVTT file itself has been briefly talked about, but so far nothing has materialized.
Some related discussions: https://github.com/w3c/webvtt/issues/259, https://github.com/w3c/webvtt/issues/346 and https://github.com/w3c/webvtt/issues/485.
Those discussions involve WebVTT re-adding headers as an extension point to facilitate HLS's X-TIMESTAMP-MAP and webm's metadata. Such a header could be used to indicate what type of VTT file this is.
@chrisn's suggestion of using the WEBVTT line is similar to using headers. Personally, I don't think it's good enough: we want to definitively remove ambiguity, so using an optional extension for a core part of the spec seems fraught.
I think we probably want to have some other formal, backwards-compatible, in-file signal to indicate the kind of VTT file being represented. Perhaps a new METADATA block, a la REGION and STYLE? It could start with a single property, kind, that is equivalent to HTML Text Track kind.
My only concern with such an addition is the potential appetite for implementing WebVTT additions by folks.
Mentioned in TTWG April 27 Minutes.
Resolved to keep this open and explore @gkatsev's idea:
Perhaps a new METADATA block, a la REGION and STYLE? It could start with a single property kind, that is equivalent to HTML Text Track kind.
@eric-carlson and I would like to make some progress on this idea, defining potential properties with #512 as the initial use case. Is anyone else interested in meeting about this in a breakout session at the Sept 2023 TPAC in Sevilla?
Here's what we're thinking: a new ATTRIBUTES block, renamed from @gkatsev's original proposal to avoid the redundancy in the case of METADATA kind: metadata.
WEBVTT

ATTRIBUTES
kind: subtitles
srclang: es-mx
label: Español

NOTE
Standard subtitles (unlike CC or SDH captions) typically
translate spoken dialog or signage, but not audible sound
effects like "dogs barking."

1
00:00:10.123 --> 00:00:15.432
¡Hola! ¿Qué tal?
WEBVTT

ATTRIBUTES
kind: captions
srclang: es-mx
label: Español (SDH)

NOTE
Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
typically include spoken dialog as well as important audible
sounds such as "floor boards creak", "dogs barking", or in
this case, "music".

1
00:00:10.123 --> 00:00:15.432
¡Hola! ¿Qué tal?

2
00:00:47.462 --> 00:01:04.028
[♫ música ♫]
WEBVTT

ATTRIBUTES
kind: descriptions
srclang: en-us
label: English (AD)

NOTE
VTT-based descriptions are meant to render as text-to-speech audio or braille,
for blind or deafblind audiences, not usually as visual captions on screen.
As such, the option/label might be displayed in an audio menu or elsewhere.

1
00:00:10.123 --> 00:00:15.432
A young girl tiptoes down a dark hallway.
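For anyone who wants to experiment, here is a minimal sketch of reading the proposed block. It assumes the ATTRIBUTES block sits before the first cue, holds one `name: value` pair per line, and ends at a blank line, the way REGION and STYLE blocks do. The function name `parse_attributes_block` is made up for this example, and none of this is specified yet.

```python
def parse_attributes_block(vtt_text: str) -> dict:
    """Collect `name: value` pairs from a proposed ATTRIBUTES block, if any."""
    attributes = {}
    in_block = False
    for line in vtt_text.splitlines():
        if "-->" in line:
            break  # cues have started; the block is only expected before them
        if not in_block:
            in_block = line.strip() == "ATTRIBUTES"
            continue
        if not line.strip():
            break  # a blank line ends the block
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that are not `name: value`
            attributes[name.strip()] = value.strip()
    return attributes
```

A file without the block yields an empty dict, so a caller can fall back to whatever kind the embedding context (for example, an HTML track element) provides.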
(with a new type attribute)
WEBVTT

ATTRIBUTES
kind: metadata
type: video-thumbnails

NOTE
In order to support accessibility, the simple URL-only thumbnail
format mentioned above should be updated to include "alt" text for
each. In the potential format below, I've written that as a JSON
block containing alt strings for multiple supported languages.

00:00:01.959 --> 00:00:02.938
{
  "src": "https://cdn.example.com/thumbnails.jpg#xywh=0,0,284,160",
  "alt": {
    "en-us": "Miguel crosses the marigold bridge to the land of the dead.",
    "es-mx": "Miguel cruza el puente marigold hacia la tierra de los muertos."
  }
}
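A player consuming this potential format might unpack a cue roughly as follows. This is a sketch under assumptions: the helper name `parse_thumbnail_cue` and the `en-us` fallback language are invented here, and only the plain pixel form of the `xywh` media fragment is handled.

```python
import json
import re

# Matches a trailing "#xywh=x,y,w,h" media fragment given in pixels.
XYWH = re.compile(r"#xywh=(\d+),(\d+),(\d+),(\d+)$")


def parse_thumbnail_cue(payload: str, lang: str, fallback: str = "en-us"):
    """Return (image_url, (x, y, w, h) or None, alt_text) for one cue."""
    data = json.loads(payload)
    src = data["src"]
    match = XYWH.search(src)
    region = tuple(int(n) for n in match.groups()) if match else None
    alt_map = data.get("alt", {})
    # Prefer the requested language, then the fallback, then no alt text.
    alt = alt_map.get(lang, alt_map.get(fallback, ""))
    image_url = src.split("#", 1)[0]
    return image_url, region, alt
```

The sprite URL and crop region drive the hover preview, while the alt string can be exposed to assistive technology in the user's language.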
WEBVTT

ATTRIBUTES
kind: metadata
type: video-flash-avoidance

NOTE
Spec for "video-flash-avoidance" (or "video-flash", "strobing", etc.) type would define
usage as a JSON block with one required and two optional key/value pairs:
- integer "intensity": 0-100
- opt token "flash-type": ["general-flash" (default) | "red-flash" | "spatial-pattern"]
- opt token "algorithm": ["undefined" (default) | "harding" | "apple-vfr" (bikeshed, algo needs name)]

NOTE
The v1 Apple open-sourced algorithm (bikeshed name "apple-vfr" for "video
flashing reduction") only detects "general-flash" patterns (not yet
"red-flash" or "spatial-pattern"), but we think it performs better than the
de facto Harding test in those instances of "general-flash". See below for
example where "harding" would still need to be used to denote the
"spatial-pattern" cue that the open-sourced algorithm doesn't yet account for.
Cite: https://developer.apple.com/accessibility/#dim-flashing-lights

1
00:00:10.123 --> 00:00:15.432
{
  "intensity": 75,
  "flash-type": "general-flash",
  "algorithm": "apple-vfr"
}

2
00:00:47.462 --> 00:01:04.028
{
  "intensity": 100,
  "flash-type": "spatial-pattern",
  "algorithm": "harding"
}
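To make the key/value rules in the NOTE concrete, here is a sketch of a validator for one such cue. The function name `parse_flash_cue` is hypothetical, the token sets are copied from the bikeshed names above, and the sketch tolerates "intensity" given as either a JSON number or a numeric string.

```python
import json

# Token values taken from the draft NOTE above; all names are still bikeshed.
FLASH_TYPES = {"general-flash", "red-flash", "spatial-pattern"}
ALGORITHMS = {"undefined", "harding", "apple-vfr"}


def parse_flash_cue(payload: str) -> dict:
    """Validate one flash-avoidance cue and fill in the optional defaults."""
    data = json.loads(payload)
    # "intensity" is the one required key; a missing key raises KeyError.
    intensity = int(data["intensity"])
    if not 0 <= intensity <= 100:
        raise ValueError(f"intensity out of range: {intensity}")
    flash_type = data.get("flash-type", "general-flash")
    algorithm = data.get("algorithm", "undefined")
    if flash_type not in FLASH_TYPES:
        raise ValueError(f"unknown flash-type: {flash_type}")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    return {"intensity": intensity, "flash-type": flash_type, "algorithm": algorithm}
```

A player could then dim or skip the cue's time range when the returned intensity exceeds a user-chosen threshold.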
Is anyone else interested in meeting about this in a breakout session at the Sept 2023 TPAC in Sevilla?
Better yet, time on the standard TTWG meeting schedule for TPAC.
Note: I looked for a F2FCandidate or TPACCandidate keyword. Not sure how you're tracking that list.
Is anyone else interested in meeting about this in a breakout session at the Sept 2023 TPAC in Sevilla?
I'm interested, yes.
@nigelmegitt et al., can we get time on the TTWG schedule at TPAC? I think that's a better forum than a breakout session. It would also be good to coordinate with @jasonjgw and others interested in MAUR Issue 2.
I haven't seen a published schedule, but most of Tuesday afternoon (Sept 12, CET) is still open for me.
Currently scheduled for 14:30 CET on Tuesday Sept 12th. Thanks Nigel.
WEBVTT

ATTRIBUTES
kind: metadata
type: video-flash-avoidance
@cookiecrook Some questions regarding your proposal of a new type attribute in the proposed ATTRIBUTES block:
- TextTrack: do you suggest also adding type as an attribute to TextTrack?
- Should type be defined in a controlled vocabulary/registry? In a registry, the specific values could be linked to the specification that defines the content format of a specific type (e.g. the spec for "video-flash-avoidance").
The Timed Text Working Group just discussed "VTT Metadata Cue format is ambiguous; some metadata may be unintentionally presented to the user in a context outside HTML" (w3c/webvtt#511), and agreed to the following:
SUMMARY: Strong support for this new ATTRIBUTE block but we probably don't want this to hold up the current version of WebVTT from progressing to Rec
@silviapfeiffer Making sure you saw this since we missed you at TPAC. If you have any feedback, please share. Also, I plan to work with @eric-carlson on a VTT PR soon, unless you'd prefer to author it.
PR is ready for review.
VTT Metadata Cue format is ambiguous; some metadata may be unintentionally presented to the user in a context outside HTML.
Background
§ 4.2.1. WebVTT metadata text (Normative) defines metadata text as:
§ 1.7. Metadata example (Informative) clarifies:
Problem
"Metadata can be any string" results in a format that is ambiguous, and therefore may be presented to the user unintentionally.
In an HTML <video> element, this ambiguity is resolved by the author providing a kind="metadata" attribute on the text track. But there isn't a logical place to duplicate this disambiguation in some other VTT contexts, including when they are embedded in some media container formats.
Proposed Solution
Consider clarifying that metadata cues SHOULD or MUST be formatted as one or more unambiguous patterns. JSON is the obvious one, to retain backwards compatibility with the JSON usage documented in the VTT spec, but there may be others.
Additional context for this change is in the following issue: #512.