w3c / webvtt

WebVTT Standard
https://w3c.github.io/webvtt/

Data model definition of WebVTT cue #416

Open nigelmegitt opened 6 years ago

nigelmegitt commented 6 years ago

Under Data Model, Section 3.1 defines:

A WebVTT cue is a text track cue that additionally consist of the following: [HTML51]

However, from a data model perspective, this is not correct. A suitable text track cue can be constructed from a WebVTT cue, but their data models are not identical. Specifically, the HTML definition of a text track cue includes rules for extracting the chapter title and a pause-on-exit flag.

However, neither of these is part of the data model of WebVTT, though values for them can in principle be derived.

It would be more correct to list all the data model contents of a WebVTT cue, highlighting for each whether it is identical to the corresponding part of the text track cue data model, i.e.:

If the WebVTT data is chapter data, the text track cue "rules for extracting the chapter title" algorithm is [whatever it is defined to be in WebVTT].

When constructing a text track cue from a WebVTT cue the pause-on-exit flag is always set to false.

silviapfeiffer commented 6 years ago

WebVTT does not remove the pause-on-exit flag from a WebVTT cue, nor does it remove the chapter title extraction algorithm. The word "additionally" is really important here. It's what makes it a derivative of a text track cue and defines what the HTML5 spec calls "additional format-specific data". Why would we remove any of the text track cue functionality from a WebVTT cue? Why would we repeat the definitions of identifier, begin time, end time? I don't see the need.

The pause-on-exit flag is being set in the WebVTT parser algorithm: "Let cue’s text track cue pause-on-exit flag be false". That's because we haven't defined anything for the audio description use case in the WebVTT spec yet - that flag has been prepared for that use case. Also, the "rules for extracting the chapter title" are defined in https://www.w3.org/TR/webvtt1/#rules-for-extracting-the-chapter-title .

It all seems to be in order from where I stand. The data model is quite clear that it describes the "additional format-specific data".
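The "derivative" reading described above can be sketched as a small class hierarchy. This is only an illustration, not the spec's definition: the class and field names are hypothetical, and the WebVTT-specific fields are an illustrative subset with the default values as I understand them from the WebVTT spec.

```python
from dataclasses import dataclass


@dataclass
class TextTrackCue:
    """Generic cue data, loosely following the HTML text track cue (hypothetical names)."""
    identifier: str
    start_time: float            # seconds and fractions of a second
    end_time: float              # seconds and fractions of a second
    pause_on_exit: bool = False  # the WebVTT parser sets this to false


@dataclass
class WebVTTCue(TextTrackCue):
    """The same cue plus "additional format-specific data" (illustrative subset)."""
    text: str = ""
    writing_direction: str = "horizontal"
    snap_to_lines: bool = True
    size: float = 100.0          # percentage of the video dimension
    text_alignment: str = "center"


cue = WebVTTCue("intro", 1.0, 2.5, text="Hello")
```

Under this reading, anything that accepts a `TextTrackCue` also accepts a `WebVTTCue`, which is the sense in which the word "additionally" matters.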

nigelmegitt commented 6 years ago

The trouble is that they aren't exactly the same data entities even if they conceptually represent the same thing.

For example the text track cue start time is:

The time, in seconds and fractions of a second, that describes the beginning of the range of the media data to which the cue applies.

and that can be negative.

But the WebVTT cue start time is a time in (optionally) hours, minutes, seconds and milliseconds that fulfils the same purpose, and it cannot be negative. The same is true for end times.

There's a mapping, but they aren't the same entities. The WebVTT cue identifier is not an arbitrary string either, as permitted in text track cues, since it has additional constraints.
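To make the identifier constraint concrete, here is a minimal sketch of a check based on my reading of the WebVTT file format: an identifier is one or more characters that contain neither the substring `-->` nor a line feed or carriage return. The function name is hypothetical and not part of any API.

```python
def is_valid_webvtt_cue_identifier(identifier: str) -> bool:
    """Return True if the string satisfies the WebVTT cue identifier syntax (as assumed above)."""
    if not identifier:
        return False  # the syntax requires at least one character
    forbidden = ("-->", "\n", "\r")
    return not any(token in identifier for token in forbidden)
```

A generic text track cue identifier is an arbitrary string, so a value such as `"a --> b"` would be acceptable there but fails this check.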

silviapfeiffer commented 6 years ago

Media data time on a media element in HTML cannot be negative, so the text track cue start time cannot be negative either - I don't follow.

Also, WebVTT cue start time is a double, see https://www.w3.org/TR/webvtt1/#the-vttcue-interface . It's exactly what the text track cue start time is defined as: time in seconds and fractions of a second.

What you are referring to is WebVTT cue timing (https://www.w3.org/TR/webvtt1/#webvtt-cue-timings) which is the text that is used to specify the start time and it gets parsed into seconds and fractions of a second. There's a difference between what is held in a parsed object and what is specified as text.
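The distinction between the timing text and the parsed value can be illustrated with a short sketch. It converts a timestamp in the WebVTT file syntax (optional hours, then minutes, seconds and three-digit milliseconds) into seconds as a float. The function name is hypothetical, and the regular expression follows the authoring syntax strictly, so it is not the spec's more detailed parsing algorithm.

```python
import re

# Optional hours (two or more digits), minutes 00-59, seconds 00-59, milliseconds 000-999.
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def parse_webvtt_timestamp(text: str) -> float:
    """Convert WebVTT timestamp text into seconds and fractions of a second."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (
        int(hours or 0) * 3600
        + int(minutes) * 60
        + int(seconds)
        + int(millis) / 1000
    )
```

The text form has no way to express a sign, so a value parsed this way is never negative, while the parsed number itself is an ordinary float.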

The reason I know that WebVTT cues are a derivative of text track cues is that originally, when it was all in the HTML spec, there was no difference. We just ripped text track cues out of WebVTT cues to allow them to be more generic.

nigelmegitt commented 6 years ago

From the Note in https://www.w3.org/TR/html51/semantics-embedded-content.html#cue :

The text track cue start time and text track cue end time can be negative. [snip]

silviapfeiffer commented 6 years ago

Ah ok. I think that just means that you can set the start time to a negative value (it's a double, which is signed, just like it is in WebVTT also), but it also says that it has no effect on a video timeline, which is always positive. That doesn't change my explanation. Cue timing in a WebVTT file is being parsed into this start and end time of a text track cue. It's therefore the same kind of object, just with some extended data, i.e. it's a derivative.

silviapfeiffer commented 6 years ago

See also #415