w3c / webmediaporting

Web Media porting spec

accuracy of reporting the current playback position for the media element #9

Open mavgit opened 7 years ago

mavgit commented 7 years ago

From @jpiesing on March 7, 2017 15:15

Web video providers have found it necessary to place requirements on the accuracy of the value returned by HTMLMediaElement.currentTime. Should we? If so, what should they be?

The definition of currentTime from HTML 5.1 is:

The currentTime attribute must, on getting, return the media element’s default playback start position, unless that is zero, in which case it must return the element’s official playback position. The returned value must be expressed in seconds.

The definition of official playback position is as follows:

Media elements also have an official playback position, which must initially be set to zero seconds. The official playback position is an approximation of the current playback position that is kept stable while scripts are running.

The definition of current playback position is as follows:

Media elements have a current playback position, which must initially (i.e., in the absence of media data) be zero seconds. The current playback position is a time on the media timeline.

The following is also relevant:

When a media element is potentially playing and its Document is a fully active Document, its current playback position must increase monotonically at effective playback rate units of media time per unit time of the media timeline’s clock. (This specification always refers to this as an increase, but that increase could actually be a decrease if the effective playback rate is negative.)

Nothing in these definitions specifies how accurately the position must be reported. A reported value could be off by 100ms, 250ms, 500ms or even 1s and still conform.
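To make the gap concrete, here is a minimal sketch of what the monotonic-increase rule quoted above does pin down: an expected position, but no error bound on the reported one. The function names are made up for illustration and are not part of any specification.

```typescript
// Expected current playback position, in seconds of media time, after
// `elapsedSeconds` on the media timeline's clock, following the quoted rule
// (position advances at the effective playback rate).
function expectedPosition(
  startPosition: number,
  effectivePlaybackRate: number,
  elapsedSeconds: number
): number {
  return startPosition + effectivePlaybackRate * elapsedSeconds;
}

// Signed difference between what a device reports and what the rule predicts.
// The quoted text places no limit on this value.
function reportingError(reportedSeconds: number, expectedSeconds: number): number {
  return reportedSeconds - expectedSeconds;
}
```

For example, starting at 10 seconds and playing at rate 1 for 2 seconds gives an expected position of 12 seconds; a negative rate makes the same formula count down.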

Should WAVE say something about this?

Copied from original issue: w3c/webmediaapi#42

mavgit commented 7 years ago

From @jpiesing on May 16, 2017 7:58

Here is a straw-man proposal as requested:

The value of HTMLMediaElement.currentTime shall be accurate to within 250ms as measured at the point in the media pipeline where video and graphics are composited (i.e. after the video decoder and any video-specific picture processing but before any generic picture processing and before HDMI).
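As a rough illustration, the check implied by this straw-man could look like the sketch below. It assumes the test harness already has some reference value for the media time at the compositing point; how that reference is obtained is not covered here, and the names are hypothetical.

```typescript
// 250 ms tolerance from the straw-man proposal, expressed in seconds.
const TOLERANCE_SECONDS = 0.25;

// True when the reported currentTime is within the tolerance of the reference
// media time taken where video and graphics are composited (assumed known).
function isWithinTolerance(
  reportedSeconds: number,
  compositedSeconds: number,
  toleranceSeconds: number = TOLERANCE_SECONDS
): boolean {
  return Math.abs(reportedSeconds - compositedSeconds) <= toleranceSeconds;
}

// Number of frame durations that fit inside a tolerance at a given frame rate.
function toleranceInFrames(toleranceSeconds: number, framesPerSecond: number): number {
  return toleranceSeconds * framesPerSecond;
}
```

At 25 frames per second, 250ms corresponds to 6.25 frame durations, so this tolerance is far looser than frame accuracy.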

mavgit commented 7 years ago

From @IDWMaster on June 12, 2017 20:11

What about something like this:

The value returned by currentTime shall be the current time, in seconds, of the frame that is currently displayed on the screen, as determined by the presentation timestamp from the underlying transport stream. HTMLMediaElement.currentTime shall be accurate enough to unambiguously reference a particular frame as specified by its PTS.
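A sketch of what "unambiguously reference a particular frame" might mean in practice follows. It assumes an MPEG-2 transport stream, where PTS values tick at 90 kHz, a constant frame rate, and a media timeline whose zero corresponds to the first PTS in the stream. Those last two are simplifying assumptions, and the function names are illustrative only.

```typescript
// PTS values in an MPEG-2 transport stream are expressed in 90 kHz ticks.
const PTS_CLOCK_HZ = 90000;

// Convert a PTS to seconds on the media timeline.
// Assumption: timeline zero is the stream's first PTS.
function ptsToSeconds(pts: number, firstPts: number): number {
  return (pts - firstPts) / PTS_CLOCK_HZ;
}

// Index of the frame whose display interval contains `timeSeconds`,
// assuming a constant frame rate. The small epsilon absorbs rounding error
// when the time sits exactly on a frame boundary.
function frameIndexAt(timeSeconds: number, framesPerSecond: number): number {
  return Math.floor(timeSeconds * framesPerSecond + 1e-9);
}

// True when the reported currentTime falls inside the display interval of
// the frame identified by `pts`.
function identifiesSameFrame(
  reportedSeconds: number,
  pts: number,
  firstPts: number,
  framesPerSecond: number
): boolean {
  const displayedFrame = frameIndexAt(ptsToSeconds(pts, firstPts), framesPerSecond);
  return frameIndexAt(reportedSeconds, framesPerSecond) === displayedFrame;
}
```

At 25 frames per second each frame lasts 40ms, so under these assumptions a reported value that is 10ms late still identifies the same frame, while one that is 50ms late does not.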

mavgit commented 7 years ago

From @jpiesing on June 15, 2017 9:18

> What about something like this:
>
> The value returned by currentTime shall be the current time, in seconds, of the frame that is currently displayed on the screen, as determined by the presentation timestamp from the underlying transport stream. HTMLMediaElement.currentTime shall be accurate enough to unambiguously reference a particular frame as specified by its PTS.

Unfortunately that's impossible for the UA to determine in some cases and may be harder to test - at least in a way that can be automated.

A UA in something connected to a display over HDMI can't really know the delay in the display at the far end of the cable. Even in devices with an integrated display, TV sets (at least) have picture processing and improvement logic between the video/graphics system and the panel. The delay through this may be hard to know for certain. Of course the device has to estimate the delay through that logic in order to maintain sync between video, audio and captions/subtitles but that estimate may not meet the target of "to unambiguously reference a particular frame as specified by its PTS."

The tests that I've seen involve special video streams with timecode burnt into the video and a test HTML page that writes the value of currentTime to the screen. The two can then be compared. This tests a subtly different definition of currentTime, one that does not include the time taken after video and graphics have been composited. This sort of test is easy to automate: point a camera at the screen, capture both values, and check offline that they match.
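The offline comparison step could be sketched roughly as below. It assumes the camera capture has already been turned into pairs of strings (for instance by OCR, which is out of scope here), that the burnt-in timecode uses an HH:MM:SS:FF layout, and that the page prints currentTime as a decimal number of seconds. All names and formats are assumptions for illustration.

```typescript
// One captured camera frame: the timecode read from the video and the
// currentTime text the test page drew on top of it (both assumed formats).
interface CapturedSample {
  timecode: string;        // "HH:MM:SS:FF"
  currentTimeText: string; // e.g. "62.18"
}

// Convert an HH:MM:SS:FF timecode to seconds at the given frame rate.
function timecodeToSeconds(timecode: string, framesPerSecond: number): number {
  const parts = timecode.split(":").map((part) => Number(part));
  if (parts.length !== 4 || parts.some((part) => Number.isNaN(part))) {
    throw new Error(`Unrecognised timecode: ${timecode}`);
  }
  const [hours, minutes, seconds, frames] = parts;
  return hours * 3600 + minutes * 60 + seconds + frames / framesPerSecond;
}

// Largest absolute difference, in seconds, between the on-screen currentTime
// and the burnt-in timecode across all captured samples.
function maxAbsoluteError(samples: CapturedSample[], framesPerSecond: number): number {
  let worst = 0;
  for (const sample of samples) {
    const reported = Number(sample.currentTimeText);
    const reference = timecodeToSeconds(sample.timecode, framesPerSecond);
    worst = Math.max(worst, Math.abs(reported - reference));
  }
  return worst;
}
```

The result can then be compared against whatever tolerance the specification ends up choosing.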

If you want to test the frame that's actually being displayed then I guess the test HTML page has to write the value of currentTime to some kind of debug output and compensate for any delay in producing that output. Somehow that value would then have to be compared with the value in the video. Another possibility might be to put single frames of white in an otherwise black video. A sensor could be used to record when the white appears and match that against the value of currentTime.
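One way the white-frame idea might be evaluated offline is sketched below. It assumes the page logs (wall-clock time, currentTime) pairs, that the light sensor timestamps each flash on the same clock, and that the media time of each white frame is known from how the test stream was authored. None of this is specified anywhere; the names are hypothetical.

```typescript
// A (wall-clock, currentTime) pair logged by the test page.
interface LoggedSample {
  wallClockMs: number;
  currentTime: number; // seconds
}

// Estimate what currentTime was at the instant the sensor saw the flash,
// by linear interpolation between the two surrounding log entries.
function currentTimeAt(samples: LoggedSample[], flashWallClockMs: number): number {
  for (let i = 1; i < samples.length; i++) {
    const before = samples[i - 1];
    const after = samples[i];
    if (flashWallClockMs >= before.wallClockMs && flashWallClockMs <= after.wallClockMs) {
      const span = after.wallClockMs - before.wallClockMs;
      const fraction = span === 0 ? 0 : (flashWallClockMs - before.wallClockMs) / span;
      return before.currentTime + fraction * (after.currentTime - before.currentTime);
    }
  }
  throw new Error("Flash time is outside the logged range");
}

// Signed error, in seconds, between the interpolated currentTime and the
// known media time of the white frame.
function displayError(
  samples: LoggedSample[],
  flashWallClockMs: number,
  flashMediaTime: number
): number {
  return currentTimeAt(samples, flashWallClockMs) - flashMediaTime;
}
```

A positive result means currentTime was ahead of the frame actually visible on the panel, which is the display-side delay the compositing-point test cannot see.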

The point I'm trying to make is that we need to be confident that whatever we specify can be practically tested in the range of devices that should be able to support WAVE.

jpiesing commented 6 years ago

See https://github.com/whatwg/html/issues/3041