w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

change units of timestamp or rename to timestampMicros #122

Closed dogben closed 3 years ago

dogben commented 3 years ago

Nearly every existing web API dealing with timestamps uses DOMTimeStamp or DOMHighResTimeStamp, both of which are defined to be in units of milliseconds. However, the timestamp attribute found throughout the WebCodecs spec has units of microseconds. This is quite confusing and has resulted in several bugs in my use of WebCodecs.

I suggest either changing the units of timestamp to DOMHighResTimeStamp or renaming the attribute timestampMicros to make it clear that it is different from other timestamps.

I'll also point out that HTMLMediaElement.currentTime is in units of seconds. The timestamp attribute of WebCodecs is analogous to the currentTime of a media element, so using seconds as the unit could also be an option.

sandersdan commented 3 years ago

There was some discussion about this in #52, and the issue is under current discussion. I'll post updates here.

chcunningham commented 3 years ago

After discussion w/ Dan, we lean toward DOMHighResTimeStamp. The current choice of integer microseconds was made mostly because that's what we used internally and we had vague worries about floating point losing precision (maybe manifesting as A/V sync issues or audio glitches down the road). Dan did some analysis to find that we get ~285 years of microseconds before that starts to manifest though. @sandersdan - can you give us the details of that calc?

@padenot FYI

sandersdan commented 3 years ago

FYI there is a distinction between DOMTimeStamp and DOMHighResTimeStamp (thanks @dogben, I hadn't noticed DOMTimeStamp!):

We are not using the document epoch, so we should prefer DOMTimeStamp. A different argument could be made for capture timestamps, though.

The rough calculation is that a double has 53 bits of precision, and 2^53 microseconds is 285 years. Thus we can expect 285 years of at least microsecond precision from whatever epoch is being used, which is usually 0 at the start of a media file, but could commonly be the Unix epoch (1970) for live media cases.

If microsecond precision is required in a guaranteed-rounding sense, I'd halve that, so one file could be 142 years long with microsecond precision or a Unix timestamped stream could cover 1970--2112 (also earlier using negative timestamps).

For comparison, our long long microseconds approach covers about 300,000 years at exactly microsecond precision, but in JS it's represented as a double and as a result has similar precision limits to DOMTimeStamp.
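The arithmetic in this comment can be checked with a short sketch. It assumes a 365.25-day year, and the names (`MICROS_PER_YEAR`, `yearsOfMicros`) are illustrative, not from any spec:

```typescript
// Microseconds in one year, assuming a 365.25-day year.
const MICROS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1e6;

// Years covered by a given count of microseconds.
function yearsOfMicros(micros: number): number {
  return micros / MICROS_PER_YEAR;
}

// A double has a 53-bit significand, so integers up to 2^53 are exact.
const doubleExactYears = yearsOfMicros(2 ** 53);
// Halved for the stricter "guaranteed-rounding" reading.
const doubleHalvedYears = doubleExactYears / 2;
// A signed 64-bit integer holds just under 2^63 microseconds.
const int64Years = yearsOfMicros(2 ** 63);

console.log(Math.floor(doubleExactYears));  // 285
console.log(Math.floor(doubleHalvedYears)); // 142
console.log(int64Years > 290000 && int64Years < 300000); // true, i.e. "about 300,000"
```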

dogben commented 3 years ago

DOMTimeStamp seems to be an integral type.

sandersdan commented 3 years ago

Oops, you're right. Hmm. Might need to talk to the hr-time folks to see if our use is acceptable without sharing an epoch.

sandersdan commented 3 years ago

Based on https://github.com/w3c/hr-time/issues/104 it does not appear there is any concern with using DOMHighResTimeStamp without an explicit epoch.

padenot commented 3 years ago

What about audio alignment and rounding? Experience shows that using floating point for time when doing audio work cannot really work. In the general case, people are expected to integrate over buffer durations in sample-frames.

Generally, media APIs deal with integer time because it solves a whole class of rounding bugs, but this is JS; is there anything we can do?

sandersdan commented 3 years ago

The best idea I have for these cases is to build a generic metadata API, so that apps can tag their own metadata into chunks and frames. Then apps can store timestamps in whatever format they want.

You can also just store integers in a DOMHighResTimeStamp, but doing so may interfere with rate control algorithms in encoders. Hopefully we can specify that in a bits-per-frame rather than bits-per-second way to avoid this limitation.

padenot commented 3 years ago

Or we can leave everything as integer microseconds.

sandersdan commented 3 years ago

When timestamp < 275 years, DOMHighResTimeStamp has higher precision than integer microseconds. We can remain interoperable with other web specs, and apps that want exact microseconds can use micros = Math.round(1000 * frame.timestamp).
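A minimal sketch of that conversion, assuming the millisecond-based timestamp proposed at this point in the thread. The helper names and the sample value are hypothetical:

```typescript
// Convert a (possibly fractional) millisecond timestamp to whole microseconds.
function millisToMicros(timestampMs: number): number {
  return Math.round(1000 * timestampMs);
}

// Convert whole microseconds back to fractional milliseconds.
function microsToMillis(timestampUs: number): number {
  return timestampUs / 1000;
}

const frameTimestampMs = 33.367; // hypothetical frame timestamp in milliseconds
const micros = millisToMicros(frameTimestampMs);

console.log(micros); // 33367
// The round trip through milliseconds recovers the same integer.
console.log(millisToMicros(microsToMillis(micros)) === micros); // true
```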

padenot commented 3 years ago

Yes, this is the annoying part. For audio, this is the kind of design that will cause bugs, because people will use the timestamp instead of building a clock with the buffer sizes, and won't round properly. But it's not like we can remove it either.
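The failure mode described here can be sketched as follows. The sample rate and buffer size are assumed values, and the code compares a clock built from the running sample-frame count against one that sums per-buffer rounded durations:

```typescript
const SAMPLE_RATE = 44100;      // assumed sample rate in Hz
const FRAMES_PER_BUFFER = 1024; // assumed buffer size in sample-frames

// Timestamp derived from the running sample-frame count: rounded once.
function timestampFromFrames(totalFrames: number, sampleRate: number): number {
  return Math.round((totalFrames * 1e6) / sampleRate);
}

// Error-prone alternative: round each buffer duration, then keep adding it.
const roundedBufferDurationUs = Math.round((FRAMES_PER_BUFFER * 1e6) / SAMPLE_RATE);

let accumulatedUs = 0;
let totalFrames = 0;
for (let i = 0; i < 1000; i++) {
  accumulatedUs += roundedBufferDurationUs;
  totalFrames += FRAMES_PER_BUFFER;
}

const countedUs = timestampFromFrames(totalFrames, SAMPLE_RATE);
const driftUs = accumulatedUs - countedUs;

console.log(driftUs); // 45 microseconds of drift after 1000 buffers (about 23 s)
```

The counted clock never drifts because the only rounding happens when a timestamp is read, not each time a buffer is added.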

sandersdan commented 3 years ago

Makes sense, it can be hard to teach this.

My position is that microseconds are just as arbitrary as float milliseconds. If we could have rational seconds (timestamp + timebase), that would be superior for media, but I didn't find a way to do that that was ergonomic enough to justify.

I doubt a note in the spec that recommends counting audio in samples would be sufficient, but it could be a start.

Maybe we could convince the world to use a timebase that is a power of two, then floats are exact! 😂

tidoust commented 3 years ago

> My position is that microseconds are just as arbitrary as float milliseconds. If we could have rational seconds (timestamp + timebase), that would be superior for media, but I didn't find a way to do that that was ergonomic enough to justify.

The need to use rational numbers keeps being raised in media discussions. Past discussions include:

Conclusion seems to always be: that is more complicated than it seems, JS does not have a rational type, API would be less straightforward as a result, plus the ship has sailed... but we still need a solution with rational numbers "at some point".

WebCodecs is lower-level than other media APIs and has not shipped yet. Perhaps that's the right level and time to introduce such a mechanism if we know that rounding errors are going to bite.
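For illustration only, a rational timestamp (value plus timebase) could look like the sketch below. The `RationalTime` shape and helper names are hypothetical and not part of WebCodecs; both fields are assumed to be integers small enough that their cross products stay within the safe integer range:

```typescript
// A rational timestamp: `value` ticks of 1/`timescale` seconds each.
interface RationalTime {
  value: number;
  timescale: number;
}

// Exact comparison by cross-multiplication: no division, so no rounding.
// Returns -1, 0, or 1.
function compareRational(a: RationalTime, b: RationalTime): number {
  const diff = a.value * b.timescale - b.value * a.timescale;
  return diff < 0 ? -1 : diff > 0 ? 1 : 0;
}

// Lossy conversion to integer microseconds, rounded once at the edge.
function toMicros(t: RationalTime): number {
  return Math.round((t.value * 1e6) / t.timescale);
}

// Frame 1 of 30000/1001 fps video: 1001 ticks at a timescale of 30000.
const frame1: RationalTime = { value: 1001, timescale: 30000 };
// The same instant expressed in a 90 kHz timescale: 3003 ticks.
const frame1At90k: RationalTime = { value: 3003, timescale: 90000 };

console.log(compareRational(frame1, frame1At90k)); // 0
console.log(toMicros(frame1)); // 33367
```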

chrisn commented 3 years ago

> Perhaps that's the right level and time to introduce such a mechanism if we know that rounding errors are going to bite.

I did a quick search for previous discussions in TC39 and only found this, in relation to a Decimal proposal: https://github.com/tc39/proposal-decimal/issues/6

padenot commented 3 years ago

This will be critical when building professional media applications on the Web, which is supposed to be possible. When you're looking at a frame, it's necessary to know exactly what frame it is without ambiguity, and be able to do frame accurate seeking and presentation, and build a video editing timeline.

Looking around (ffmpeg, apple APIs, windows APIs, gstreamer), it's all properly done. Now is the right time to do this properly, or we'll be doing a new API that is unfit for its stated use-case.

sandersdan commented 3 years ago

This is hyperbolic; I think we need to consider the actual practical implications when evaluating solutions.

> When you're looking at a frame, it's necessary to know exactly what frame it is without ambiguity

This seems to imply that floats are ambiguous. They are not.

> Looking around (ffmpeg, apple APIs, windows APIs, gstreamer), it's all properly done.

Android's MediaCodec uses integer microseconds, which is arbitrary.

Microsoft's MediaFoundation uses integer tenths-of-microseconds, which is arbitrary. The underlying D3D11VideoContext doesn't deal with timestamps at all.

FFmpeg and Apple's CMTime use rational time, as you say.

> Now is the right time to do this properly, or we'll be doing a new API that is unfit for its stated use-case.

Existing APIs differ, yet all are used in professional media applications. The solutions to these problems are well-known.

I'd like to build something as nice as is possible, but we're not going to be unfit-for-purpose with any of the proposed solutions.

sandersdan commented 3 years ago

Just clarifying my position:

From an implementation point-of-view, integer microseconds is by far the easiest to implement in Chrome, as that's what our internal media timestamps already are. Integer microseconds appear to be well-supported by all platform APIs we've looked at, and so are likely easy for other implementations to use also.

padenot commented 3 years ago

> Microsoft's MediaFoundation uses integer tenths-of-microseconds

It depends on what we're talking about. It uses rational for frame rates, and hns on frames themselves, with an entire page explaining how to properly round during computations when implementing an MFT.

Here, we want to have a rate in video decoder config (I just noticed there is no rate on VideoDecoderConfig!?) that is rational, and then have timestamps that are in integer microseconds OR rational. I prefer rational, because it's simpler to work with in the case of media that have a constant frame rate, but as you say integer microseconds or hns work, and it's more ergonomic when the media is (possibly) variable frame rate (where I assume we'd put a denominator that is 1). Milliseconds is too coarse, and floating point will be rounded incorrectly.
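One way to picture the combination described above (a rational frame rate plus integer-microsecond timestamps) is the sketch below. The `FrameRate` shape and both helpers are hypothetical; VideoDecoderConfig has no such field in this discussion:

```typescript
// A rational frame rate, e.g. 30000/1001 for NTSC-style video.
interface FrameRate {
  numerator: number;
  denominator: number;
}

// Integer-microsecond timestamp of frame `index`, rounded once.
function frameTimestampUs(index: number, rate: FrameRate): number {
  return Math.round((index * rate.denominator * 1e6) / rate.numerator);
}

// Recover the frame index from an integer-microsecond timestamp.
function frameIndexAt(timestampUs: number, rate: FrameRate): number {
  return Math.round((timestampUs * rate.numerator) / (rate.denominator * 1e6));
}

const ntsc: FrameRate = { numerator: 30000, denominator: 1001 };

console.log(frameTimestampUs(1, ntsc));     // 33367
console.log(frameTimestampUs(30000, ntsc)); // 1001000000
console.log(frameIndexAt(frameTimestampUs(12345, ntsc), ntsc)); // 12345
```

Because each timestamp is computed from the frame index rather than by adding a rounded frame duration, the rounding error stays below one microsecond per frame and the index can be recovered unambiguously.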

I think my comment was essentially lumping together the (related) issue of expressing the frame rate of a video stream (which is a problem in itself) and the timestamp on the frames, and those are two different issues, which I'm going to split off now, apologies.

chcunningham commented 3 years ago

Conclusions from editors' call.

chcunningham commented 3 years ago

change to int64 to allow passthrough of negative timestamps (fairly common in various container formats)
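As a rough sketch of what passing such values through script involves: a JS number holds integer microseconds exactly only within ±(2^53 - 1), negative values included. The helper name and the sample timestamp below are hypothetical:

```typescript
// Largest integer magnitude a JS number represents without gaps.
const MAX_SAFE_MICROS = 2 ** 53 - 1;

// True if the value is an integer microsecond count a JS number holds exactly.
function isExactMicros(timestampUs: number): boolean {
  return Math.floor(timestampUs) === timestampUs && Math.abs(timestampUs) <= MAX_SAFE_MICROS;
}

const negativeTimestampUs = -21333; // hypothetical negative timestamp from a container

console.log(isExactMicros(negativeTimestampUs)); // true
console.log(isExactMicros(0.5));                 // false: not an integer
console.log(isExactMicros(2 ** 53));             // false: beyond the exact range
```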

leaving issue open for now to track this change.

chcunningham commented 3 years ago

I'm updating the TAG review to note the resolution here (int64 microseconds). To ease their review, here's a quick summary of our path to that outcome:

annevk commented 3 years ago

I would recommend giving feedback on https://github.com/tc39/proposal-decimal/issues/6 if rational (and not decimal) would be useful, even if only in the future. If we're adding another number type at some point, let's ensure it works for media.

chcunningham commented 3 years ago

Triage note: marked 'breaking', since changing the type can theoretically break. Having said that, Chrome has already updated the implementation and probably no one was broken anyway (timestamp values in the range of unsigned but not signed would generally not be expected).

I'll have a PR out for this shortly.