w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Should we "un-round-up" OpusEncoderConfig.frameDuration to 2.5ms for a ptime of 3? #600

Closed tguilbert-google closed 1 year ago

tguilbert-google commented 1 year ago

OpusEncoderConfig.frameDuration is defined as a valid ptime, according to Section 6.1 of [RFC7587]:

"Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary multiple of an Opus frame size rounded up to the next full integer value [...]"

Supported Opus frame sizes in ms are {2.5, 5, 10, 20, 40, 60}. The 2.5 ms Opus frame size gets rounded up to a ptime of 3.

If we configure an AudioEncoder for Opus with a frameDuration of 3, is the underlying implementation supposed to un-round-up the 3ms value down to 2.5ms? Same goes for a frameDuration/ptime of 8, un-rounded-up to 7.5ms.

The confusion arises if 4 EncodedAudioChunks with a frameDuration of 3 are emitted: once decoded, should they total 10ms or 12ms? This should be clarified before browsers ship implementations.
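To make the ambiguity concrete, here is a rough TypeScript sketch of the rounding rule quoted above. The helper names (`toPtime`, `candidateDurationsMs`) are hypothetical and are not part of WebCodecs or any Opus library; the 120 ms cap is the maximum packet duration from RFC 6716.

```typescript
// Opus frame sizes in milliseconds (RFC 6716).
const OPUS_FRAME_SIZES_MS: number[] = [2.5, 5, 10, 20, 40, 60];

// Hypothetical helper: the RFC 7587 ptime for a packet holding `frames`
// frames of `frameMs` each, i.e. the duration rounded up to a full integer.
function toPtime(frameMs: number, frames: number = 1): number {
  return Math.ceil(frameMs * frames);
}

// Hypothetical helper: every real packet duration (up to 120 ms) that
// rounds up to the given ptime. Shows what "un-rounding" would have to pick.
function candidateDurationsMs(ptime: number): number[] {
  const found = new Set<number>();
  for (const frameMs of OPUS_FRAME_SIZES_MS) {
    for (let frames = 1; frameMs * frames <= 120; frames++) {
      if (toPtime(frameMs, frames) === ptime) {
        found.add(frameMs * frames);
      }
    }
  }
  return Array.from(found).sort((a, b) => a - b);
}

// Four chunks configured with frameDuration = 3:
const asRealDurationMs = 4 * 2.5; // if 3 is "un-rounded" to 2.5 ms
const asLiteralMs = 4 * 3;        // if 3 is taken literally
console.log(candidateDurationsMs(3), asRealDurationMs, asLiteralMs);
```

Under these assumptions a ptime of 3 can only come from a single 2.5 ms frame and a ptime of 8 only from 7.5 ms (three 2.5 ms frames), so un-rounding is well defined, but the two readings still disagree on the total: 10 ms versus 12 ms.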

A solution could be to change OpusEncoderConfig.frameDuration to use microseconds and only accept valid Opus frame sizes, and add an OpusEncoderConfig.ptime which controls the repacketizer under the hood.

If it is understood across the industry that a ptime of 3 maps to 2.5ms (and 8 to 7.5ms, and so on), there should probably be an implementer's note in the spec.

tguilbert-google commented 1 year ago

@aboba, @bdrtc, @padenot, @dalecurtis

dalecurtis commented 1 year ago

Are 2.5ms and 3ms both valid sizes? If so we need to disambiguate somehow; I'd prefer breaking the relationship with ptime if that's the case -- i.e., that if a user wants to specify a ptime value they must manually convert.

@youennf too

bdrtc commented 1 year ago

frameDuration should be one of the frame sizes the Opus encoder supports, and the ptime defined in RFC 7587 should be the rounded-up value of such a size. So 2.5 ms is a valid frameDuration. I agree with @dalecurtis about breaking the relationship between frameDuration and ptime: frameDuration stands for a frame size the Opus encoder supports, i.e. the un-rounded-up value of the ptime defined in RFC 7587.

aboba commented 1 year ago

Actually, the valid frame durations are defined in RFC 6716 Section 2.1.4:

"Section 2.1.4 Frame Duration

Opus can encode frames of 2.5, 5, 10, 20, 40, or 60 ms. It can also combine multiple frames into packets of up to 120 ms. For real-time applications, sending fewer packets per second reduces the bitrate, since it reduces the overhead from IP, UDP, and RTP headers. However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio. Increasing the frame duration also slightly improves coding efficiency, but the gain becomes small for frame sizes above 20 ms. For this reason, 20 ms frames are a good choice for most applications."
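As a back-of-the-envelope illustration of the header overhead point in that quote, here is a small TypeScript sketch. The 40 bytes per packet is an assumption (minimal IPv4 header with no options, UDP, and a basic RTP header with no CSRCs or extensions), and `headerOverheadBitsPerSecond` is a hypothetical name.

```typescript
// Assumed minimum header sizes in bytes: IPv4 (20), UDP (8), RTP (12).
const HEADER_BYTES = 20 + 8 + 12;

// Hypothetical helper: bits per second spent on headers alone when each
// packet carries `framesPerPacket` frames of `frameDurationMs` each.
function headerOverheadBitsPerSecond(
  frameDurationMs: number,
  framesPerPacket: number = 1
): number {
  const packetsPerSecond = 1000 / (frameDurationMs * framesPerPacket);
  return HEADER_BYTES * 8 * packetsPerSecond;
}

for (const ms of [2.5, 5, 10, 20, 40, 60]) {
  const kbps = headerOverheadBitsPerSecond(ms) / 1000;
  console.log(`${ms} ms frames: about ${kbps.toFixed(1)} kbit/s of headers`);
}
```

With these assumed header sizes, 2.5 ms frames sent one per packet cost 128 kbit/s in headers, while 20 ms frames cost 16 kbit/s.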

dalecurtis commented 1 year ago

@aboba and I lean towards breaking the ptime relationship and making frameDuration consistent with other timing fields in WebCodecs (i.e., expressed in microseconds). We would not un-round 3ms then, but reject it as invalid, since it's not one of the values @aboba quotes from RFC 6716 Section 2.1.4.
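A minimal TypeScript sketch of what that could look like from the caller's side. This is not the spec's validation algorithm; `isValidOpusFrameDurationUs` and `ptimeToFrameDurationUs` are hypothetical names, and the manual ptime conversion is assumed to be the user's job rather than the encoder's.

```typescript
// Valid Opus frame durations (RFC 6716), expressed in microseconds.
const VALID_OPUS_FRAME_DURATIONS_US: number[] = [
  2500, 5000, 10000, 20000, 40000, 60000,
];

// Hypothetical check: accept only exact Opus frame sizes.
function isValidOpusFrameDurationUs(frameDurationUs: number): boolean {
  return VALID_OPUS_FRAME_DURATIONS_US.indexOf(frameDurationUs) !== -1;
}

// Hypothetical user-side conversion: map a ptime (whole milliseconds) to
// the single frame duration that rounds up to it, or undefined if the
// ptime does not correspond to exactly one Opus frame.
function ptimeToFrameDurationUs(ptime: number): number | undefined {
  for (const us of VALID_OPUS_FRAME_DURATIONS_US) {
    if (Math.ceil(us / 1000) === ptime) {
      return us;
    }
  }
  return undefined;
}

console.log(isValidOpusFrameDurationUs(3000));  // 3 ms is rejected, not un-rounded
console.log(isValidOpusFrameDurationUs(2500));  // 2.5 ms is accepted
console.log(ptimeToFrameDurationUs(3));         // user converts ptime 3 themselves
```

In this sketch a ptime of 8 has no single-frame equivalent, since 7.5 ms is three 2.5 ms frames in one packet, so the conversion returns `undefined` and packetization stays a separate concern from frameDuration.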

tguilbert-google commented 1 year ago

Updating the definition and using microseconds makes a lot of sense. I will upload a PR.