w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/
Other
937 stars 131 forks source link

Clarify VideoEncoderConfig.contentHint #759

Closed tguilbert-google closed 4 months ago

tguilbert-google commented 5 months ago

Fixes #735, addresses comments from #758.

This PR removes the note from VideoEncoderConfig.contentHint and forces User Agent to respect other explicitly set encoder options.

It also adds an equivalent AudioEncoderConfig.contentHint section.


Preview | Diff

aboba commented 5 months ago

The current Opus configuration options are missing configuration parameters such as signal (with settings "auto", "music" and "voice") and application (with settings "voip", "audio" and "lowdelay"). IMHO, it might make more sense to add these to the Opus registry, or at least to specify how these settings map to the Audio Content Hint values of "", "speech", "speech-recognition", and "music". For example, you could indicate that "" corresponds to the signal value of "auto", "speech" corresponds to "voice" and "music" corresponds to "music", with "speech-recognition" having no effect. The "speech" content hint might map to the application value of "voip", the "music" hint could map to "audio" and "speech-recognition" could map to "lowdelay".

tguilbert-google commented 5 months ago

That's a good point. Those configuration parameters are defined in a spec, and can be referred to from the codec registry, unlike the implementation specific headers (e.g. OPUS_APPLICATION_VOIP). I should have attempted to use these in #758.

Under the current proposed definition of AudioEncoderConfig.contentHint, adding these could be done in parallel, without conflict: this PR's definition of contentHint specifies to respect any explicitly set flags, and then make a best effort to configure additional encoder flags; the application/signal flags in the Opus registry would take precedence over contentHint.

Would we want to keep AudioEncoderConfig.contentHint for symmetry with the video half of things, or is it useful for other audio codecs with fewer configuration parameters? Or would contentHint be a "quick and easy" top level alias for settings that are prescriptively mapped, and no longer just a "best effort"?

An explicit mapping for Opus seems easy with the proposed flags, but that might not be as easy for other audio/video codecs... Would defining an explicit mapping for Opus force all other codecs into defining mappings? I think this is something @Djuffin wasn't too keen on doing (but that was maybe only for implementation specific mappings OPUS_APPLICATION_VOIP).

aboba commented 5 months ago

For alaw, ulaw and pcm, it's not obvious to me that there are codec-specific settings that can be used to implement audio content hints like "speech", "music" or "speech-recognition". So I'm not imagining any changes to these codec registrations.

Does anything come to mind for flac or aac?

padenot commented 5 months ago

Does anything come to mind for flac or aac?

Not that I know of.

tguilbert-google commented 4 months ago

Following discussions with @Djuffin, which summarized @aboba's point of view on this PR, I updated the PR to only modify VideoEncoderConfig.contentHint.

In essence, a contentHint can be applied to multiple different codecs for video, but only Opus seems to benefit from a contentHint on the audio side.

Instead of adding contentHint to audio, I will send a follow-up PR which adds the application/signal flags directly to OpusEncoderConfig.