w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/
Other
1.01k stars 137 forks source link

Require `AudioDecoderConfig.description` for channels > 2 with Opus? #826

Open tguilbert-google opened 3 months ago

tguilbert-google commented 3 months ago

I'm surfacing this issue, originally reported against Chromium's implementation.

The following fails on Chromium but works on Firefox

const config = {codec: 'opus', numberOfChannels: 6, sampleRate: 44100};
console.log(await AudioDecoder.isConfigSupported(config)); // {supported: true}

const decoder = new AudioDecoder({ error: console.error, output: console.log });

decoder.configure(config);

If the channel count is greater than 2 we can either:

Is this worth calling out in the spec, or adding a non-normative note?

@padenot, wdyt?

padenot commented 3 months ago

There's already a precedent for generating description, iirc, with AAC.

tguilbert-google commented 3 months ago

Generating headers seems reasonable, with a caveat:

I'd support to generating headers for channel mapping families 0 and 1 (defined here), when there are 8 channels or fewer and we can make a good guess at the mapping. For more than 8 channels, the channel mapping family is 255 and the mappings are entirely arbitrary, and asking for a description might be better than guessing.

aboba commented 2 months ago

Is there a way to support Opus channel mapping families 2 and 3, or would that always be done via channel mapping family 255?

Also, I am wondering if there are implications for other codecs where multichannel is supported via IAMF.

tguilbert-google commented 2 months ago

Thanks Bernard, I am just learning about channel mappings 2 and 3. Do we currently support these families when we supply the description/extradata? I don't have test media to verify this with.

I've ran into the following issue trying to generate a description for channel mapping family of 1: at the time audioDecoder.configure() is called, we don't know whether the encoded streams will be "streams" or "coupled streams". E.g. the header for a 5.1 channels audio file could have anywhere in-between 6 mono streams and 3 stereo streams.

Does it still make sense to generate a header in this case?

padenot commented 2 months ago

I think you're right, I'm not sure what we're doing then.

padenot commented 2 months ago

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1915143 for the Gecko side.

padenot commented 2 months ago

have isConfigSupported() return {supported: false} when AudioDecoderConfig.description isn't set

Just a note that this isn't supposed to return false: https://w3c.github.io/webcodecs/#valid-audiodecoderconfig says to only check if it's detached, and we know that to some degree we want to check the channel count and sample-rate.

The idea here was to allow checking if a codec is supported, with some parameters, but not to require having bits of the media to check for support. e.g. if I want to know if a UA implement Flac decoding with Web Codecs (Flac being a codec that requires a description), and in particular e.g. 192khz / 7.1, I shouldn't have to download bits of a Flac asset (= the description) to know that, and if it's not supported, I can immediately fall back to something else without having to hit the network. In case of invalid or missing description, codec instantiation will fail as expected.

In light of https://github.com/w3c/webcodecs/issues/826#issuecomment-2311367698 (that we obviously ran into as well), Gecko is going to implement requiring a description for channels > 2, because as @tguilbert-google that's the only thing possible. Some WPTs are added to check this, what Gecko was doing was nonsensical.

tguilbert-google commented 2 months ago

Those are good clarifications. I think Chrome's implementation needs to revisit some of its error messages, which suggest calling isConfigSupported() when configure() fail, which is confusing for web authors.

What about the following case:

The first case is justifiable to me, as we want to allow quick and easy checks and early fallbacks. In the second case, shouldn't we return false instead, as we know that the config isn't supported? Or does support refer more to the "form" of the config than its "content"? We could update the spec to call out that we might check channel counts and sample-rates and other top level items during isConfigSupported(), but bad descriptions fail at configure() time.

padenot commented 2 months ago

Or does support refer more to the "form" of the config than its "content"? We could update the spec to call out that we might check channel counts and sample-rates and other top level items during isConfigSupported(), but bad descriptions fail at configure() time.

I've always thought of it like what you describe. The line is a bit blurry, but generally if you need to instantiate a byte stream parser, you've crossed it.

sandersdan commented 2 months ago

FWIW, Chrome is already parsing description for H.264 and H.265, and will report unsupported if that fails.

padenot commented 2 months ago

There's a middle ground in which UA are allowed to perform a more in-depth check. I think this is reasonable, and useful for authors.

JonnyBurger commented 2 months ago

From the spec:

If a description has been set, the bitstream is assumed to be in ogg format. If a description has not been set, the bitstream is assumed to be in opus format.

What is one supposed to do if they have a opus bitstream with > 2 channels? Wondering if the spec had this in mind, but also what the current implementations do.

padenot commented 2 months ago

It doesn't exist. Opus is mono or stereo, and then you combine mono or stereo streams per https://datatracker.ietf.org/doc/html/rfc7845#section-5.1 and/or https://datatracker.ietf.org/doc/rfc8486/ to have a stream with more than 2 channels.

JonnyBurger commented 2 months ago

Thanks for the explanation!

I understand now that the audio tracks tagged A_OPUS in WebM files are actually in OGG bitstream format (recognizable because of the presence of OpusHead). So this makes sense now to me!