Open tguilbert-google opened 3 months ago
There's already a precedent for generating description
, iirc, with AAC
.
Generating headers seems reasonable, with a caveat:
I'd support to generating headers for channel mapping families 0
and 1
(defined here), when there are 8 channels or fewer and we can make a good guess at the mapping. For more than 8 channels, the channel mapping family is 255
and the mappings are entirely arbitrary, and asking for a description might be better than guessing.
Is there a way to support Opus channel mapping families 2
and 3
, or would that always be done via channel mapping family 255
?
Also, I am wondering if there are implications for other codecs where multichannel is supported via IAMF.
Thanks Bernard, I am just learning about channel mappings 2
and 3
. Do we currently support these families when we supply the description/extradata
? I don't have test media to verify this with.
I've ran into the following issue trying to generate a description
for channel mapping family of 1
: at the time audioDecoder.configure()
is called, we don't know whether the encoded streams will be "streams" or "coupled streams". E.g. the header for a 5.1 channels audio file could have anywhere in-between 6 mono streams and 3 stereo streams.
Does it still make sense to generate a header in this case?
I think you're right, I'm not sure what we're doing then.
Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1915143 for the Gecko side.
have isConfigSupported() return
{supported: false}
whenAudioDecoderConfig.description
isn't set
Just a note that this isn't supposed to return false
: https://w3c.github.io/webcodecs/#valid-audiodecoderconfig says to only check if it's detached, and we know that to some degree we want to check the channel count and sample-rate.
The idea here was to allow checking if a codec is supported, with some parameters, but not to require having bits of the media to check for support. e.g. if I want to know if a UA implement Flac decoding with Web Codecs (Flac being a codec that requires a description
), and in particular e.g. 192khz / 7.1, I shouldn't have to download bits of a Flac asset (= the description) to know that, and if it's not supported, I can immediately fall back to something else without having to hit the network. In case of invalid or missing description, codec instantiation will fail as expected.
In light of https://github.com/w3c/webcodecs/issues/826#issuecomment-2311367698 (that we obviously ran into as well), Gecko is going to implement requiring a description for channels > 2, because as @tguilbert-google that's the only thing possible. Some WPTs are added to check this, what Gecko was doing was nonsensical.
Those are good clarifications. I think Chrome's implementation needs to revisit some of its error messages, which suggest calling isConfigSupported()
when configure()
fail, which is confusing for web authors.
What about the following case:
description
: isConfigSupported()
returns true.description
is provided and clearly invalid: isConfigSupported()
return true?The first case is justifiable to me, as we want to allow quick and easy checks and early fallbacks.
In the second case, shouldn't we return false instead, as we know that the config isn't supported? Or does support refer more to the "form" of the config than its "content"? We could update the spec to call out that we might check channel counts and sample-rates and other top level items during isConfigSupported()
, but bad descriptions
fail at configure()
time.
Or does support refer more to the "form" of the config than its "content"? We could update the spec to call out that we might check channel counts and sample-rates and other top level items during
isConfigSupported()
, but baddescriptions
fail atconfigure()
time.
I've always thought of it like what you describe. The line is a bit blurry, but generally if you need to instantiate a byte stream parser, you've crossed it.
FWIW, Chrome is already parsing description
for H.264 and H.265, and will report unsupported if that fails.
There's a middle ground in which UA are allowed to perform a more in-depth check. I think this is reasonable, and useful for authors.
From the spec:
If a description has been set, the bitstream is assumed to be in ogg format. If a description has not been set, the bitstream is assumed to be in opus format.
What is one supposed to do if they have a opus
bitstream with > 2 channels?
Wondering if the spec had this in mind, but also what the current implementations do.
It doesn't exist. Opus is mono or stereo, and then you combine mono or stereo streams per https://datatracker.ietf.org/doc/html/rfc7845#section-5.1 and/or https://datatracker.ietf.org/doc/rfc8486/ to have a stream with more than 2 channels.
Thanks for the explanation!
I understand now that the audio tracks tagged A_OPUS
in WebM files are actually in OGG bitstream format (recognizable because of the presence of OpusHead
). So this makes sense now to me!
I'm surfacing this issue, originally reported against Chromium's implementation.
The following fails on Chromium but works on Firefox
If the channel count is greater than 2 we can either:
{supported: false}
whenAudioDecoderConfig.description
isn't setextradata
behind the scenesIs this worth calling out in the spec, or adding a non-normative note?
@padenot, wdyt?